BritishGeologicalSurvey / etlhelper

ETL Helper is a Python ETL library to simplify data transfer into and out of databases.
https://britishgeologicalsurvey.github.io/etlhelper/
GNU Lesser General Public License v3.0

Clarify documentation re table structure differences between source and destination #145

Closed mgierdal closed 1 year ago

mgierdal commented 1 year ago

In my experience, copy_table_rows() requires the table to already exist in the destination DB. It would be helpful if copy_table_rows() could create it, or if a separate function could create it in the destination (or compose the CREATE statement) based on the source table.
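A minimal sketch of the kind of helper requested above, assuming both the source and the destination are SQLite databases so that declared column types can be reused verbatim. `create_table_like` is a hypothetical name, not part of etlhelper. It reads the source schema with SQLite's `PRAGMA table_info` and keeps only column types, NOT NULL and the primary key; defaults, foreign keys and indexes are dropped.

```python
import sqlite3


def create_table_like(table: str, src_conn: sqlite3.Connection,
                      dest_conn: sqlite3.Connection) -> str:
    """Hypothetical helper: recreate an SQLite table definition in another SQLite database.

    Keeps declared column types, NOT NULL and PRIMARY KEY; drops defaults,
    foreign keys and indexes. Returns the CREATE statement it executed.
    """
    # Each row is (cid, name, type, notnull, dflt_value, pk), in declaration order.
    columns = src_conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    if not columns:
        raise ValueError(f"table {table!r} not found in source database")

    column_defs = []
    pk_columns = []
    for _cid, name, col_type, notnull, _default, pk in columns:
        definition = f'"{name}" {col_type}'.rstrip()  # declared type may be empty in SQLite
        if notnull:
            definition += " NOT NULL"
        column_defs.append(definition)
        if pk:  # pk is the 1-based position within the primary key, 0 otherwise
            pk_columns.append((pk, name))

    if pk_columns:
        key = ", ".join(f'"{name}"' for _, name in sorted(pk_columns))
        column_defs.append(f"PRIMARY KEY ({key})")

    create_sql = f'CREATE TABLE IF NOT EXISTS "{table}" ({", ".join(column_defs)})'
    dest_conn.execute(create_sql)
    return create_sql


src = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
src.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, name TEXT NOT NULL, depth REAL)")
print(create_table_like("samples", src, dest))
```

Once the destination table exists, copy_table_rows could then fill it. Between different database engines the declared types would need translating, which is where this approach stops being simple.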

volcan01010 commented 1 year ago

Hi @mgierdal,

Yes. Although you can't exactly call it a "feature", it is part of how etlhelper is designed. The aim is to keep the code simple, but creating a table based on the given data is really hard: you need to know the data type of each column and the specific commands for the target database. If your target database doesn't have the table already, there are two options:

I hope that helps,

John
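To illustrate why creating a table from the data alone is hard, here is a naive, hypothetical type-inference sketch (not part of etlhelper) that guesses one SQLite type per column from sample rows. It gives up on columns that contain only None or a mix of types, and any other Python type (dates, decimals) simply falls back to TEXT. A different target database would also need a different type mapping.

```python
from typing import Any, Dict, Iterable, Mapping, Optional

# Hypothetical mapping from Python value types to SQLite column types.
# A different target database would need its own mapping.
SQLITE_TYPES = {bool: "INTEGER", int: "INTEGER", float: "REAL", str: "TEXT", bytes: "BLOB"}


def infer_column_types(rows: Iterable[Mapping[str, Any]]) -> Dict[str, Optional[str]]:
    """Guess one SQLite type per column from sample rows (naive sketch).

    A column maps to None when its type cannot be decided: every value was
    None, or the values implied more than one type.
    """
    seen: Dict[str, set] = {}
    for row in rows:
        for column, value in row.items():
            types = seen.setdefault(column, set())
            if value is not None:
                # Unmapped types (dates, decimals, ...) silently fall back to TEXT.
                types.add(SQLITE_TYPES.get(type(value), "TEXT"))
    return {column: (next(iter(types)) if len(types) == 1 else None)
            for column, types in seen.items()}


rows = [
    {"id": 1, "depth": 10, "note": None},
    {"id": 2, "depth": 12.5, "note": None},
]
print(infer_column_types(rows))  # {'id': 'INTEGER', 'depth': None, 'note': None}
```

Here depth is ambiguous because one sample happened to be a whole number, and note cannot be typed at all: the kind of guesswork that requiring an existing destination table avoids.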

mgierdal commented 1 year ago

Fair enough. Could you elaborate on how copy_table_rows deals with disagreements between the source data and the destination table: what fails and what is acceptable? Perhaps there could be an entry in the tutorial explaining that. I found this package yesterday and have already used it in a project with quite some success, surprises aside.
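Outcomes like these are largely decided by the destination database rather than by etlhelper itself. The sketch below uses Python's built-in sqlite3 module to show some common mismatches, assuming the INSERT names the source columns explicitly; `try_insert` is a hypothetical helper. SQLite is unusually lenient about types, so case 4 would be rejected by stricter databases such as PostgreSQL.

```python
import sqlite3


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Run one INSERT and return "ok" or the error, instead of raising."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.Error as exc:
        return f"{type(exc).__name__}: {exc}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest (id INTEGER PRIMARY KEY, name TEXT NOT NULL, depth REAL)")

outcomes = {
    # 1. Source column missing from the destination table: fails.
    "extra column": try_insert(
        conn, "INSERT INTO dest (id, name, colour) VALUES (?, ?, ?)", (1, "a", "red")),
    # 2. NOT NULL destination column missing from the source: fails.
    "missing required column": try_insert(
        conn, "INSERT INTO dest (id, depth) VALUES (?, ?)", (2, 1.5)),
    # 3. Nullable destination column missing from the source: stored as NULL.
    "missing optional column": try_insert(
        conn, "INSERT INTO dest (id, name) VALUES (?, ?)", (3, "c")),
    # 4. Non-numeric text in a REAL column: SQLite keeps it as TEXT;
    #    stricter databases would reject it.
    "wrong type": try_insert(
        conn, "INSERT INTO dest (id, name, depth) VALUES (?, ?, ?)", (4, "d", "deep")),
    # 5. Primary key already present in the destination: fails.
    "duplicate key": try_insert(
        conn, "INSERT INTO dest (id, name) VALUES (?, ?)", (3, "again")),
}

for case, outcome in outcomes.items():
    print(f"{case}: {outcome}")
```

Cases 1, 2 and 5 raise errors; cases 3 and 4 succeed silently, which is arguably the more surprising behaviour.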

Another thing is that table_info() seems to return field names in an order that, for certain tables, does not match the source table. This forced me to build the INSERT statement using table_info from the destination, which in turn had been created with a hardcoded CREATE statement.

Thanks! Marcin

volcan01010 commented 1 year ago

Thanks, Marcin, I'm glad that you are finding the code useful. The documentation is the weak point now and I do have plans to update to a proper documentation site. There is a ticket about that here: https://github.com/BritishGeologicalSurvey/etlhelper/issues/12. You can note anything else that you find confusing or think should be better explained there.

I hadn't noticed that about table_info column name ordering. They are returned in the order that they come from the database. load might help you here: if you pass dictionaries whose keys match the column names, it can write the INSERT statement for you.
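A minimal sketch of the idea behind that suggestion, assuming an SQLite destination: name every column explicitly in the INSERT, taking the names from the dictionary keys, so that neither key order nor table_info column order matters. `insert_from_dicts` is a hypothetical helper, not etlhelper's load implementation. The keys are interpolated into the SQL, so they are assumed to be trusted, simple identifiers.

```python
import sqlite3
from typing import Any, Iterable, Mapping


def insert_from_dicts(conn: sqlite3.Connection, table: str,
                      rows: Iterable[Mapping[str, Any]]) -> int:
    """Hypothetical helper: INSERT rows, naming columns from the first row's keys.

    Because each column is named and bound by name, neither the key order nor
    the destination's column order matters. Keys are interpolated into the SQL,
    so they must be trusted, simple identifiers.
    """
    batch = list(rows)
    if not batch:
        return 0
    columns = list(batch[0])
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join(f":{c}" for c in columns)  # sqlite3 named parameters
    sql = f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})'
    with conn:  # commit on success, roll back on error
        conn.executemany(sql, batch)
    return len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest (id INTEGER PRIMARY KEY, name TEXT, depth REAL)")
# Keys deliberately in a different order from the table's columns.
insert_from_dicts(conn, "dest", [
    {"depth": 1.5, "id": 1, "name": "a"},
    {"name": "b", "depth": 3.0, "id": 2},
])
print(conn.execute("SELECT id, name, depth FROM dest ORDER BY id").fetchall())
# [(1, 'a', 1.5), (2, 'b', 3.0)]
```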

In terms of dealing with data disagreements, you have a few options.

volcan01010 commented 1 year ago

I'm about to push a minor tweak to the current README to highlight that the table is not created automatically.

volcan01010 commented 1 year ago

Closed by #179