Closed teej closed 5 years ago
Hey @teej! Sorry for just now getting around to all this. For some reason GH isn't notifying me on this repo even though the settings are there 🤷♀
So you are correct that removing the column altogether, or moving it to the end aren't terribly simple solutions.
The reason this exists is because this repo depends on the logic for upserting tables present in https://github.com/datamill-co/target-postgres. This logic splits every operation necessary up into small bits. So there's create_table
, add_column
, make_column_nullable
etc. The pro is that adding new targets to the upsert logic becomes pretty straightforward since each command is bite size. The downside to not batching all of these and optimizing based on the full inspected schema is that we don't know any columns to add to tables beforehand.
For postgres, this isn't a problem because you can have empty tables...Redshift demands that a table have at least one column, hence the CREATE_TABLE_INITIAL_COLUMN
.
The rationale for making it a lengthy name is fist that it's easy to see it's a meaningless column, and more importantly, it's extremely unlikely that Singer will ever actually use a similar column (since we've technically stolen their namespacing with the _sdc_
prefix).
It might be worth looking at whether our call out to the upsert_table_helper
could be wrapped at the end with some logic for:
The testing in the codebase should catch most of the edge cases for this, and if we followed the above logic, it shouldn't be the case that we'd have to opt folks in/out/create-a-migration.
Can this column name be a bit more terse? It being the first column and being super long makes exploratory
SELECT *
s out of the table annoying. Given the reliance ontarget-postgres
and the performance characteristics of Redshift, I assume removing it or moving it to the end aren't simple changes to make.https://github.com/datamill-co/target-redshift/blob/1dd1906784cbf2796927b381e091166ff3cd910e/target_redshift/redshift.py#L39