datamill-co / target-redshift

A Singer.io Target for Redshift
MIT License
23 stars 17 forks source link

Minor quality of life improvement #22

Closed teej closed 5 years ago

teej commented 5 years ago

Can this column name be a bit more terse? It being the first column and being super long makes exploratory SELECT *s out of the table annoying. Given the reliance on target-postgres and the performance characteristics of Redshift, I assume removing it or moving it to the end aren't simple changes to make.

https://github.com/datamill-co/target-redshift/blob/1dd1906784cbf2796927b381e091166ff3cd910e/target_redshift/redshift.py#L39

AlexanderMann commented 5 years ago

Hey @teej! Sorry for just now getting around to all this. For some reason GH isn't notifying me on this repo even though the settings are there 🤷‍♀

So you are correct that removing the column altogether, or moving it to the end aren't terribly simple solutions.

The reason this exists is because this repo depends on the logic for upserting tables present in https://github.com/datamill-co/target-postgres. This logic splits every operation necessary up into small bits. So there's create_table, add_column, make_column_nullable etc. The pro is that adding new targets to the upsert logic becomes pretty straightforward since each command is bite size. The downside to not batching all of these and optimizing based on the full inspected schema is that we don't know any columns to add to tables beforehand.

For postgres, this isn't a problem because you can have empty tables...Redshift demands that a table have at least one column, hence the CREATE_TABLE_INITIAL_COLUMN.

The rationale for making it a lengthy name is fist that it's easy to see it's a meaningless column, and more importantly, it's extremely unlikely that Singer will ever actually use a similar column (since we've technically stolen their namespacing with the _sdc_ prefix).

It might be worth looking at whether our call out to the upsert_table_helper could be wrapped at the end with some logic for:

The testing in the codebase should catch most of the edge cases for this, and if we followed the above logic, it shouldn't be the case that we'd have to opt folks in/out/create-a-migration.