datamill-co / target-redshift

A Singer.io Target for Redshift
MIT License
23 stars 17 forks source link

Add "timestamp without time zone" as supported data type #35

Closed pcorbel closed 4 years ago

pcorbel commented 5 years ago

Amazon Redshift has a data type "timestamp without time zone" which isn't supported by the target yet (only "timestamp with time zone" is supported via target-postgres)

Data Type Aliases Description
TIMESTAMP TIMESTAMP WITHOUT TIME ZONE Date and time (without time zone)
TIMESTAMPTZ TIMESTAMP WITH TIME ZONE Date and time (with time zone)

Source: https://docs.aws.amazon.com/redshift/latest/dg/c_Supported_data_types.html

So when target-redshift scan the destination schema, the following error occurs:

File "/app/target/.venv/lib/python3.7/site-packages/target_postgres/postgres.py", line 782, in sql_type_to_json_schema
    raise PostgresError('Unsupported type `{}` in existing target table'.format(sql_type))
target_postgres.exceptions.PostgresError: Unsupported type `timestamp without time zone` in existing target table
AlexanderMann commented 5 years ago

@pcorbel does target-redshift/target-postgres create TIMESTAMP WITHOUT TIME ZONE columns somewhere?

Currently, the thinking is that the only supported tables are those that the target created.

pcorbel commented 5 years ago

@AlexanderMann does target-redshift/target-postgres create TIMESTAMP WITHOUT TIME ZONE columns somewhere? --> No

I agree that only supported tables are those that the datamill-co/target-redshift created. However, as of today, we need the totality of the tables in the schema to be created by datamill-co/target-redshift.

We can't have:

But only:

AlexanderMann commented 5 years ago

@pcorbel do you know whether you're getting these errors on just running the target, or is it when a stream you're processing tries to persist to a table created by some other means?

ie, is it:

  1. stream FOO tries to persist to remote table foo a table created and maintained by target-redshift
    • another table, bar, exists in the remote schema which has been created by another process
  2. stream FOO tries to persist to remote table foo which exists in the remote schema which has been created by another process

I ask, because I think the guidance has been that targets should probably be run in their own schemas and if they happen to work alongside other processes, that's great, but not a given. For target-redshift specifically, we've turned down trying to support tables not created and maintained by target-redshift.

The reason for this is that the stores a tonne of metadata into the remote table's COMMENT and has to use that for most operations.

pcorbel commented 5 years ago

@AlexanderMann It is the 1 case you mentioned. My usecase is that I would like to migrate some tables from Stitch to in-house extraction. To be able to do it in a transparent and smooth way, I would like my end users to use the same table name, in the the same schema name. It is just the extraction that would be different.

AlexanderMann commented 5 years ago

Fascinating. So @pcorbel to be clear, you are not looking to mix and match maintenance of tables between target-redshift and something else, but rather are looking to simply use it inside a schema which has other things in it?

If yes, I think we should identify where the code is blowing up by pulling in other stuffs first, as I can easily see this becoming a game of whack-a-mole with types etc.