datamill-co / target-redshift

A Singer.io Target for Redshift
MIT License
23 stars 17 forks source link

Table comments breaking target #45

Open vitoravancini opened 4 years ago

vitoravancini commented 4 years ago

Hello everyone, I've been using target-postgres for almost an year and I'm now beggining to use target-redshift. Thank you for all the awesome work.

I'm having an issue when the target code tries to read the table comment metadata. I'm not sure why this is happening, and I was able to make it work changing two lines of code, maybe showing which lines I've changed and my json schema you could point me to something I'm getting wrong.

I've changed from file postgres.py around line 710

 comment = cur.fetchone()[0]

if comment:
    try:
        comment_meta = json.loads(comment)
    except:
        self.LOGGER.exception('Could not load table comment metadata')
        raise
else:
    comment_meta = None

to

comment = cur.fetchone()

if comment and len(comment) > 0:
    comment = comment[0]
    try:
        comment_meta = json.loads(comment)
    except:
        self.LOGGER.exception('Could not load table comment metadata')
        raise
else:
    comment_meta = None

and my json schema is

{ "streams": [{ "stream_alias": "batidas", "stream": "batidas", "tap_stream_id": "batidas", "schema": { "selected": true, "type": ["null", "object"], "properties": { "NSR": { "type": ["null", "string"] }, "NREP": { "type": ["null", "string"] }, "PIS": { "type": ["null", "string"] }, "Data": { "type": ["null", "string"] }, "Hora": { "type": ["null", "string"] }, "Latitude": { "type": ["null", "string"] }, "Longitude": { "type": ["null", "string"] } } }, "metadata": [{ "metadata": { "selected": true, "table-key-properties": ["NSR", "NREP", "PIS"] }, "breadcrumb": [] }] }] }

the error message is the following:

ERROR Exception writing records
Traceback (most recent call last):
  File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 302, in write_batch
    {'version': target_table_version})
  File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/sql_base.py", line 824, in write_batch_helper
    metadata)
  File "/home/vitor/projects/target-redshift/target_redshift/redshift.py", line 84, in upsert_table_helper
    log_schema_changes=log_schema_changes)
  File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/sql_base.py", line 400, in upsert_table_helper
    self.add_table(connection, table_path, table_name, _metadata)
  File "/home/vitor/projects/target-redshift/target_redshift/redshift.py", line 107, in add_table
    json_schema.make_nullable({'type': json_schema.BOOLEAN}))
  File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 727, in add_column_mapping
    metadata = self._get_table_metadata(cur, table_name)
  File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 713, in _get_table_metadata
    comment = cur.fetchone()[0]
TypeError: 'NoneType' object is not subscriptable
CRITICAL ('Exception writing records', TypeError("'NoneType' object is not subscriptable",))
Traceback (most recent call last):
  File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 302, in write_batch
    {'version': target_table_version})
  File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/sql_base.py", line 824, in write_batch_helper
    metadata)
  File "/home/vitor/projects/target-redshift/target_redshift/redshift.py", line 84, in upsert_table_helper
    log_schema_changes=log_schema_changes)
  File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/sql_base.py", line 400, in upsert_table_helper
    self.add_table(connection, table_path, table_name, _metadata)
  File "/home/vitor/projects/target-redshift/target_redshift/redshift.py", line 107, in add_table
    json_schema.make_nullable({'type': json_schema.BOOLEAN}))
  File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 727, in add_column_mapping
    metadata = self._get_table_metadata(cur, table_name)
  File "/home/vitor/projects/target-redshift/venv/lib/python3.6/site-packages/singer_target_postgres-0.2.4-py3.6.egg/target_postgres/postgres.py", line 713, in _get_table_metadata
    comment = cur.fetchone()[0]
TypeError: 'NoneType' object is not subscriptable

Thank you in advance!

vitoravancini commented 4 years ago

just now realized that this file is from target-postgres, not target-redshift.

AlexanderMann commented 4 years ago

@vitoravancini yup, that's expected. They are very related, built on the same codebase.

Is this a long lived table? ie, has it been syncing data for some time and this just now popped up? What kind of replication are you using? We haven't been seeing any issues with this code on our end, but just to be sure we could easily rerun the tests herein.

jo-carney commented 2 years ago

@vitoravancini @AlexanderMann Were you able to resolve this issue? My team is currently dealing with a similar problem