MeltanoLabs / tap-postgres

Singer Tap for PostgreSQL
https://hub.meltano.com/extractors/tap-postgres--meltanolabs/

bug: `jsonschema.exceptions.ValidationError` when loading `information_schema` #437

Status: Open (opened by ReubenFrankel 3 months ago)

ReubenFrankel commented 3 months ago

Overview

While trying to debug an issue raised on Slack, I'm getting a `jsonschema.exceptions.ValidationError` when loading `information_schema` views with `target-jsonl`:

2024-06-06 23:40:34,704 | INFO     | tap-postgres.information_schema-columns | Beginning full_table sync of 'information_schema-columns'...
2024-06-06 23:40:34,704 | INFO     | tap-postgres.information_schema-columns | Tap has custom mapper. Using 1 provided map(s).
Traceback (most recent call last):
  File "/home/reuben/Documents/taps/tap-postgres/.meltano/loaders/target-jsonl/venv/bin/target-jsonl", line 8, in <module>
    sys.exit(main())
  File "/home/reuben/Documents/taps/tap-postgres/.meltano/loaders/target-jsonl/venv/lib/python3.8/site-packages/target_jsonl.py", line 92, in main
    state = persist_messages(
  File "/home/reuben/Documents/taps/tap-postgres/.meltano/loaders/target-jsonl/venv/lib/python3.8/site-packages/target_jsonl.py", line 54, in persist_messages
    validators[o['stream']].validate((o['record']))
  File "/home/reuben/Documents/taps/tap-postgres/.meltano/loaders/target-jsonl/venv/lib/python3.8/site-packages/jsonschema/validators.py", line 130, in validate
    raise error
jsonschema.exceptions.ValidationError: 1 is not of type 'string', 'null'

Failed validating 'type' in schema['properties']['ordinal_position']:
    {'type': ['string', 'null']}

On instance['ordinal_position']:
    1
2024-06-06 23:40:35,979 | INFO     | singer_sdk.metrics   | METRIC: {"type": "timer", "metric": "sync_duration", "value": 1.2741827964782715, "tags": {"stream": "information_schema-columns", "context": {}, "status": "failed"}}
2024-06-06 23:40:35,979 | INFO     | singer_sdk.metrics   | METRIC: {"type": "counter", "metric": "record_count", "value": 49, "tags": {"stream": "information_schema-columns", "context": {}}}
2024-06-06 23:40:35,979 | ERROR    | tap-postgres.information_schema-columns | An unhandled error occurred while syncing 'information_schema-columns'
Traceback (most recent call last):
  File "/home/reuben/Documents/taps/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/streams/core.py", line 1190, in sync
    for _ in self._sync_records(context=context):
  File "/home/reuben/Documents/taps/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/streams/core.py", line 1113, in _sync_records
    self._write_record_message(record)
  File "/home/reuben/Documents/taps/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/streams/core.py", line 856, in _write_record_message
    self._tap.write_message(record_message)
  File "/home/reuben/Documents/taps/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/io_base.py", line 164, in write_message
    singer_write_message(message)
  File "/home/reuben/Documents/taps/tap-postgres/.meltano/extractors/tap-postgres/venv/lib/python3.8/site-packages/singer_sdk/_singerlib/messages.py", line 244, in write_message
    sys.stdout.flush()
BrokenPipeError: [Errno 32] Broken pipe
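The target rejects the record because the tap's SCHEMA message declares `ordinal_position` as `['string', 'null']` while the RECORD carries the integer `1`; the tap's `BrokenPipeError` is a downstream effect of the target exiting. The stdlib-only sketch below mimics only the JSON Schema `type` keyword check for this case. It is not the `jsonschema` library, and `type_errors` is a hypothetical helper that ignores every other keyword. For context, PostgreSQL declares `information_schema.columns.ordinal_position` with the integer domain `information_schema.cardinal_number`; one plausible but unconfirmed cause is that the tap maps that domain to a string type while the driver still returns a Python `int`.

```python
from typing import Any, Dict, List, Union

# Simplified JSON Schema type-name checks (hypothetical, not the jsonschema library).
# bool is excluded from integer/number because JSON Schema treats booleans separately.
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def type_errors(schema: Dict[str, Any], record: Dict[str, Any]) -> List[str]:
    """Return one message per property whose value fails the schema's `type` keyword."""
    errors = []
    for name, prop in schema.get("properties", {}).items():
        if name not in record or "type" not in prop:
            continue
        allowed: Union[str, List[str]] = prop["type"]
        if isinstance(allowed, str):
            allowed = [allowed]
        value = record[name]
        # A value is valid if it matches at least one of the allowed types.
        if not any(_TYPE_CHECKS[t](value) for t in allowed):
            errors.append(f"{value!r} is not of type {', '.join(repr(t) for t in allowed)}")
    return errors


# Schema fragment and value taken from the error above.
schema = {"properties": {"ordinal_position": {"type": ["string", "null"]}}}
print(type_errors(schema, {"ordinal_position": 1}))
# → ["1 is not of type 'string', 'null'"]

# Declaring the column as an integer type would let the same record through.
int_schema = {"properties": {"ordinal_position": {"type": ["integer", "null"]}}}
print(type_errors(int_schema, {"ordinal_position": 1}))
# → []
```

The message format is chosen to resemble the one in the log so the two can be compared side by side.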

Reproduce

docker run --rm -d -e POSTGRES_HOST_AUTH_METHOD=trust -p 5432:5432 postgres
git clone git@github.com:MeltanoLabs/tap-postgres.git
cd tap-postgres
meltano install
meltano invoke tap-postgres | meltano invoke target-jsonl

Workaround

Deselect the information_schema streams (related to #54):

meltano select tap-postgres --exclude 'information_schema-*'
meltano select tap-postgres --all

Or just select the public schema:

meltano select tap-postgres 'public-*'
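To check which streams each workaround keeps, the sketch below applies the stream patterns to stream IDs of the form `<schema>-<table>` seen in the log. It assumes Meltano matches stream-level patterns as shell-style globs (modelled here with Python's `fnmatch.fnmatchcase`) and ignores attribute-level patterns; `selected_streams` is a hypothetical helper, and `public-users` / `public-orders` are made-up table names. The `--exclude` variant is modelled as "everything except `information_schema-*`".

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def selected_streams(
    stream_ids: Iterable[str],
    include: Iterable[str],
    exclude: Iterable[str] = (),
) -> List[str]:
    """Keep stream IDs that match any include glob and no exclude glob (hypothetical helper)."""
    include, exclude = list(include), list(exclude)
    return [
        s
        for s in stream_ids
        if any(fnmatchcase(s, p) for p in include)
        and not any(fnmatchcase(s, p) for p in exclude)
    ]


streams = [
    "information_schema-columns",
    "information_schema-tables",
    "public-users",   # hypothetical table
    "public-orders",  # hypothetical table
]

# Workaround 1: everything except the information_schema streams.
print(selected_streams(streams, include=["*"], exclude=["information_schema-*"]))
# → ['public-users', 'public-orders']

# Workaround 2: only the public schema.
print(selected_streams(streams, include=["public-*"]))
# → ['public-users', 'public-orders']
```

`fnmatchcase` is used instead of `fnmatch` so matching stays case-sensitive on every platform.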

edgarrmondragon commented 3 months ago

IMO, `information_schema` should be skipped by default during schema inspection.
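A minimal sketch of what skipping system schemas by default could look like, assuming discovery starts from a plain list of schema names. `user_schemas` and its `include_system` flag are hypothetical, not part of tap-postgres, and `sales` is a made-up schema. The filter relies on two PostgreSQL facts: `information_schema` is built in, and schema names beginning with `pg_` are reserved for system use.

```python
from typing import Iterable, List


def is_system_schema(name: str) -> bool:
    """True for PostgreSQL's built-in schemas (information_schema and reserved pg_* names)."""
    return name == "information_schema" or name.startswith("pg_")


def user_schemas(schema_names: Iterable[str], include_system: bool = False) -> List[str]:
    """Hypothetical discovery filter: drop system schemas unless explicitly requested."""
    names = list(schema_names)
    if include_system:
        return names
    return [name for name in names if not is_system_schema(name)]


print(user_schemas(["information_schema", "pg_catalog", "public", "sales"]))
# → ['public', 'sales']
```

Keeping an opt-in flag means users who really want the catalog views can still discover them, while the default avoids streams like `information_schema-columns` altogether.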