datamill-co / target-redshift

A Singer.io Target for Redshift
MIT License
23 stars 17 forks source link

connection closing before write #33

Closed rthirumulu closed 4 years ago

rthirumulu commented 5 years ago

Currently using tap-facebook from singer.io to read from facebook marketing api and writing to Redshift using the target-redshift package. The connection establishes with Redshift and reads the data, but before i get to the write step I'm getting the error:

Traceback (most recent call last):
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 129, in write_batch
    cur.execute('BEGIN;')
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 53, in execute
    return super(_MillisLoggingCursor, self).execute(query, vars)
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/psycopg2/extras.py", line 440, in execute
    return super(LoggingCursor, self).execute(query, vars)
psycopg2.OperationalError: SSL SYSCALL error: EOF detected

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 48, in main
    target_tools.main(redshift_target)
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 25, in main
    stream_to_target(input_stream, target, config=config)
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 69, in stream_to_target
    raise e
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 63, in stream_to_target
    _flush_streams(streams, target, force=True)
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 88, in _flush_streams
    _flush_stream(target, stream_buffer)
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/target_tools.py", line 81, in _flush_stream
    target.write_batch(stream_buffer)
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_redshift/redshift.py", line 58, in write_batch
    return PostgresTarget.write_batch(self, nullable_stream_buffer)
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 192, in write_batch
    cur.execute('ROLLBACK;')
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 53, in execute
    return super(_MillisLoggingCursor, self).execute(query, vars)
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/psycopg2/extras.py", line 440, in execute
    return super(LoggingCursor, self).execute(query, vars)
psycopg2.InterfaceError: cursor already closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "venv--target-redshift/bin/target-redshift", line 10, in <module>
    sys.exit(cli())
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 54, in cli
    main(args.config)
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_redshift/__init__.py", line 48, in main
    target_tools.main(redshift_target)
psycopg2.InterfaceError: connection already closed
AlexanderMann commented 5 years ago

@rthirumulu can you do a few things for us:

I ask for this because it looks to me like you're running into timeouts with your connection specifically:

psycopg2.OperationalError: SSL SYSCALL error: EOF detected

Looks related to: https://github.com/psycopg/psycopg2/issues/533

The issue there appeared to be a large SQL statement causing issues. We have long query logging in our DEBUG logs, along with have timings so we can see whether this is happening quickly or whether we see trying to issue a COMMIT results in this error popping up after a long time, etc. etc.

rthirumulu commented 5 years ago

@AlexanderMann:

Output:

DEBUG MillisLoggingConnection: 0 millis spent executing: b'BEGIN;'
DEBUG MillisLoggingConnection: 0 millis spent executing: b'BEGIN;'
CRITICAL cursor already closed
Traceback (most recent call last):
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 234, in write_batch
    cur.execute('BEGIN;')
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/target_postgres/postgres.py", line 67, in execute
    return super(_MillisLoggingCursor, self).execute(query, vars)
  File "/singer.io/venv--target-redshift/lib/python3.7/site-packages/psycopg2/extras.py", line 448, in execute
    return super(LoggingCursor, self).execute(query, vars)
psycopg2.OperationalError: SSL SYSCALL error: EOF detected

Here's the structure of my config file. The documentation didn't specify anything related to SSL indicators, am I missing something here?

{
  "redshift_host": "<>",
  "redshift_port": <>,
  "redshift_database": "<>",
  "redshift_username": "<>",
  "redshift_password": "<>",
  "redshift_schema": "<>",
  "logging_level": "DEBUG",
  "default_column_length": 1000,
  "target_s3": {
    "aws_access_key_id": "<>",
    "aws_secret_access_key": "<>",
    "bucket": "<>",
    "key_prefix": "<>"
  }
}

I agree that it seems to be a timeout connection. Current process:

  1. Establish connection to Redshift
  2. Read data from Facebook API
  3. Attempt to write data

Is there a workaround to establish the connection after reading the data? Or at least re-establish the connection again once it's been disconnected before the write step?

AlexanderMann commented 5 years ago

@rthirumulu looks like we don't have the ability to pass along SSL information currently. target-postgres has the ability to pass along SSL information to allow psycopg2 to deal with certs and whatnot.

I think we've avoided doing this for target-redshift because we assumed most folks would be using an ssh bastion to connect to their Redshift instance.

If you can provide guidance on what values are needed to get PsycoPG2 to connect to Redshift with SSL support (if that is indeed the issue you're facing) we'd be happy to get that baked into target-redshift.

Do you have any other projects connecting to this Redshift instance via python?

AlexanderMann commented 5 years ago

@rthirumulu we're looking to wrap up features/whatnot for a release this week. Any updates on the above?

awm33 commented 4 years ago

We never quite identified this as a reproducible issue with the target, so I'm closing this out.