@BenderV How much data are you streaming in (records and maybe bytes) and how long does it run before it drops the connection?
@BenderV this is a pretty common problem. There are a lot of settings which come up between any shell command and a remote psql connection.
To @awm33's questions above, data which you can include which will greatly help us here:

- version of `target-postgres` which you are running
- some stats around the size of the stream in records and bytes (estimates are fine)
- timing between you starting to run things and this error occurring (maybe more logs?)
- something detailing the schema
- is there an ssh tunnel between the process running `target-postgres` and your `postgres` instance?
- have you modified the default `statement_timeout` in your conf file?

@BenderV 🏓 if you haven't seen this already. I think @awm33 and I have some ideas around how to make things better for folks as far as connections etc. go. If we don't hear back from you by the end of the week, I think we'll detail the improvements we think make the best sense broadly, then try and get those scheduled for a fix. 👍 is a vote of confidence!
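For reference, a minimal sketch of how one could check the server-side settings asked about above (`statement_timeout` and the PostgreSQL server version) using psycopg2, the driver `target-postgres` connects with; the connection parameters below are placeholders:

```python
import psycopg2

# Placeholder credentials -- substitute your own postgres_* config values.
conn = psycopg2.connect(host="localhost", port=5432, dbname="mydb",
                        user="me", password="secret")
with conn.cursor() as cur:
    cur.execute("SHOW statement_timeout;")      # '0' means no timeout
    print("statement_timeout:", cur.fetchone()[0])
    cur.execute("SELECT version();")            # server version string
    print("server version:", cur.fetchone()[0])
conn.close()
```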
I'll let you do your magic ;) I don't know how that will help you, but here is my config. If you need me to test a new version, I'm happy to help.
```json
{
  "postgres_host": "xxx",
  "postgres_port": 5432,
  "postgres_database": "xxx",
  "postgres_username": "xxx",
  "postgres_password": "xxx",
  "postgres_schema": "data",
  "disable_collection": true
}
```
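As an aside, these keys map more or less directly onto an ordinary libpq connection. A rough, hypothetical sketch of the equivalent psycopg2 call (the actual target may assemble its connection differently):

```python
import json
import psycopg2

# Load the config shown above (the file name here is an assumption).
with open("config.json") as f:
    config = json.load(f)

conn = psycopg2.connect(
    host=config["postgres_host"],
    port=config["postgres_port"],
    dbname=config["postgres_database"],
    user=config["postgres_username"],
    password=config["postgres_password"],
)
```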
> version of `target-postgres` which you are running

`singer-target-postgres==0.1.1`
Python 3.6.5
I don't use psql ??

> some stats around the size of the stream in records and bytes (estimates are fine)

~100 to 1,000 records (1 to 10 Mbits). I've tried to split the input file into multiple chunks (10); it worked, but still failed many times (see the splitting sketch below).

> timing between you starting to run things and this error occurring (maybe more logs?)

~2-5 min

> something detailing the schema

The source is pipedrive: https://github.com/singer-io/tap-pipedrive/

> is there an ssh tunnel between the process running `target-postgres` and your `postgres` instance?

No

> have you modified the default `statement_timeout` in your conf file?

No
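A note on the chunking workaround mentioned above: a Singer stream is newline-delimited JSON in which RECORD messages rely on an earlier SCHEMA message, so a naive split can leave later chunks without a schema. A hedged sketch of a splitter that repeats the SCHEMA lines at the top of every chunk (file names and chunk size are arbitrary):

```python
import json

CHUNK_RECORDS = 100   # arbitrary chunk size
schemas = []          # SCHEMA messages seen so far, repeated in every chunk

def flush(lines, chunk_no):
    # Write one chunk, prefixed with every SCHEMA message seen so far.
    with open(f"chunk_{chunk_no}.jsonl", "w") as out:
        out.writelines(schemas + lines)

chunk, chunk_no = [], 0
with open("stream.jsonl") as f:
    for line in f:
        if json.loads(line).get("type") == "SCHEMA":
            schemas.append(line)
            continue
        chunk.append(line)
        if len(chunk) >= CHUNK_RECORDS:
            flush(chunk, chunk_no)
            chunk, chunk_no = [], chunk_no + 1
if chunk:
    flush(chunk, chunk_no)
```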
Interesting. That's super helpful! Thanks @BenderV.
> I don't use psql ??

Sorry, I meant PostgreSQL, i.e., what version is the server for you?
It's not necessary for us to move forward here, but if you get the chance and can provide some of the nested data as the JSON stream (cleansed, of course), I can make a test to improve this functionality specifically.
It sounds to me like we're creating the connection, then doing a bunch of denesting which takes a long, long time, and by the time we get back to our cursor, the connection has timed out and been dropped by the server.
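If that hypothesis is right, one generic mitigation (a sketch only; not a flag `target-postgres` is documented to expose here) is to open the connection with libpq TCP keepalives and no server-side statement timeout:

```python
import psycopg2

# The keepalive options below are standard libpq connection parameters;
# the specific values are illustrative, not recommendations.
conn = psycopg2.connect(
    host="localhost", port=5432, dbname="mydb",   # placeholders
    user="me", password="secret",
    keepalives=1,             # enable TCP keepalives
    keepalives_idle=30,       # seconds idle before the first probe
    keepalives_interval=10,   # seconds between probes
    keepalives_count=5,       # probes before the connection is declared dead
    options="-c statement_timeout=0",  # 0 disables the per-statement timeout
)
```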
@BenderV we're up to version 0.1.4 now. In 0.1.3 (I think?) we introduced a `logging_level` flag which makes `target-postgres` wayyyyyyy chattier and helps with (unsurprisingly) debugging. If you get a chance, can you bump to that version and rerun to see if the logs give any clearer indication as to the behaviour etc.? https://github.com/datamill-co/target-postgres/pull/92
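For anyone following along, enabling that flag is just an extra key in the config shown earlier; assuming it accepts standard Python logging level names (check the README and the PR above for the exact accepted values), something like:

```json
{
  "postgres_host": "xxx",
  "postgres_port": 5432,
  "postgres_database": "xxx",
  "postgres_username": "xxx",
  "postgres_password": "xxx",
  "postgres_schema": "data",
  "disable_collection": true,
  "logging_level": "DEBUG"
}
```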
Lastly, if you want to share logs and data more securely and have us do the cleansing etc., we can potentially arrange something to help you out
@AlexanderMann PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
I'll get back with some logs of 0.1.4!
@AlexanderMann Well, I have good news, and bad news.
I upgraded to 0.1.4.
The good news is that I don't have the bug.
The bad news is that I can't reproduce it, ahah ¯\_(ツ)_/¯
I will notify you if I have it again. Thanks!
@BenderV woot! Glad to hear you're unblocked...though...I obviously wish it were not quantum in nature...
I'm going to close this for the time being, and if we get similar issues/you run into this again, hopefully we'll be in a better place to snag what's going on!
Hi, I'm trying to use target-postgres, but I'm having an issue with it. It seems that for large amounts of data, the query is too long and the connection is dropped.
The only way I've found to fix it manually is to split the input file so that the connections are shorter. Is there another way around it?