@BenderV How much data are you streaming in (records and maybe bytes) and how long does it run before it drops the connection?
@BenderV this is a pretty common problem. There are a lot of settings which come up between any shell command and a remote psql connection.
To @awm33's questions above, data which you can include which will greatly help us here:

- version of `target-postgres` which you are running
- some stats around the size of the stream in records and bytes (estimates are fine)
- timing between you starting to run things and this error occurring (maybe more logs?)
- something detailing the schema
- is there an ssh tunnel between the process running `target-postgres` and your `postgres` instance?
- have you modified the default `statement_timeout` in your conf file?

@BenderV 🏓 if you haven't seen this already. I think @awm33 and I have some ideas around how to make things better for folks as far as connections etc. go. If we don't hear back from you by the end of the week, I think we'll detail the improvements we think make the best sense broadly, then try and get those scheduled for a fix. 👍 is a vote of confidence!
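For reference, a minimal sketch of how one could check the server-side settings asked about above (`statement_timeout` and the PostgreSQL server version) using psycopg2, the driver `target-postgres` connects with; the connection parameters below are placeholders:

```python
import psycopg2

# Placeholder credentials -- substitute your own postgres_* config values.
conn = psycopg2.connect(host="localhost", port=5432, dbname="mydb",
                        user="me", password="secret")
with conn.cursor() as cur:
    cur.execute("SHOW statement_timeout;")      # '0' means no timeout
    print("statement_timeout:", cur.fetchone()[0])
    cur.execute("SELECT version();")            # server version string
    print("server version:", cur.fetchone()[0])
conn.close()
```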
I'll let you do your magic ;) I don't know how that will help you, but here is my config. If you need me to test a new version, I'm happy to help.
```json
{
  "postgres_host": "xxx",
  "postgres_port": 5432,
  "postgres_database": "xxx",
  "postgres_username": "xxx",
  "postgres_password": "xxx",
  "postgres_schema": "data",
  "disable_collection": true
}
```
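As an aside, these keys map more or less directly onto an ordinary libpq connection. A rough, hypothetical sketch of the equivalent psycopg2 call (the actual target may assemble its connection differently):

```python
import json
import psycopg2

# Load the config shown above (the file name here is an assumption).
with open("config.json") as f:
    config = json.load(f)

conn = psycopg2.connect(
    host=config["postgres_host"],
    port=config["postgres_port"],
    dbname=config["postgres_database"],
    user=config["postgres_username"],
    password=config["postgres_password"],
)
```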
> version of `target-postgres` which you are running

`singer-target-postgres==0.1.1`
Python 3.6.5
I don't use psql ??

> some stats around the size of the stream in records and bytes (estimates are fine)

~100 to 1,000 records (1 to 10 Mbits). I've tried to split the input file into multiple chunks (10); it worked, but still failed many times (see the splitting sketch below).

> timing between you starting to run things and this error occurring (maybe more logs?)

~2-5 min

> something detailing the schema

The source is pipedrive: https://github.com/singer-io/tap-pipedrive/

> is there an ssh tunnel between the process running `target-postgres` and your `postgres` instance?

No

> have you modified the default `statement_timeout` in your conf file?

No
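A note on the chunking workaround mentioned above: a Singer stream is newline-delimited JSON in which RECORD messages rely on an earlier SCHEMA message, so a naive split can leave later chunks without a schema. A hedged sketch of a splitter that repeats the SCHEMA lines at the top of every chunk (file names and chunk size are arbitrary):

```python
import json

CHUNK_RECORDS = 100   # arbitrary chunk size
schemas = []          # SCHEMA messages seen so far, repeated in every chunk

def flush(lines, chunk_no):
    # Write one chunk, prefixed with every SCHEMA message seen so far.
    with open(f"chunk_{chunk_no}.jsonl", "w") as out:
        out.writelines(schemas + lines)

chunk, chunk_no = [], 0
with open("stream.jsonl") as f:
    for line in f:
        if json.loads(line).get("type") == "SCHEMA":
            schemas.append(line)
            continue
        chunk.append(line)
        if len(chunk) >= CHUNK_RECORDS:
            flush(chunk, chunk_no)
            chunk, chunk_no = [], chunk_no + 1
if chunk:
    flush(chunk, chunk_no)
```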
Interesting. That's super helpful! Thanks @BenderV.
> I don't use psql ??

Sorry, I meant PostgreSQL, i.e., what version is the server for you?
It's not necessary for us to move forward here, but if you get the chance and can provide some of the nested data as the JSON stream (cleansed, of course), I can make a test to improve this functionality specifically.
It sounds to me like we're creating the connection, then doing a bunch of denesting which takes a long, long time, and by the time we get back to our cursor, the connection has timed out and been dropped by the server.
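If that hypothesis is right, one generic mitigation (a sketch only; not a flag `target-postgres` is documented to expose here) is to open the connection with libpq TCP keepalives and no server-side statement timeout:

```python
import psycopg2

# The keepalive options below are standard libpq connection parameters;
# the specific values are illustrative, not recommendations.
conn = psycopg2.connect(
    host="localhost", port=5432, dbname="mydb",   # placeholders
    user="me", password="secret",
    keepalives=1,             # enable TCP keepalives
    keepalives_idle=30,       # seconds idle before the first probe
    keepalives_interval=10,   # seconds between probes
    keepalives_count=5,       # probes before the connection is declared dead
    options="-c statement_timeout=0",  # 0 disables the per-statement timeout
)
```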
@BenderV we're up to version 0.1.4 now. In 0.1.3 (I think?) we introduced a `logging_level` flag which makes `target-postgres` wayyyyyyy chattier and helps with (unsurprisingly) debugging. If you get a chance, can you bump to that version and rerun to see if the logs give any clearer indication as to the behaviour etc.? https://github.com/datamill-co/target-postgres/pull/92
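For anyone following along, enabling that flag is just an extra key in the config shown earlier; assuming it accepts standard Python logging level names (check the README and the PR above for the exact accepted values), something like:

```json
{
  "postgres_host": "xxx",
  "postgres_port": 5432,
  "postgres_database": "xxx",
  "postgres_username": "xxx",
  "postgres_password": "xxx",
  "postgres_schema": "data",
  "disable_collection": true,
  "logging_level": "DEBUG"
}
```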
Lastly, if you want to share logs and data more securely and have us do the cleansing etc., we can potentially arrange something to help you out
@AlexanderMann PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
I'll get back with some logs of 0.1.4!
@AlexanderMann Well, I have good news, and bad news.
I upgraded to 0.1.4.
The good news is that I don't have the bug.
The bad news is that I can't reproduce it, ahah ¯\_(ツ)_/¯
I will notify you if I have it again. Thanks!
@BenderV woot! Glad to hear you're unblocked...though...I obviously wish it were not quantum in nature...
I'm going to close this for the time being, and if we get similar issues/you run into this again, hopefully we'll be in a better place to snag what's going on!
Hi, I'm trying to use target-postgres, but I'm having an issue with it. It seems that for large amounts of data, the query is too long and the connection is dropped.
The only way I've found to fix it manually is to split the input file so that the connections are shorter. Is there another way around it?