elixir-ecto / db_connection

Database connection behaviour

http://hexdocs.pm/db_connection/DBConnection.html

Connection issues after migrating to db_connection 2.0 #177

Closed cjbell closed 5 years ago

cjbell commented 5 years ago

👋 Hi all, I wanted to report that since upgrading to db_connection 2.0.2 we're seeing a large number of connection issues (~500 in < 24 hours). The error we're seeing in our logs is:

(DBConnection.ConnectionError) connection not available and request was dropped from queue after 127ms

The previous poolboy-based queue used to give us a few connection drop issues, but they were few and far between. I know a lot of this might be on our side and configuration-related, but I wanted to flag it in case there's anything obvious here you can think of.

Mostly these issues come from our deferred queue consumers which are implemented as separate consumer supervisor processes, but that might be a red herring because honestly this is just the bulk of our db workload.

wojtekmach commented 5 years ago

Hey @cjbell, this seems related to another recent issue, check out https://github.com/elixir-ecto/ecto/issues/2833#issuecomment-440400022.

josevalim commented 5 years ago

Did you configure the pool_timeout before? If so, you may need to adapt the configs slightly as linked by Wojtek. :)
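For anyone following along, here is a minimal sketch of what the adapted configuration might look like. It assumes an application named `:my_app` with a repo module `MyApp.Repo` (both hypothetical), and the values are illustrative rather than recommended. In db_connection 2.0 the `pool_timeout` option is replaced by `queue_target` and `queue_interval`, both in milliseconds: the pool aims to hand out a connection within `queue_target`; if every checkout during a `queue_interval` window takes longer than that, the target is doubled, and requests that still wait longer than the new target are dropped with the error quoted above.

```elixir
# config/config.exs
# Sketch only: :my_app and MyApp.Repo are hypothetical names, and the
# numbers below are illustrative assumptions, not tuned values.
import Config  # on Elixir versions before 1.9, use `use Mix.Config` instead

config :my_app, MyApp.Repo,
  # Number of connections kept in the pool
  pool_size: 10,
  # Target wait for a connection checkout, in milliseconds (default: 50)
  queue_target: 500,
  # Window over which checkout waits are measured, in milliseconds (default: 1000)
  queue_interval: 1000
```

Raising `queue_target` makes the pool more tolerant of short bursts, but it will not help if connections are held for a long time or the pool is simply too small for the workload.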

josevalim commented 5 years ago

I have improved the error message to point people to DBConnection.start_link/2 as well.

josevalim commented 5 years ago

Ping @cjbell :)

billgloff commented 5 years ago

I just upgraded to ecto 3.0.3 (db_connection 2.0.2) and now I'm receiving these same errors from Sentry every time quantum executes a job which polls and updates our Postgres db (which is still working btw!):

I tried setting the following but it was no help so far:

...
pool_size: 3,
# in milliseconds, defaults to 50
queue_target: 200,
# in milliseconds, defaults to 1000
queue_interval: 2000

These tables are in a staging env so they're small (< 4k records)

Any suggestions on what I can do here? Thanks in advance!

josevalim commented 5 years ago

Which version did you upgrade from? Also, why is the pool_size so low? If you are holding a connection for a long time for any reason, then changing the queue config won’t help much.

billgloff commented 5 years ago

Was running ecto 2.2 before.

pool_size is low because I just have these 2 pollers that run every 60 seconds. I can try making it higher for testing purposes if you think that would help?

This is a staging environment and 99.999% of the time it's just querying and not updating anything so I would be surprised if it was holding the connection for a long time.

josevalim commented 5 years ago

@billgloff let's try with 4 pollers and then 8 to see if it changes things but if you only have two processes doing queries ever, then a pool of 2 should have been enough indeed.

billgloff commented 5 years ago

@josevalim I changed the pool_size first to 4 and then 8 and unfortunately I'm still seeing the same errors.

josevalim commented 5 years ago

That’s very weird. Do you think you can isolate it or reproduce it in any way so we can give it a try? Thanks.

billgloff commented 5 years ago

@josevalim Seems like the issue was on my side: I had a rogue process running that was throwing these errors to Sentry, which made it confusing when I was debugging the issue on my staging server. Anyways, everything's working now as it should, and I thank you for helping me!

josevalim commented 5 years ago

Closing this for now. Please let us know if you have more info to reproduce it, thanks!