marcelolebre closed this issue 5 years ago.
My guess would be that one of the sides is not able to open that many connections that fast, or something is blocking such a high number of connections in a short amount of time. We would need a way to reproduce this locally to be sure.
Also, keep in mind that having that many connections usually ends up hurting performance, because it means less caching and resource sharing while using more CPU and memory. Unless you are using a machine with a couple dozen cores, that number of connections seems excessive.
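If it helps to confirm where the pressure is, a quick generic check (nothing specific to this setup) is to look at pg_stat_activity while the task runs:

```sql
-- Count open backends per state while the stress test is running.
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;
```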
3.6 GHz Intel Core i9, 8 cores, 32 GB RAM, macOS
That's why I increased shared_buffers, though I didn't change the kernel's maximum shared memory segment size.
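For what it's worth, on macOS the SysV shared memory limits can be inspected with sysctl; this is just a read-only check, assuming stock kernel settings:

```sh
# Inspect the current SysV shared memory limits on macOS (read-only).
sysctl kern.sysv.shmmax kern.sysv.shmall
```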
I agree, it does feel more like a resource-level issue. CPU and memory load look OK during the test; no starvation there.
Also agree on the performance deterioration part. Although it's a diminishing-returns kind of scenario, it will show up either as resource starvation or as the sheer concurrency overhead of establishing the initial connections; I'd bet that last part is the real issue here.
@marcelolebre, I will go ahead and close this issue, as I don't believe there is anything we can act on here: the DB is the one closing connections in this case. If you know of a reason why it is actually us forcing the DB to close them, please let us know!
I'm currently running some stress tests for a production migration and have been tinkering with DB configs for throughput optimisation.
This is the result of a mix task in a Phoenix app, roughly of the shape sketched below.
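For context, a minimal sketch of what such a task can look like; the module name, Repo, workload, and concurrency below are hypothetical stand-ins, not the actual task:

```elixir
defmodule Mix.Tasks.StressMigrate do
  @moduledoc "Hypothetical sketch of a stress-test mix task; not the actual code."
  use Mix.Task

  @shortdoc "Runs concurrent queries against the repo"
  def run(_args) do
    # Boot the app so the Repo (and its connection pool) is started.
    Mix.Task.run("app.start")

    1..10_000
    |> Task.async_stream(
      fn i ->
        # Placeholder write; the real task performs the migration work.
        MyApp.Repo.query!("SELECT $1::int", [i])
      end,
      max_concurrency: 500,
      timeout: :infinity
    )
    |> Stream.run()
  end
end
```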
With the current configs, several errors are displayed:
A few seconds later, other connections start to pick up and work resumes as expected, but for some reason several dozen of these error messages show up when the task is started.
The test runs for half an hour and these errors only show up at the beginning; no errors appear in flight, and the test completes successfully: the migration ends as expected.
Here are my current configs:
elixir -v
Dependencies
Phoenix configs (I purposefully increased queue times):
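(The original config block isn't reproduced here. As a stand-in, a sketch of the relevant Ecto options with purely illustrative values:)

```elixir
# config/prod.exs (illustrative values only, not the originals)
config :my_app, MyApp.Repo,
  pool_size: 500,          # the actual value was somewhere above 100
  queue_target: 5_000,     # ms a checkout may wait before the queue is "slow"
  queue_interval: 10_000   # ms window over which queue_target is evaluated
```

queue_target and queue_interval are the DBConnection queue settings; raising them lets checkouts wait longer before the pool starts erroring.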
I noticed that if I reduce the pool_size to 100, no errors are displayed.
Postgres configs
version: 11.3
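(Again illustrative only; the settings most relevant to this report are along these lines:)

```
# postgresql.conf (illustrative values, not the originals)
max_connections = 600    # must be at least the app's total pool_size
shared_buffers = 8GB     # raised, per the comment above; the default is 128MB
```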
Latest log entry in Postgres (no relevant insights found):