Closed (jesseahouser closed this issue 2 years ago)
2.2.2 contains only a commit that makes sure all connections are pinged within an idle_interval. You can see the initial report here: https://github.com/elixir-ecto/db_connection/pull/216
Are you sure that you need a pool of 40 connections? If they are not being used, it means they will indeed be pinged every second. You can consider either increasing the idle interval or decreasing the pool size. We can also add an idle threshold configuration if you really believe the pool size is justified.
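For reference, a minimal sketch of those two knobs in an Ecto repo configuration (the app name, repo module, and values are illustrative assumptions, not taken from this issue):

```elixir
# config/config.exs -- illustrative values, not from this issue
config :my_app, MyApp.Repo,
  # Fewer connections in the pool means fewer idle connections to ping.
  pool_size: 10,
  # How often idle connections are pinged, in milliseconds (default: 1000).
  idle_interval: 10_000
```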
Also: awesome job on isolating the issue and great report!
@josevalim Thank you for your reply and suggestions. We have an update to report.
The following currently-available options were considered:

- pool_size: Decreasing the pool size is not an attractive option for our use case, as those resources are projected to be needed during peak time periods.
- idle_interval: Increasing the idle interval is a lower-risk option for our use case. db_connection <= 2.2.1 (pinging a single idle connection per idle_interval) was working well; the app's connections to its Postgres databases were not negatively impacted in the way that #216 described. We hypothesized that increasing idle_interval by an order of magnitude or two would not negatively impact performance and might achieve the intended effect of eliminating the dyno load spikes we observed with db_connection >= 2.2.2.

We changed idle_interval from the default (1000 ms) to 100000 ms using an environment variable, similar to the way we specify pool_size (a configuration sketch follows below). This had no discernible adverse effect. We then ran with idle_interval at 100000 ms and monitored dyno load. Over the past three days, we have observed no dyno load spikes or negative performance impact; if any spikes are still happening, they are short-lived enough to be harmless.

We'll continue to monitor dyno load and performance, but thus far we have reasonable confidence that this solution (increasing idle_interval) meets our current needs. If an idle_limit option is implemented in a future release per c1791c7, it would offer an additional level of control over pinging idle connections. Given our experience, we see this as a benefit.
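A minimal sketch of what that environment-variable-driven configuration might look like (the variable names, app name, and repo module are assumptions; only the 1000 ms default, the 100000 ms value, and the pool size of 40 come from this thread):

```elixir
# config/runtime.exs -- sketch only; names are assumptions
config :my_app, MyApp.Repo,
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "40"),
  # 100000 ms, up from the 1000 ms default, per the change described above
  idle_interval: String.to_integer(System.get_env("IDLE_INTERVAL") || "100000")
```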
Thank you again for your communication and contributions!
Issue Description
Following some dependency package upgrades, our team observed dyno load spikes a few times per day. Prior to these upgrades, we had experienced years of consistent, normal dyno load levels.
Context and Details
This Elixir app is hosted in a Heroku Private Space, and dyno load is their measure of CPU load (https://devcenter.heroku.com/articles/metrics#dyno-load). Dyno load maxes out for approximately 15-30 minutes with each spike, then returns to normal load on its own. We are running two web servers, and the spikes affect both, are unsynchronized, and are not dependent on traffic. This Elixir app maintains connections to three different Postgres databases with a pool size of 40 each.
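To make the scale concrete, the setup amounts to something like the sketch below: three repos with 40 connections each, i.e. up to 120 idle connections being pinged every idle_interval under db_connection >= 2.2.2. Module and app names are illustrative assumptions; only the pool sizes are from the issue.

```elixir
# Illustrative only -- three Postgres databases, each with a pool of 40 connections.
config :my_app, MyApp.RepoOne, pool_size: 40
config :my_app, MyApp.RepoTwo, pool_size: 40
config :my_app, MyApp.RepoThree, pool_size: 40
```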
To Reproduce
We began an investigation to narrow down which package(s) might be related to the issue. This pointed to the db_connection upgrade:
We can reproduce this behavior by changing only the db_connection version to >= 2.2.2 in mix.lock.

Screenshot
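For anyone reproducing this, a hedged sketch of one way to force the dependency version; the issue itself only edits mix.lock, and the surrounding dependency versions shown here are assumptions:

```elixir
# mix.exs -- sketch only; the report changes mix.lock directly instead
defp deps do
  [
    {:ecto_sql, "~> 3.4"},
    {:postgrex, ">= 0.0.0"},
    # Force the db_connection release that pings all idle connections.
    {:db_connection, "2.2.2"}
  ]
end
```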