wubzz opened this issue 6 years ago
What sort of numbers count as "lots of traffic"? And what version of Node.js are you using?
I'm running Node 8.9.2. My app hit the issue over a weekend, when traffic is slightly lower. I would still estimate a minimum of 50-150 requests/min, with each request running anywhere between 0-10 queries. It adds up to a lot of queries. After that weekend I fixed it by rolling back the knex/node-pool versions, so it has not happened since.
In addition to this, each tenant has its own database. That in turn means multiple instances of node-pool, since there are multiple knex clients.
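For context, a minimal sketch of that setup, one knex client (and therefore one pool) per tenant database; `tenantConfigs` and the connection details are hypothetical:

```js
// One knex client, and therefore one generic-pool instance, per tenant.
const knex = require('knex');

const tenantConfigs = {
  tenantA: { host: 'db1.internal', user: 'app', database: 'tenant_a' },
  tenantB: { host: 'db2.internal', user: 'app', database: 'tenant_b' },
};

const clients = {};
for (const [tenant, connection] of Object.entries(tenantConfigs)) {
  clients[tenant] = knex({
    client: 'pg',
    connection,
    pool: { min: 0, max: 5 }, // each client carries its own pool
  });
}
// Any per-pool leak is multiplied by the number of tenants.
```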
Another user of knex reported the following:
> Ran a SELECT using knex repeatedly and monitored the heap usage after each query. Knex.js was burning anywhere from 700 kB to 2 MB of heap per query, which crashed after a few hundred queries when it hit the Node heap limit at around 1.5 GB.
So perhaps it can be reproduced by spamming queries at a much larger scale.
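Something along these lines might reproduce it; a rough sketch, assuming `db` is a knex instance and running with `node --expose-gc` so the forced GC call works and heap readings are stable:

```js
// Spam SELECTs and log memory after every 100 queries.
async function hammer(db, iterations = 1000) {
  for (let i = 0; i < iterations; i++) {
    await db.raw('SELECT 1');
    if (i % 100 === 0) {
      if (global.gc) global.gc(); // only available with --expose-gc
      const { heapUsed, rss } = process.memoryUsage();
      console.log(
        `query ${i}: heapUsed=${(heapUsed / 1048576).toFixed(1)} MB, ` +
        `rss=${(rss / 1048576).toFixed(1)} MB`
      );
    }
  }
}
```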
I realize you're not getting a whole lot of information, and I apologize for that.
This might be a "fun" one to debug and probably all hinges on what's happening at runtime :-p A couple more questions that have occurred to me:
- What's the pool config you're using?
- What version of generic-pool were you using before you upgraded to 3.1.7?
- Do you have any metrics from the pool, such as `pool.size`, `pool.available`, `pool.borrowed`, `pool.pending`? (See the sketch after this list.)
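If it helps, a sketch of periodic metric logging. It assumes direct access to the generic-pool v3 instance; how you reach it from knex varies by version, so `pool` here is a placeholder:

```js
// Log pool metrics every few seconds.
setInterval(() => {
  console.log(
    `size=${pool.size} available=${pool.available} ` +
    `borrowed=${pool.borrowed} pending=${pool.pending}`
  );
}, 5000).unref(); // don't keep the process alive just for logging
```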
Config:
```js
connectionOptions.pool = {
  max: dbCfg.poolMax, // usually between 3-5 depending on cluster mode
  min: 0,
  idleTimeoutMillis: 5000,
  evictionRunIntervalMillis: 1000,
  Promise,
};
```
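For reference, this is roughly what those options mean when handed to generic-pool v3 directly. The factory below is a stand-in (`connectToPostgres` is hypothetical), not knex's actual implementation:

```js
const genericPool = require('generic-pool');

const pool = genericPool.createPool(
  {
    create: () => connectToPostgres(), // must return a Promise of a resource
    destroy: (conn) => conn.end(),     // hypothetical close call
  },
  {
    max: 5,                          // dbCfg.poolMax
    min: 0,
    idleTimeoutMillis: 5000,         // idle resources become evictable after 5s
    evictionRunIntervalMillis: 1000, // the evictor scans once per second
    Promise,                         // Promise implementation to use
  }
);
```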
Prior to 3.1.7 we were using version 2.4.2. Unfortunately no metrics are available; nothing was really being logged by default in production.
Ah I see, yes, there is a pretty huge jump between those versions and just about everything changed.
FWIW the latest release (3.4.0) fixed some internal bugs which may or may not have any bearing on your problem. I don't know for sure, but it's certainly possible one of those bugs could have been responsible for the huge RSS consumption by way of not releasing objects from a queue/list somewhere.
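Purely as an illustration of that class of bug (this is not generic-pool's actual code), a queue that keeps references to settled requests would leak exactly this way:

```js
// Illustration only: requests are enqueued but never removed on settlement.
class LeakyQueue {
  constructor() {
    this._items = [];
  }
  enqueue(request) {
    this._items.push(request); // reference retained forever...
    return request.promise;
  }
  settle(request, resource) {
    request.resolve(resource);
    // BUG: `request` is never removed from this._items, so every acquire
    // leaves an object behind and RSS grows without ever being released.
  }
}
```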
Your transaction rate alone shouldn't be enough to cause any problems; I know it's definitely being used for workloads in the region of 1K+ in-flight requests and hundreds of txn/s. I suspect the cause of this lies in the specific traffic flow and things like the number of in-flight/waiting resource requests.
I'll try to find some time in the next day or two to see if I can easily reproduce this with some synthetic data.
> `max: dbCfg.poolMax, // usually between 3-5 depending on cluster mode`
What does cluster mode mean here?
The cluster part is simply a dynamic limit on how many connections the pool is allowed to create, depending on whether the app is running in cluster mode or not.
Without cluster: max 10. With cluster: `Math.ceil(10 / numberOfForks)`.
This ensures that when scaling the app I keep the connection count under the Postgres default threshold of 100 connections. It's a WIP solution.. :P
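A sketch of that WIP calculation; how `numForks` is obtained is an assumption, it would come from however the app forks its workers:

```js
const cluster = require('cluster');
const os = require('os');

const BASE_MAX = 10;               // cap when running a single process
const numForks = os.cpus().length; // hypothetical fork count

const poolMax = cluster.isWorker
  ? Math.ceil(BASE_MAX / numForks) // e.g. 4 forks -> ceil(10 / 4) = 3 each
  : BASE_MAX;
// Total connections stay in the same ballpark regardless of fork count,
// well under Postgres's default max_connections of 100.
```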
In 3.1.7 the lib seems to build up a lot of memory usage over time, eventually reaching enough RSS (which is never released) to crash an app. I originally reported this in https://github.com/tgriesser/knex/issues/2383 since I was not 100% sure whether the issue was in generic-pool or not, but now I am sure. Maybe related to #197 as well? Even though I have no test app to reproduce the issue, I thought I should at least report the problem.
Edit: In my case it required a lot of traffic/SQL statements to reproduce in production, and I'm not comfortable with further testing in production.