filipedeschamps opened this issue 2 years ago
It's the lambda environment freeze / thaw causing this issue. You can't use a long lived connection pool with a lambda as there is no guarantee that the lambda will continue running to clear the expired connections or keep the underlying TCP socket alive.
See this thread and the nested linked issue for more details: https://github.com/brianc/node-postgres/issues/2243
Hi @sehrope thank you! I read the #2243 issue before creating mine, and my app is not suffering from "connection timeout".
But you pointed out again the freeze / thaw behavior, and I'm "stress" testing my app to keep all lambdas warm, for example:
ab -n 1000 -c 100 -l https://xxx
And at some point (near the 900th request) everything hangs, because there are no more available connections, and tryToGetNewClientFromPool() keeps retrying until the lambda times out.
But then I tried to create the Pool with an idle timeout of 1 ms:
idleTimeoutMillis: 1,
It was way slower, but in the end all 1000 requests were served and no connections were left hanging (tbh 5 of them were, I don't know why).
So now I'm wondering: if the freeze condition is the problem, how fast does it happen to a lambda?
Additional note:
idleTimeoutMillis: 100,
Consumes all connections again, without recovering or closing them any more.
Maybe some lambdas aren't being frozen because they became idle, but because they were forcibly frozen to make room for other new lambdas in a high load scenario?
[edit] my urge to use Pools is because the difference between creating a new connection every time and reusing one from the Pool is 80ms versus 1.5ms.
I think that's possible as the docs say it could be frozen or bounced at any time:
https://docs.aws.amazon.com/lambda/latest/dg/runtimes-context.html
The last line also says not to assume you have connections open across requests:
When you write your function code, do not assume that Lambda automatically reuses the execution environment for subsequent function invocations. Other factors may dictate a need for Lambda to create a new execution environment, which can lead to unexpected results, such as database connection failures.
If there's multiple copies of the lambda running or if AWS spins up new ones, it's possible that the server's max connection limit is reached and a newly created lambda cannot open any new connections until the old ones on other lambdas are closed. And they might not actually be gracefully closed if the lambda is frozen so it'd hang until the server finally drops them due to TCP errors.
...cannot open any new connections until the old ones on other lambdas are closed. And they might not actually be gracefully closed if the lambda is frozen so it'd hang until the server finally drops them due to TCP errors.
Yes. What really bugs me is that, even using idleTimeoutMillis: 10, they aren't gracefully closed. Those lambdas weren't in an idle state; they were being shut down. If only I could see this information somewhere...
So @sehrope just to map out all scenarios and double check if I have the right models in my mind:
1 request from User to App, in which it makes 3 trips (queries) from App to Database (AWS RDS tiniest tier with 82 max connections), with the App hosted by Vercel (AWS Lambdas under the hood).
Scenario 1: for every single query, create a new client and open a new connection, run the query, and right afterward destroy the client and close the connection.
"latency": {
"first_query": 60.39779100008309,
"second_query": 62.02611799980514,
"third_query": 87.15444500022568
},
✅ No connection left hanging under low load. ✅ No connection left hanging under high load:
$ ab -n 1000 -c 100 -l https://xxx
# > Time taken for tests: 16.601 seconds
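The Scenario 1 pattern can be sketched like this (a minimal illustration, not the author's actual code; ClientCtor stands in for pg.Client and is injected only so the shape is clear without a live database):

```javascript
// Scenario 1 sketch: one short-lived connection per query. ClientCtor is
// assumed to behave like pg.Client (connect/query/end).
async function queryWithFreshConnection(ClientCtor, config, text, params) {
  const client = new ClientCtor(config);
  await client.connect(); // full TCP/TLS + auth handshake on every query
  try {
    return await client.query(text, params);
  } finally {
    // Close unconditionally so no connection is ever left hanging,
    // even when the query throws.
    await client.end();
  }
}
```

This is what produces the roughly 60 ms first_query latency above: every query pays the full connection handshake.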
Scenario 2: for 1 request, open just one connection, reuse it for the subsequent queries in the request, and close it at the end. It's safe, but less safe than Scenario 1 because you need to close the Database connections yourself, and in all code branches. But it pays off, I guess:
"latency": {
"first_query": 58.97044100007042,
"second_query": 2.284815000137314,
"third_query": 0.9723079998511821
},
✅ No connection left hanging under low load. ✅ No connection left hanging under high load. ✅ 2.5x faster than Scenario 1:
$ ab -n 1000 -c 100 -l https://xxx
# > Time taken for tests: 6.699 seconds
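Scenario 2 can be sketched similarly (again an illustration with an injected pg.Client-like constructor; queryTexts is a placeholder for the request's queries):

```javascript
// Scenario 2 sketch: one connection per request, reused for every query in
// the request, and closed in a finally so all code branches release it.
async function handleRequest(ClientCtor, config, queryTexts) {
  const client = new ClientCtor(config);
  await client.connect(); // only the first query pays the handshake cost
  try {
    const results = [];
    for (const text of queryTexts) {
      results.push(await client.query(text)); // reuses the open socket
    }
    return results;
  } finally {
    await client.end(); // runs on success and on error alike
  }
}
```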
Scenario 3: for 1 request, use a Pool declared in the global scope to manage all connected Clients (create and reuse). You don't need to close Clients yourself; the Pool does this once idleTimeoutMillis is reached. BUT, in a Lambda environment, you can't guarantee the code will keep running for the Pool to be able to run its internal timers and gracefully close the connections. This problem is not apparent under low load, but under high load it is.
First request:
"latency": {
"first_query": 61.41757299995515,
"second_query": 3.38494899997022,
"third_query": 1.2718850000528619
},
Second request made before reaching idleTimeoutMillis (notice the first query):
"latency": {
"first_query": 1.6486390000209212,
"second_query": 1.6974199999822304,
"third_query": 1.25238800002262
},
✅ No connection left hanging under low load. ❌ All connections left hanging under high load, test could not be completed:
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 988 requests completed
Maybe you could kind of sync idleTimeoutMillis with the new idle_session_timeout available in Postgres 14, but it's still risky.
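The Scenario 3 setup boils down to a lazily created, module-scoped pool, which can be reduced to a memoizing getter (a sketch; createPool stands in for () => new pg.Pool(config)):

```javascript
// Scenario 3 sketch: a Pool created lazily in module scope and shared by all
// requests that land on the same warm lambda instance.
function makeGetPool(createPool) {
  let pool = null;
  return function getPool() {
    if (!pool) {
      pool = createPool(); // only the first call (cold start) creates it
    }
    return pool; // later calls reuse the pool and its idle connections
  };
}
```

The catch, as described above, is that the pool's idle-timeout timers only fire while the lambda is actually running.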
Scenario 4: always keep an open connection in the global scope. It's faster because there's zero overhead (important to note that the Pool overhead is minimal, and it can manage multiple Clients), but it's one level less safe, because of all the problems from Scenario 3, plus Postgres possibly closing your connection from its side, as mentioned in issue #2243.
First request:
"latency": {
"first_query": 56.0580700000282377,
"second_query": 1.2427840000018477,
"third_query": 0.8340720001142472
},
Second request:
"latency": {
"first_query": 0.8882809999631718,
"second_query": 0.9511439999332651,
"third_query": 0.5543670000042766
},
✅ No connection left hanging under low load. ❌ All connections left hanging under high load, test could not be completed:
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
apr_pollset_poll: The timeout specified has expired (70007)
Total of 984 requests completed
In a load of 5000 requests (50 concurrent), I received 22 errors with message: 'Client was closed and is not queryable'.
Anyway, differences between Scenario 2 and Scenario 4 under this load:
Scenario 2:
Concurrency Level: 50
Time taken for tests: 32.306 seconds
Complete requests: 5000
Scenario 4:
Concurrency Level: 50
Time taken for tests: 22.546 seconds
Complete requests: 5000
Alternatively you could create a new pool for every request, and clean it up every time. This makes the error handling somewhat easier.
Did you handle errors on client.connect() in scenario 2? If the connection timed out (connectionTimeoutMillis) under high load, you will get these errors when you try to use the client.
Alternatively you could create a new pool for every request, and clean it up every time. This makes the error handling somewhat easier.
True! I will combine this strategy with idle_session_timeout from Postgres 14, because if the Pool can't clean up after itself from a frozen lambda, Postgres can from its side.
Did you handle errors on client.connect() in scenario 2? If the connection timed out (connectionTimeoutMillis) under high load, you will get these errors when you try to use the client.
I didn't, and it's a great point, but are you sure a "waiting to open" connection will throw this message? I thought it was due to a request calling client.end(), flagging the client with _closing = true, and then a subsequent request finding this still-existing object and trying to use it:
What I could do is handle conditions for this.activeQuery and !this._queryable, but then I would start to mess with private properties.
In a load of 5000 requests (50 concurrent), I received 22 message: 'Client was closed and is not queryable',
That sounds like a bug in either the pool itself or your pool connection management. Regardless of load, the pool should not close connections that are checked out, and it should not hand out connections it knows it has closed.
I didn't and it's a great point, but are you sure a "waiting to open" connection will throw this message? I thought it was due to the fact a request called client.end(), flagging the client to _closing = true and then a subsequent request found this still existing object and tried to use it:
Are you calling client.end() on a pool connection? You should be calling client.release(true) (or with anything "truthy", like an Error) to inform the pool that the connection is bad and should be evicted.
Does your code use pool.query(...) or do you manually manage connections via pool.connect(...) (i.e. for transaction management)?
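The release-versus-evict distinction can be sketched as a small wrapper (illustrative, not code from the thread; pool is assumed to be pg.Pool-compatible, i.e. connect() resolves to a client with query() and release(err?)):

```javascript
// Evict broken connections instead of returning them to the pool.
async function runWithEviction(pool, text, params) {
  const client = await pool.connect();
  try {
    const result = await client.query(text, params);
    client.release();    // healthy: hand the connection back for reuse
    return result;
  } catch (err) {
    client.release(err); // truthy argument: the pool destroys this connection
    throw err;
  }
}
```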
@sehrope on Scenario 2 I wasn't using a Pool. I was manually creating one client, assigning it to the global scope, and for every new request checking if it already existed; if so, reusing it, otherwise creating a new one; and only at the end of that request closing the client and resetting its variable to null. So great chances of a race condition, I guess.
Oh yes, that'd definitely break under load. Clients have an internal query command queue, so concurrent usage of a client kind of works until it doesn't (i.e. anything that requires connection state management, like a transaction). In practice it's a terrible idea to rely on the queue behavior, and sharing clients concurrently is a big no-no.
How about having your request handler get or create a pool at the start of the request and if it's the only active request upon completion, it destroys the pool? Something like this (not tested, just a skeleton!):
let pool = null;
let poolUsageCount = 0;
// App code should use this to get the pool rather than referencing the variable directly:
export const getPool = () => {
if (!pool) {
throw new Error('Called getPool() outside of an active request');
}
return pool;
}
exports.handler = async function(event, context) {
  if (!pool) {
    pool = new pg.Pool();
  }
  // Count every in-flight request, not just the one that created the pool:
  poolUsageCount += 1;
  try {
    return await myRealHandler(event, context);
  } finally {
    poolUsageCount -= 1;
    if (poolUsageCount === 0) {
      await pool.end();
      pool = null;
    }
  }
}
Under load there would be enough concurrent active requests that the pool does not get destroyed. It'd be reused by concurrent requests and any open connections held in the pool would be reused as well.
But if the request count ever drops low enough that nothing is running, it would ensure the pool is destroyed and recreated in case the lambda is frozen / thawed before the next request.
You might have to tweak it a bit more to have the top-level request handler destroy the pool before sending back the response, to ensure it all happens before AWS thinks it's okay to scrap the lambda.
@sehrope great!! I'm avoiding having the controller manage the pool, but I'm not discarding this strategy, thank you 🤝
Meanwhile, I started my tests with idle_session_timeout and it really does what it promises: Postgres ends any idle connection automatically. So I've reimplemented Pool sharing with:
Pool:
idleTimeoutMillis: 5000
Postgres:
idle_session_timeout = 7000
While under 7000ms everything looks great at every request, after passing 7000, the next request blew up in my face:
[long dump of the serialized client and socket state, truncated; it ends with the "terminating connection due to idle-session timeout" stack shown below]
[ERROR] [1646924053876] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 400.
END RequestId: 6265b8a0-62d2-45bb-afee-0bdc371917e0
REPORT RequestId: 6265b8a0-62d2-45bb-afee-0bdc371917e0 Duration: 6.02 ms Billed Duration: 7 ms Memory Size: 1024 MB Max Memory Used: 54 MB
RequestId: 6265b8a0-62d2-45bb-afee-0bdc371917e0 Error: Runtime exited with error: exit status 129
Runtime.ExitError
The most important part:
"stack":["error: terminating connection due to idle-session timeout","
at Parser.parseErrorMessage (/var/task/node_modules/pg-protocol/dist/parser.js:287:98)","
at Parser.handlePacket (/var/task/node_modules/pg-protocol/dist/parser.js:126:29)","
It's clearer to me now that in a Lambda the Pool stops its internal work as soon as the response is returned, so it's unable to disconnect the client after 5000 ms; it then probably tries to do so right when the next request hits the Lambda, but the Postgres backend has already killed the connection at that point.
Btw, there is no concurrent client sharing in this case (and the poolUsageCount would always be 0 at the end of @sehrope's handler), because AWS Lambda will only send a single request at a time to each instance.
I didn't and it's a great point, but are you sure a "waiting to open" connection will throw this message? I thought it was due to the fact a request called client.end(), flagging the client to _closing = true and then a subsequent request found this still existing object and tried to use it:
My mistake, the connection timeout sets the client.connection._ending flag, not client._ending:
https://github.com/brianc/node-postgres/blob/21ccd4f1b6e66774bbf24aecfccdbfe7c9b49238/packages/pg/lib/client.js#L103-L107
In this case, I don't think a connection timeout would result in that exact error message. If you don't use a pool, that error should only be triggered by using a client after calling client.end(). The connection timeout might instead result in "Connection terminated unexpectedly" errors.
@boromisp Oh dang, you're right. Lambda only sends a single request at a time to an instance.
@filipedeschamps Forget my idea of keeping it open if it's concurrently being used. That would only work if the requests were actually concurrent.
I think I found an almost sweet spot. It's not the fastest strategy, because you're hit in the first query (to open the connection), but the Pool reuses the client for every subsequent query in that request and closes the client right after. So the pool is using:
idleTimeoutMillis: 1
max: 1
With idle_session_timeout = 7000 set on Postgres to kill anything that gets out of control.
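Spelled out as a node-postgres Pool config (host, user, etc. omitted), with the server-side counterpart in the comment:

```javascript
// The "almost sweet spot" settings above.
const poolConfig = {
  max: 1,               // a single client per lambda instance
  idleTimeoutMillis: 1, // close the client almost immediately after release
};
// Postgres 14+ safety net (the value is in milliseconds by default):
//   ALTER SYSTEM SET idle_session_timeout = 7000;
//   SELECT pg_reload_conf();
```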
First query result
"latency": {
"first_query": 13.636365999933332,
"second_query": 2.0373400000389665,
"third_query": 0.6963150000665337
},
Load results 1000 requests
Concurrency Level: 100
Time taken for tests: 7.362 seconds
Complete requests: 1000
Failed requests: 0
Load results 5000 requests
Concurrency Level: 100
Time taken for tests: 29.019 seconds
Complete requests: 5000
Failed requests: 0
No connections left hanging. But I'm confident it will not work for every load, because of this:
Pool:
idleTimeoutMillis: 100
Concurrency Level: 100
Time taken for tests: 51.840 seconds
Complete requests: 5000
Failed requests: 0
Non-2xx responses: 18
It's clear to me that the Pool internal timers completely freeze right after the request is returned.
Next, I'm going to try setting Postgres to idle_session_timeout = 65000, because the Lambdas' default timeout is 60000, and after that it doesn't matter what the Pool could do... connections impossible to reach will be left hanging in idle.
To my surprise, somehow even after 60 seconds, the Lambda keeps the socket connection open.
Pool:
idleTimeoutMillis: 10000
Postgres:
idle_session_timeout = 61000
First request works as expected, and 2 minutes later, sending the second request makes the app crash with "error: terminating connection due to idle-session timeout"
[edit] also crashed after 5 minutes.
For now, my final choice will be:
Pool
idleTimeoutMillis: 1
Postgres
idle_session_timeout = 0
Load of 5,000 requests (100 concurrent)
Time taken for tests: 30.627 seconds
Percentage of the requests served within a certain time (ms)
50% 540
66% 567
75% 593
80% 609
90% 681
95% 789
98% 1881
99% 2157
100% 4318 (longest request)
Connections left hanging
11
One request example
"latency": {
"first_query": 19.316103999968618,
"second_query": 2.637129999930039,
"third_query": 1.5327200000174344
},
In comparison, if I open one new connection for every query of the request:
Load of 5,000 requests (100 concurrent)
Time taken for tests: 78.632 seconds
Percentage of the requests served within a certain time (ms)
50% 1509
66% 1547
75% 1580
80% 1608
90% 1699
95% 1796
98% 2600
99% 3009
Connections left hanging
0
One request example
"latency": {
"first_query": 19.859559000004083,
"second_query": 17.381792000029236,
"third_query": 18.042578999884427
},
Final solution (already working in production) is to use the Pool as much as you can, but keep checking "opened connections" versus "available connections", and if they're too close, end() the Pool in the finally instead of just release()ing the client.
My repository is still private, but here's the full solution:
// database.js - exports a Singleton with `query()` and `getNewClient()` methods
import { Pool, Client } from 'pg';
import retry from 'async-retry';
import { ServiceError } from 'errors/index.js';
const configurations = {
user: process.env.POSTGRES_USER,
host: process.env.POSTGRES_HOST,
database: process.env.POSTGRES_DB,
password: process.env.POSTGRES_PASSWORD,
port: process.env.POSTGRES_PORT,
connectionTimeoutMillis: 5000,
idleTimeoutMillis: 30000,
max: 1,
ssl: {
rejectUnauthorized: false,
},
allowExitOnIdle: true,
};
// https://github.com/filipedeschamps/tabnews.com.br/issues/84
if (['test', 'development'].includes(process.env.NODE_ENV) || process.env.CI) {
delete configurations.ssl;
}
const cache = {
pool: null,
maxConnections: null,
reservedConnections: null,
openedConnections: null,
openedConnectionsLastUpdate: null,
};
async function query(query, params) {
let client;
try {
client = await tryToGetNewClientFromPool();
return await client.query(query, params);
} catch (error) {
const errorObject = new ServiceError({
message: error.message,
context: {
query: query.text,
},
errorUniqueCode: 'INFRA:DATABASE:QUERY',
stack: new Error().stack,
});
console.error(errorObject);
throw errorObject;
} finally {
if (client) {
const tooManyConnections = await checkForTooManyConnections(client);
if (tooManyConnections) {
client.release();
await cache.pool.end();
cache.pool = null;
} else {
client.release();
}
}
}
}
async function tryToGetNewClientFromPool() {
const clientFromPool = await retry(newClientFromPool, {
retries: 50,
minTimeout: 0,
factor: 2,
});
return clientFromPool;
async function newClientFromPool() {
if (!cache.pool) {
cache.pool = new Pool(configurations);
}
return await cache.pool.connect();
}
}
async function checkForTooManyConnections(client) {
const currentTime = new Date().getTime();
const openedConnectionsMaxAge = 10000;
const maxConnectionsTolerance = 0.9;
if (cache.maxConnections === null || cache.reservedConnections === null) {
const [maxConnections, reservedConnections] = await getConnectionLimits();
cache.maxConnections = maxConnections;
cache.reservedConnections = reservedConnections;
}
if (
cache.openedConnections === null ||
cache.openedConnectionsLastUpdate === null ||
currentTime - cache.openedConnectionsLastUpdate > openedConnectionsMaxAge
) {
const openedConnections = await getOpenedConnections();
cache.openedConnections = openedConnections;
cache.openedConnectionsLastUpdate = currentTime;
}
if (cache.openedConnections > (cache.maxConnections - cache.reservedConnections) * maxConnectionsTolerance) {
return true;
}
return false;
async function getConnectionLimits() {
const [maxConnectionsResult, reservedConnectionResult] = await client.query(
'SHOW max_connections; SHOW superuser_reserved_connections;'
);
return [
maxConnectionsResult.rows[0].max_connections,
reservedConnectionResult.rows[0].superuser_reserved_connections,
];
}
async function getOpenedConnections() {
const openConnectionsResult = await client.query(
'SELECT numbackends as opened_connections FROM pg_stat_database where datname = $1',
[process.env.POSTGRES_DB]
);
return openConnectionsResult.rows[0].opened_connections;
}
}
async function getNewClient() {
try {
const client = await tryToGetNewClient();
return client;
} catch (error) {
const errorObject = new ServiceError({
message: error.message,
errorUniqueCode: 'INFRA:DATABASE:GET_NEW_CONNECTED_CLIENT',
stack: new Error().stack,
});
console.error(errorObject);
throw errorObject;
}
}
async function tryToGetNewClient() {
const client = await retry(newClient, {
retries: 50,
minTimeout: 0,
factor: 2,
});
return client;
// You need to close the client when you are done with it
// using the client.end() method.
async function newClient() {
const client = new Client(configurations);
await client.connect();
return client;
}
}
export default Object.freeze({
query,
getNewClient,
});
Open and close connection for every single query:
Time taken for tests: 78.256 seconds
Manage Pool based on opened vs available connections:
Time taken for tests: 20.968 seconds
@filipedeschamps Thank you so much for investigating this issue. I had the same problem and am quite happy with the following solution:
const MAX_SIGNED_32_BIT_INTEGER = 2147483647;
...
{
idleTimeoutMillis: MAX_SIGNED_32_BIT_INTEGER,
reapIntervalMillis: MAX_SIGNED_32_BIT_INTEGER,
}
...
This configuration effectively deactivates the reap cycle in tarn.js, meaning that idle connections are never cleaned up, regardless of whether the lambda process is hot or cold. Given the operational characteristics of lambdas, server-side cleanup does not work reliably, so disabling it seems sensible.
Additionally, I set idle_session_timeout in PostgreSQL to 30,000 milliseconds (30 seconds) to clean up idle connections on the server side (PostgreSQL Documentation).
I also implemented graceful shutdown listeners in the lambda to ensure all remaining connections are cleaned up before the process is terminated, although this step seems optional since the server would clean these up after 30 seconds anyway. More details can be found in this Stack Overflow discussion.
A good option is also to look into RDS Proxy to achieve an external db connection pool.
Instead of deactivating the application side db connection pool completely I went with setting the value of the database idle timeout slightly higher. The full approach is described here: https://blog.stefanwaldhauser.me/posts/lambda_db_connection_leak/
In an AWS Lambda setup using application-side database connection pooling, there’s a risk of connection leaks due to frozen lambda processes not cleaning up idle database connections. To address this, configure the PostgreSQL idle_session_timeout parameter slightly higher than the application-side pool’s idleTimeoutMillis. Also, manually check if your application-side pool contains any idle connections when the lambda process is unfrozen and another execution occurs, to ensure you are not using a database connection that has been terminated by the database while the process was frozen.
to ensure you are not using a database connection that has been terminated by the database while the process was frozen
Yes! I've had a hard time with this, since pool doesn't automatically handle this case for you, unfortunately 🤝
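One way to handle that case is a retry-once wrapper around the first query after a thaw (a sketch; getPool and resetPool are assumed helpers around a module-scoped cached pool, and SQLSTATE 57P05 is the code Postgres raises for idle-session timeout):

```javascript
// Retry once on a fresh pool if the cached pool's connection was terminated
// server-side (e.g. by idle_session_timeout) while the process was frozen.
async function queryWithThawRetry(getPool, resetPool, text, params) {
  try {
    return await getPool().query(text, params);
  } catch (err) {
    if (err.code !== '57P05') throw err; // only retry the stale-pool case
    await resetPool(); // drop the pool holding dead connections
    return await getPool().query(text, params);
  }
}
```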
I was skeptical to write this issue since it might involve other things, not just pg, but I will give it a try:

So, everything works as expected on localhost: after I release a client to the Pool with client.release() and idleTimeoutMillis reaches its limit, my Pool correctly emits the remove event (signaling that it correctly destroyed the client and also the real connection to the Database). Also, I can use my localhost to connect to the same Database I'm using in Production (RDS Postgres) and it works as expected.

But on environments like Vercel (which uses AWS Lambdas under the hood), for some very curious reason, idleTimeoutMillis seems to not work as expected. It looks like after releasing a Client, it stays in the Pool as idle forever, not destroying the client/connection after reaching idleTimeoutMillis and leaving a lot of idle connections hanging on RDS.

To make things stranger, if I force a client to close using client.release(true), the remove event from the Pool is emitted and the connection is closed as expected... but forcing this on every client ruins the purpose of using a Pool in the first place.

I don't know if there's some different behavior of the event loop in this kind of environment, and then the internal pg timers don't get run or something like this.