If portal-api cannot connect to its Postgres database within 10 seconds (the predefined Postgres connection timeout), the portal-api process crashes and has to be restarted. With `LOG_LEVEL=debug` enabled, the stack traces can look like this (here, a connection error while checking the webhook queue during a "cron" job):
```
error: [+3743ms] portal-api:webhooks *** COULD NOT GET WEBHOOKS
error: [+ 0ms] portal-api:dao:pg:webhooks ERROR dispatching webhook events
error: [+ 0ms] portal-api:dao:pg:webhooks {}
Error: timeout exceeded when trying to connect
    at Timeout.setTimeout [as _onTimeout] (/usr/src/app/node_modules/pg-pool/index.js:165:27)
    at ontimeout (timers.js:424:11)
    at tryOnTimeout (timers.js:288:5)
    at listOnTimeout (timers.js:251:5)
    at Timer.processTimers (timers.js:211:10)
/usr/src/app/node_modules/async/dist/async.js:966
if (fn === null) throw new Error("Callback was already called.");
^
Error: Callback was already called.
    at /usr/src/app/node_modules/async/dist/async.js:966:32
    at /usr/src/app/node_modules/async/dist/async.js:3885:13
    at poolOrClient.query (/usr/src/app/dao/postgres/pg-utils.js:343:24)
    at Object.connect [as callback] (/usr/src/app/node_modules/pg-pool/index.js:253:16)
    at /usr/src/app/node_modules/pg-pool/index.js:170:18
    at client.connect (/usr/src/app/node_modules/pg-pool/index.js:219:9)
    at Connection.con.once (/usr/src/app/node_modules/pg/lib/client.js:182:11)
    at Object.onceWrapper (events.js:273:13)
    at Connection.emit (events.js:182:13)
    at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:76:10)
```
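Judging from the trace, the process is not terminated by the timeout itself but by the second error: `async` throws `Callback was already called.` because the callback passed in from `dao/postgres/pg-utils.js` is invoked a second time. The second trace starts in `client.connect` inside `pg-pool`, which suggests that the slow connection eventually completed after the timeout had already been reported. A minimal sketch of guarding a callback against such a repeated invocation follows; `onceOnly` and `flakyConnect` are hypothetical names that only imitate this behavior, and the `(err, client, release)` argument shape follows the callback of pg's `pool.connect`.

```javascript
'use strict';

// Hypothetical helper (not part of portal-api, pg, or async): wraps a
// Node-style callback so that only its first invocation is forwarded.
// Any later invocation goes to `onRepeat` instead of the original callback.
function onceOnly(callback, onRepeat = () => {}) {
  let called = false;
  return function guarded(...args) {
    if (called) {
      onRepeat(...args);
      return;
    }
    called = true;
    callback(...args);
  };
}

// Fake connect function that mimics the trace: it first reports a timeout,
// then, once the slow connection completes, reports success on the same callback.
function flakyConnect(cb) {
  setTimeout(() => cb(new Error('timeout exceeded when trying to connect')), 10);
  setTimeout(() => cb(null, { id: 'late-client' }, () => {}), 20);
}

flakyConnect(onceOnly(
  (err) => console.log('first result:', err ? err.message : 'connected'),
  (err, client, release) => {
    // Return a late client instead of leaking it (or crashing).
    if (client && typeof release === 'function') release();
  }
));
```

Where exactly such a guard would sit in portal-api is an assumption; it would have to wrap the callback between `pg-pool` and the `async` step in `pg-utils.js`.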
A retry mechanism should be implemented for the case where the connection cannot be established in time. So far the problem has been seen with Azure's Postgres as a service, but almost never with other kinds of Postgres deployments (such as Postgres running as a pod). Nonetheless, it must be addressed.
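A minimal sketch of such a retry mechanism with exponential backoff, assuming the connect step can be wrapped as a Node-style function `connect(callback)`. The name `connectWithRetry` and its options (`maxAttempts`, `baseDelayMs`, `maxDelayMs`) are hypothetical, not existing portal-api settings, and the default values are arbitrary.

```javascript
'use strict';

// Hypothetical retry wrapper for a Node-style connect function.
// portal-api's actual DAO code (dao/postgres/pg-utils.js) may look different.
function connectWithRetry(connect, options, callback) {
  const { maxAttempts = 5, baseDelayMs = 500, maxDelayMs = 10000 } = options || {};
  let attempt = 0;

  function tryOnce() {
    attempt += 1;
    let settled = false; // ignore a repeated callback from the same attempt
    connect((err, ...results) => {
      if (settled) return;
      settled = true;
      if (!err) return callback(null, ...results);
      if (attempt >= maxAttempts) return callback(err);
      // Exponential backoff: base, 2*base, 4*base, ... capped at maxDelayMs.
      const delay = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
      console.warn(`connect attempt ${attempt} failed (${err.message}); retrying in ${delay} ms`);
      setTimeout(tryOnce, delay);
    });
  }

  tryOnce();
}
```

Bounding the number of attempts keeps a cron run from retrying forever; once the attempts are used up, the last error is passed on as before. A client that arrives late for an attempt that was already given up on is simply dropped in this sketch, so a real implementation would also need to release it back to the pool.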
Observed with 1.0.0.beta3.