akvo / akvo-lumen

Make sense of your data
https://akvo.org/akvo-lumen
GNU Affero General Public License v3.0
63 stars 18 forks source link

Postgres runs out connections #3167

Open MichaelAkvo opened 1 year ago

MichaelAkvo commented 1 year ago

Context

When opening https://ci-fiji.akvolumen.org/ after logging in, the call to /api/env was failing with a 500 error leaving the frontend blank.

Problem or idea

stack

11:14:12.660 [XNIO-1 task-14] ERROR akvo.lumen.auth.api-authorization - #error {
 :cause FATAL: remaining connection slots are reserved for non-replication superuser connections
 :via
 [{:type com.zaxxer.hikari.pool.HikariPool$PoolInitializationException
   :message Failed to initialize pool: FATAL: remaining connection slots are reserved for non-replication superuser connections
   :at [com.zaxxer.hikari.pool.HikariPool throwPoolInitializationException HikariPool.java 576]}
  {:type org.postgresql.util.PSQLException
   :message FATAL: remaining connection slots are reserved for non-replication superuser connections
   :at [org.postgresql.core.v3.QueryExecutorImpl receiveErrorResponse QueryExecutorImpl.java 2440]}]
 :trace
 [[org.postgresql.core.v3.QueryExecutorImpl receiveErrorResponse QueryExecutorImpl.java 2440]
  [org.postgresql.core.v3.QueryExecutorImpl readStartupMessages QueryExecutorImpl.java 2559]
  [org.postgresql.core.v3.QueryExecutorImpl <init> QueryExecutorImpl.java 133]
  [org.postgresql.core.v3.ConnectionFactoryImpl openConnectionImpl ConnectionFactoryImpl.java 250]
  [org.postgresql.core.ConnectionFactory openConnection ConnectionFactory.java 49]
  [org.postgresql.jdbc.PgConnection <init> PgConnection.java 195]
  [org.postgresql.Driver makeConnection Driver.java 454]
  [org.postgresql.Driver connect Driver.java 256]
  [com.zaxxer.hikari.util.DriverDataSource getConnection DriverDataSource.java 119]
  [com.zaxxer.hikari.pool.PoolBase newConnection PoolBase.java 369]
  [com.zaxxer.hikari.pool.PoolBase newPoolEntry PoolBase.java 198]
  [com.zaxxer.hikari.pool.HikariPool createPoolEntry HikariPool.java 467]
  [com.zaxxer.hikari.pool.HikariPool checkFailFast HikariPool.java 541]
  [com.zaxxer.hikari.pool.HikariPool <init> HikariPool.java 115]
  [com.zaxxer.hikari.HikariDataSource <init> HikariDataSource.java 81]
  [akvo.lumen.component.tenant_manager$pool invokeStatic tenant_manager.clj 61]
  [akvo.lumen.component.tenant_manager$pool invoke tenant_manager.clj 49]
  [akvo.lumen.component.tenant_manager$load_tenant$fn__19185 invoke tenant_manager.clj 77]
  [clojure.lang.Delay deref Delay.java 42]
  [clojure.core$deref invokeStatic core.clj 2320]
  [clojure.core$deref invoke core.clj 2306]
  [akvo.lumen.component.tenant_manager.TenantManager connection tenant_manager.clj 89]
  [akvo.lumen.endpoint.env$handler$fn__27115 invoke env.clj 25]
  [akvo.lumen.component.tenant_manager$wrap_label_tenant$fn__19176 invoke tenant_manager.clj 42]
  [iapetos.collector.ring$run_instrumented invokeStatic ring.clj 127]
  [iapetos.collector.ring$run_instrumented invoke ring.clj 123]
  [iapetos.collector.ring$wrap_instrumentation$fn__5520 invoke ring.clj 163]
  [iapetos.collector.ring$wrap_metrics_expose$fn__5529 invoke ring.clj 184]
  [akvo.lumen.lib.auth$wrap_auth_datasets$fn__26715$fn__26717 invoke auth.clj 193]
  [akvo.lumen.auth.api_authorization$api_authorization invokeStatic api_authorization.clj 30]
  [akvo.lumen.auth.api_authorization$api_authorization invoke api_authorization.clj 19]
  [akvo.lumen.auth$authorize invokeStatic auth.clj 30]
  [akvo.lumen.auth$authorize invoke auth.clj 28]
  [akvo.lumen.util$as_middleware$fn__11350$fn__11351 invoke util.clj 99]
  [akvo.lumen.auth.jwt_authentication$jwt_authentication invokeStatic jwt_authentication.clj 22]
  [akvo.lumen.auth.jwt_authentication$jwt_authentication invoke jwt_authentication.clj 15]
  [akvo.lumen.util$as_middleware$fn__11350$fn__11351 invoke util.clj 99]
  [akvo.lumen.component.handler$fn__17214$fn__17216$fn__17217 invoke handler.clj 114]
  [akvo.lumen.component.handler$fn__17232$fn__17233$fn__17234 invoke handler.clj 125]
  [ring.middleware.json$wrap_json_response$fn__16849 invoke json.clj 87]
  [ring.middleware.json$wrap_json_body$fn__16824 invoke json.clj 46]
  [ring.middleware.keyword_params$wrap_keyword_params$fn__16351 invoke keyword_params.clj 53]
  [ring.middleware.params$wrap_params$fn__16639 invoke params.clj 67]
  [ring.middleware.absolute_redirects$wrap_absolute_redirects$fn__16754 invoke absolute_redirects.clj 47]
  [ring.middleware.content_type$wrap_content_type$fn__16718 invoke content_type.clj 34]
  [ring.middleware.default_charset$wrap_default_charset$fn__16734 invoke default_charset.clj 31]
  [ring.middleware.not_modified$wrap_not_modified$fn__16707 invoke not_modified.clj 61]
  [raven_clj.ring$wrap_sentry$fn__4618 invoke ring.clj 22]
  [akvo.lumen.component.handler$fn__17192$fn__17194$fn__17195 invoke handler.clj 98]
  [akvo.lumen.component.tenant_manager$wrap_label_tenant$fn__19176 invoke tenant_manager.clj 42]
  [reitit.ring$ring_handler$fn__15884 invoke ring.cljc 286]
  [clojure.lang.AFn applyToHelper AFn.java 154]
  [clojure.lang.AFn applyTo AFn.java 144]
  [clojure.lang.AFunction$1 doInvoke AFunction.java 31]
  [clojure.lang.RestFn invoke RestFn.java 408]
  [immutant.web.internal.undertow$create_http_handler$reify__19026 handleRequest undertow.clj 239]
  [org.projectodd.wunderboss.web.undertow.async.websocket.UndertowWebsocket$2 handleRequest UndertowWebsocket.java 109]
  [io.undertow.server.session.SessionAttachmentHandler handleRequest SessionAttachmentHandler.java 68]
  [io.undertow.server.Connectors executeRootHandler Connectors.java 211]
  [io.undertow.server.HttpServerExchange$1 run HttpServerExchange.java 809]
  [java.util.concurrent.ThreadPoolExecutor runWorker nil -1]
  [java.util.concurrent.ThreadPoolExecutor$Worker run nil -1]
  [java.lang.Thread run nil -1]]}
11:14:12.662 [XNIO-1 task-14] ERROR akvo.lumen.component.handler - 500 App Error

current open connections

postgres=> select max_conn,used,res_for_super,max_conn-used-res_for_super
res_for_normal
from
  (select count(*) used from pg_stat_activity) t1,
  (select setting::int res_for_super from pg_settings where
name='superuser_reserved_connections') t2,
  (select setting::int max_conn from pg_settings where name='max_connections') t3
;
 max_conn | used | res_for_super | res_for_normal
----------+------+---------------+----------------
      100 |   42 |             3 |             55
(1 row)

Solution or next step

Probably reduce the idle time for connections in a pool.

Given the number of available connections (55 as seen above), it shouldn't have failed, but

remaining connection slots are reserved for non-replication superuser connections

does indicate something else. Not sure how to ensure this doesn't happen again