Open tatianguiqu opened 3 years ago
Our configuration follows this repository. Now, most of the time it's working well, but this error will occur in a few hours or in a day.
@tatianguiqu : it is trying to establish connection with postgres but can not establish one.
@ravishetye When I wanted to observe the log and database when bugs happened, it didn't make any mistakes for more than four days. I'm still waiting, but there was always an exception in one day before. It's confusing.
The prod 'verta-backend-0' kept running for 17 days without becoming failed. It is so confused. But there were 500 server errors in deploying the service today. After I restarted the prod, the problem was solved. The error log of prod 'verta-backend-0' is following:
{"thread":"pool-4-thread-1","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0,"localizedMessage":"FATAL: sorry, too many clients already","message":"FATAL: sorry, too many clients already","name":"org.postgresql.util.PSQLException","extendedStackTrace":"org.postgresql.util.PSQLException: FATAL: sorry, too many clients already\n\tat org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:520) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:141) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.Driver.makeConnection(Driver.java:458) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.Driver.connect(Driver.java:260) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat java.sql.DriverManager.getConnection(DriverManager.java:677) ~[java.sql:?]\n\tat java.sql.DriverManager.getConnection(DriverManager.java:228) ~[java.sql:?]\n\tat ai.verta.modeldb.utils.ModelDBHibernateUtil.checkDBConnection(ModelDBHibernateUtil.java:483) [classes!/:1.0-SNAPSHOT]\n\tat ai.verta.modeldb.utils.ModelDBHibernateUtil.checkDBConnectionInLoop(ModelDBHibernateUtil.java:325) [classes!/:1.0-SNAPSHOT]\n\tat ai.verta.modeldb.utils.ModelDBHibernateUtil.loopBack(ModelDBHibernateUtil.java:294) [classes!/:1.0-SNAPSHOT]\n\tat ai.verta.modeldb.utils.ModelDBHibernateUtil.createOrGetSessionFactory(ModelDBHibernateUtil.java:271) [classes!/:1.0-SNAPSHOT]\n\tat ai.verta.modeldb.utils.ModelDBHibernateUtil.getSessionFactory(ModelDBHibernateUtil.java:277) [classes!/:1.0-SNAPSHOT]\n\tat ai.verta.modeldb.cron_jobs.ParentTimestampUpdateCron.run(ParentTimestampUpdateCron.java:25) [classes!/:1.0-SNAPSHOT]\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]\n\tat java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]\n\tat java.lang.Thread.run(Thread.java:832) [?:?]\n"},"endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1602301066,"nanoOfSecond":947000000},"threadId":40,"threadPriority":5,"hostName":"verta-backend-0","kubernetes.podIP":""} {"thread":"pool-4-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 40 millisecond ","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1602301066,"nanoOfSecond":987000000},"threadId":40,"threadPriority":5,"hostName":"verta-backend-0","kubernetes.podIP":""}
@ravishetye
The log says 'sorry, too many clients already'. Is it caused by too many open hibernate sessions?
I face the same case where my deploy version is 1.0.8.1 in my case, the backend restarts in days.
The prod works well at first, but it always breaks down after a few hours. Then it t starts printing logs indefinitely:
{"thread":"pool-3-thread-1","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0 │ {"thread":"pool-4-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 milli │ {"thread":"pool-4-thread-1","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0 │ {"thread":"pool-3-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 milli │ {"thread":"pool-4-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 milli │ {"thread":"pool-3-thread-1","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0 │ {"thread":"pool-4-thread-1","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0 │ {"thread":"pool-4-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 milli │ {"thread":"pool-3-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 milli
This error seems to be caused by this call path: ai.verta.modeldb.utils.ModelDBHibernateUtil.checkDBConnectionInLoop()-> checkDBConnection()-> getDBConnection()->Class.forName(rDBDriver) Because enableTrace is not enabled, RdbDriver is configured as "org.postgresql.Driver".
It's confusing to make mistakes there after a few hours.