broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
997 stars 361 forks source link

Cromwell logs DB exceptions and then stays idle #4360

Open Horneth opened 6 years ago

Horneth commented 6 years ago

Logs captured from alpha environment:

November 2nd 2018, 20:16:15.000 | 2018-11-03 00:16:15 [cromwell-system-akka.actor.default-dispatcher-57321] ERROR c.e.w.w.WorkflowStoreSubmitActor - Workflow com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction submit failed.
-- | --

  | November 2nd 2018, 20:16:15.000 | 2018-11-03 00:16:15 [cromwell-system-akka.actor.default-dispatcher-57321] ERROR c.e.w.w.WorkflowStoreSubmitActor - Workflow com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction submit failed.

  | November 2nd 2018, 10:16:21.000 | 2018-11-02 14:16:21 [cromwell-system-akka.actor.default-dispatcher-42970] ERROR c.e.w.w.WorkflowStoreEngineActor - Error trying to fetch new workflows
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 1 milliseconds ago.  The last packet sent successfully to the server was 1 milliseconds ago.
    at sun.reflect.GeneratedConstructorAccessor75.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:990)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3562)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3462)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3905)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2491)
    at com.mysql.jdbc.ConnectionImpl.setAutoCommit(ConnectionImpl.java:4807)
    at com.zaxxer.hikari.pool.ProxyConnection.setAutoCommit(ProxyConnection.java:379)
    at com.zaxxer.hikari.pool.HikariProxyConnection.setAutoCommit(HikariProxyConnection.java)
    at slick.jdbc.JdbcBackend$BaseSession.startInTransaction(JdbcBackend.scala:470)
    at slick.jdbc.JdbcActionComponent$StartTransaction$.run(JdbcActionComponent.scala:39)
    at slick.jdbc.JdbcActionComponent$StartTransaction$.run(JdbcActionComponent.scala:36)
    at slick.basic.BasicBackend$DatabaseDef$$anon$2.liftedTree1$1(BasicBackend.scala:275)
    at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:101)
    at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:144)
    at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:174)
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3011)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3472)
    ... 16 common frames omitted

  | November 2nd 2018, 10:16:21.000 | 2018-11-02 14:16:21 [cromwell-system-akka.actor.default-dispatcher-42973] ERROR c.s.m.i.MetadataSummaryRefreshActor - Failed to summarize metadata
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 0 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.
    at sun.reflect.GeneratedConstructorAccessor75.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:990)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3562)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3462)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3905)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2491)
    at com.mysql.jdbc.ConnectionImpl.setAutoCommit(ConnectionImpl.java:4807)
    at com.zaxxer.hikari.pool.ProxyConnection.setAutoCommit(ProxyConnection.java:379)
    at com.zaxxer.hikari.pool.HikariProxyConnection.setAutoCommit(HikariProxyConnection.java)
    at slick.jdbc.JdbcBackend$BaseSession.startInTransaction(JdbcBackend.scala:470)
    at slick.jdbc.JdbcActionComponent$StartTransaction$.run(JdbcActionComponent.scala:39)
    at slick.jdbc.JdbcActionComponent$StartTransaction$.run(JdbcActionComponent.scala:36)
    at slick.basic.BasicBackend$DatabaseDef$$anon$2.liftedTree1$1(BasicBackend.scala:275)
    at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3472)
    ... 16 common frames omitted

  | November 2nd 2018, 10:16:21.000 | 2018-11-02 14:16:21 [cromwell-system-akka.actor.default-dispatcher-42973] ERROR c.s.m.impl.MetadataServiceActor - Error summarizing metadata
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 0 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.
    at sun.reflect.GeneratedConstructorAccessor75.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:990)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3562)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3462)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3905)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2530)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2683)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2491)
    at com.mysql.jdbc.ConnectionImpl.setAutoCommit(ConnectionImpl.java:4807)
    at com.zaxxer.hikari.pool.ProxyConnection.setAutoCommit(ProxyConnection.java:379)
    at com.zaxxer.hikari.pool.HikariProxyConnection.setAutoCommit(HikariProxyConnection.java)
    at slick.jdbc.JdbcBackend$BaseSession.startInTransaction(JdbcBackend.scala:470)
    at slick.jdbc.JdbcActionComponent$StartTransaction$.run(JdbcActionComponent.scala:39)
    at slick.jdbc.JdbcActionComponent$StartTransaction$.run(JdbcActionComponent.scala:36)
    at slick.basic.BasicBackend$DatabaseDef$$anon$2.liftedTree1$1(BasicBackend.scala:275)
    at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3472)
    ... 16 common frames omitted

The Lock wait timeout exceeded; try restarting transaction submit failed. one keeps happening over and over. Workflows that have been submitted do not start (the only log entry is the "workflow submitted" one). The workflow store entry table is empty, which makes sense in terms of nothing happening.

kshakir commented 6 years ago

I don't think the errors above correlate with DB backup times, when the database would be unavailable, but here are the times. (At least I think I have the correct terraformed sfmt6… db)

terraform-sfmt6-backups