broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Cromwell Failed to summarize metadata #4403

Open chunjie-sam-liu opened 5 years ago

chunjie-sam-liu commented 5 years ago

I'm running a mutation-calling pipeline on Cromwell. The error "Failed to summarize metadata" appears for several shards in the scatter, and the subsequent processes are then aborted. How can I fix this error?

[2018-11-17 09:04:45,38] [info] BackgroundConfigAsyncJobExecutionActor [3df56d2bPreProcessingForVariantDiscovery_GATK4.MarkDuplicates:5:1]: job id: 56011
[2018-11-17 09:04:45,48] [info] BackgroundConfigAsyncJobExecutionActor [3df56d2bPreProcessingForVariantDiscovery_GATK4.MarkDuplicates:5:1]: Status change from - to WaitingForReturnCodeFile
[2018-11-17 09:37:07,47] [error] Failed to summarize metadata
java.sql.SQLTransientConnectionException: db - Connection is not available, request timed out after 3785ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83)
    at slick.jdbc.hikaricp.HikariCPJdbcDataSource.createConnection(HikariCPJdbcDataSource.scala:14)
    at slick.jdbc.JdbcBackend$BaseSession.<init>(JdbcBackend.scala:453)
    at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:46)
    at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:37)
    at slick.basic.BasicBackend$DatabaseDef.acquireSession(BasicBackend.scala:249)
    at slick.basic.BasicBackend$DatabaseDef.acquireSession$(BasicBackend.scala:248)
    at slick.jdbc.JdbcBackend$DatabaseDef.acquireSession(JdbcBackend.scala:37)
    at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2018-11-17 09:37:14,33] [error] Error summarizing metadata
java.sql.SQLTransientConnectionException: db - Connection is not available, request timed out after 3785ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83)
    at slick.jdbc.hikaricp.HikariCPJdbcDataSource.createConnection(HikariCPJdbcDataSource.scala:14)
    at slick.jdbc.JdbcBackend$BaseSession.<init>(JdbcBackend.scala:453)
    at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:46)
    at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:37)
    at slick.basic.BasicBackend$DatabaseDef.acquireSession(BasicBackend.scala:249)
    at slick.basic.BasicBackend$DatabaseDef.acquireSession$(BasicBackend.scala:248)
    at slick.jdbc.JdbcBackend$DatabaseDef.acquireSession(JdbcBackend.scala:37)
    at slick.basic.BasicBackend$DatabaseDef$$anon$2.run(BasicBackend.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[2018-11-17 09:37:53,75] [warn] [0 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. GET /token Empty -> 200 OK Chunked
[2018-11-17 10:11:19,05] [warn] [0 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 second. Make sure to read the response entity body or call discardBytes() on it. GET /token Empty -> 200 OK Chunked
[Guo-1|12:27:21]

hmkim commented 5 years ago

I had the same issue when running Cromwell in Local mode.

I ran it twice.

The second time, the job worked fine. My guess is that the idle time was very long during the first run.

chunjie-sam-liu commented 5 years ago

@hmkim I resumed the run from where it stopped, and it works now. Which part of the process had the long idle time in your case, and what caused it? The pipeline always consists of multiple processes and runs on hundreds of samples. What should I configure to avoid this error in the first place, rather than running it again?

hmkim commented 5 years ago

The server carries a heavy workload from non-Cromwell jobs, so I think defining an idle time limit in Cromwell would solve this issue.

chunjie-sam-liu commented 5 years ago

The error occurred again. I read this thread and configured a local MySQL database instead of the in-memory database.
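
For readers who want to try the same change, here is a minimal sketch of what a MySQL-backed `database` stanza in a Cromwell configuration file can look like. The host, database name, user, password, and the timeout value are placeholders rather than values from this thread. The driver class name depends on the Cromwell and MySQL connector versions in use (older releases used `com.mysql.jdbc.Driver`). Raising `connectionTimeout` may give the connection pool more headroom on a busy server, but that is an assumption, not something confirmed here.

```hocon
# Sketch only: replace the placeholder host, database, user, and password.
database {
  profile = "slick.jdbc.MySQLProfile$"
  db {
    # Driver class for recent MySQL connectors; older setups may need com.mysql.jdbc.Driver
    driver = "com.mysql.cj.jdbc.Driver"
    url = "jdbc:mysql://localhost/cromwell?rewriteBatchedStatements=true"
    user = "cromwell_user"
    password = "change_me"
    # Milliseconds to wait for a pooled connection before timing out (illustrative value)
    connectionTimeout = 5000
  }
}
```

A file like this is typically passed to Cromwell at startup with `-Dconfig.file=/path/to/your.conf`.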

hmkim commented 5 years ago

@chunjie-sam-liu Thanks for linking the thread.

geoffjentry commented 5 years ago

@chunjie-sam-liu Did you wind up being able to resolve this?

chunjie-sam-liu commented 5 years ago

@geoffjentry Not really. The pipeline can still be terminated by the same error; I just extract the samples that were not processed and run them again. It works better with a local MySQL database.

kevin-furant commented 2 years ago

Metadata summary failed for me too! Have you solved the problem yet?