exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0
201 stars 55 forks source link

WGS Quartet Thread starvation #319

Closed DGMichael closed 5 years ago

DGMichael commented 5 years ago

Hi All,

I'm processing a hg19 WGS quartet and keep running into a thread starvation issue. Any ideas how to work around this? I'm feeding Exomiser 45G of RAM and have lowered allele frequencies down to 0.2 max AF...still running into this issue.

-> The crash is related to memory utilization, my previous impression was that 30G was enough for a genome quartet? I can address this by throwing more memory at the job - any recommendations for the thread starvation warnings?

The program executes to completion when I run a small VCF with 20k variants.

SLOWDOWN: 2019-02-28 13:04:22.804 [main] INFO org.monarchinitiative.exomiser.core.analysis.AbstractAnalysisRunner - Checking inheritance mode compatibility with [AUTOSOMAL_DOMINANT, AUTOSOMAL_RECESSIVE, X_RECESSIVE, X_DOMINANT, MITOCHONDRIAL] for genes which passed filters 2019-02-28 13:07:44.963 [main] INFO org.monarchinitiative.exomiser.core.analysis.util.InheritanceModeAnalyser - Warning - this will take a while. Testing gene TTC34 with 1474 passed variants for inheritance mode compatibility! 2019-02-28 13:09:09.090 [exomiser-genome-hg19-1802 housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - exomiser-genome-hg19-1802 - Thread starvation or clock leap detected (housekeeper delta=51s719ms454µs228ns). 2019-02-28 13:09:09.090 [exomiser-phenotype-1802 housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - exomiser-phenotype-1802 - Thread starvation or clock leap detected (housekeeper delta=1m8s421ms191µs834ns). 2019-02-28 13:10:58.449 [exomiser-phenotype-1802 housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - exomiser-phenotype-1802 - Thread starvation or clock leap detected (housekeeper delta=1m19s356ms88µs644ns). 2019-02-28 13:10:58.449 [exomiser-genome-hg19-1802 housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - exomiser-genome-hg19-1802 - Thread starvation or clock leap detected (housekeeper delta=1m19s356ms88µs793ns). 2019-02-28 13:13:41.972 [exomiser-phenotype-1802 housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - exomiser-phenotype-1802 - Thread starvation or clock leap detected (housekeeper delta=1m43s520ms497µs868ns). 2019-02-28 13:13:41.972 [exomiser-genome-hg19-1802 housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - exomiser-genome-hg19-1802 - Thread starvation or clock leap detected (housekeeper delta=1m43s520ms498µs299ns). 2019-02-28 13:17:46.845 [exomiser-genome-hg19-1802 housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - exomiser-genome-hg19-1802 - Thread starvation or clock leap detected (housekeeper delta=2m20s501ms933µs797ns). 2019-02-28 13:17:46.845 [exomiser-phenotype-1802 housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - exomiser-phenotype-1802 - Thread starvation or clock leap detected (housekeeper delta=2m20s501ms933µs790ns). 2019-02-28 13:23:40.756 [exomiser-phenotype-1802 housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - exomiser-phenotype-1802 - Thread starvation or clock leap detected (housekeeper delta=3m49s789ms387µs900ns). 2019-02-28 13:23:40.756 [exomiser-genome-hg19-1802 housekeeper] WARN com.zaxxer.hikari.pool.HikariPool - exomiser-genome-hg19-1802 - Thread starvation or clock leap detected (housekeeper delta=3m49s789ms388µs116ns). 2019-02-28 13:45:33.445 [main] INFO org.monarchinitiative.exomiser.core.analysis.AbstractAnalysisRunner - Running Prioritiser: OmimPrioritiser{} 2019-02-28 13:45:35.364 [main] INFO org.monarchinitiative.exomiser.core.analysis.AbstractAnalysisRunner - Running Prioritiser: HiPhivePriority{options=HiPhiveOptions{diseaseId='', candidateGeneSymbol='', benchmarkingEnabled=false, runPpi=true, runHuman=true, runMouse=true, runFish=true}} 2019-02-28 13:45:46.781 [main] INFO org.monarchinitiative.exomiser.core.analysis.AbstractAnalysisRunner - Scoring genes

...And then it crashes...CRASH LOG

Error starting ApplicationContext. To display the auto-configuration report re-run your application with 'debug' enabled. 2019-02-28 14:52:16.000 [main] ERROR org.springframework.boot.SpringApplication - Application startup failed java.lang.OutOfMemoryError: Java heap space at java.base/java.util.Arrays.copyOf(Arrays.java:3688) at java.base/java.util.ArrayList.grow(ArrayList.java:237) at java.base/java.util.ArrayList.grow(ArrayList.java:242) at java.base/java.util.ArrayList.add(ArrayList.java:467) at java.base/java.util.ArrayList.add(ArrayList.java:480) at de.charite.compbio.jannovar.mendel.impl.MendelianCheckerARCompoundHet.collectTroCandidatesWithParents(MendelianCheckerARCompoundHet.java:218) at de.charite.compbio.jannovar.mendel.impl.MendelianCheckerARCompoundHet.collectTrioCandidates(MendelianCheckerARCompoundHet.java:156) at de.charite.compbio.jannovar.mendel.impl.MendelianCheckerARCompoundHet.filterCompatibleRecordsMultiSample(MendelianCheckerARCompoundHet.java:80) at de.charite.compbio.jannovar.mendel.impl.MendelianCheckerARCompoundHet.filterCompatibleRecords(MendelianCheckerARCompoundHet.java:58) at de.charite.compbio.jannovar.mendel.MendelianInheritanceChecker.filterCompatibleRecords(MendelianInheritanceChecker.java:150) at de.charite.compbio.jannovar.mendel.MendelianInheritanceChecker.checkMendelianInheritance(MendelianInheritanceChecker.java:73) at de.charite.compbio.jannovar.mendel.MendelianInheritanceChecker.checkMendelianInheritance(MendelianInheritanceChecker.java:84) at org.monarchinitiative.exomiser.core.analysis.util.InheritanceModeAnnotator.computeCompatibleInheritanceModes(InheritanceModeAnnotator.java:84) at org.monarchinitiative.exomiser.core.analysis.util.InheritanceModeAnalyser.checkInheritanceCompatibilityOfPassedVariants(InheritanceModeAnalyser.java:100) at org.monarchinitiative.exomiser.core.analysis.util.InheritanceModeAnalyser.lambda$analyseInheritanceModes$0(InheritanceModeAnalyser.java:86) at org.monarchinitiative.exomiser.core.analysis.util.InheritanceModeAnalyser$$Lambda$157/1684428673.accept(Unknown Source) at java.base/java.util.ArrayList.forEach(ArrayList.java:1378) at org.monarchinitiative.exomiser.core.analysis.util.InheritanceModeAnalyser.analyseInheritanceModes(InheritanceModeAnalyser.java:66) at org.monarchinitiative.exomiser.core.analysis.AbstractAnalysisRunner.analyseGeneCompatibilityWithInheritanceMode(AbstractAnalysisRunner.java:295) at org.monarchinitiative.exomiser.core.analysis.AbstractAnalysisRunner.runSteps(AbstractAnalysisRunner.java:285) at org.monarchinitiative.exomiser.core.analysis.AbstractAnalysisRunner.run(AbstractAnalysisRunner.java:114) at org.monarchinitiative.exomiser.core.Exomiser.run(Exomiser.java:64) at org.monarchinitiative.exomiser.cli.ExomiserCommandLineRunner.runAnalysisAndWriteResults(ExomiserCommandLineRunner.java:120) at org.monarchinitiative.exomiser.cli.ExomiserCommandLineRunner.runAnalysisFromScript(ExomiserCommandLineRunner.java:116) at org.monarchinitiative.exomiser.cli.ExomiserCommandLineRunner.runAnalyses(ExomiserCommandLineRunner.java:77) at org.monarchinitiative.exomiser.cli.ExomiserCommandLineRunner.run(ExomiserCommandLineRunner.java:71) at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:732) at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:716) at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:703) at org.springframework.boot.SpringApplication.run(SpringApplication.java:304) at org.springframework.boot.SpringApplication.run(SpringApplication.java:1118) at org.springframework.boot.SpringApplication.run(SpringApplication.java:1107) 2019-02-28 14:52:16.034 [main] INFO org.springframework.context.annotation.AnnotationConfigApplicationContext - Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@67a056f1: startup date [Thu Feb 28 12:04:07 EST 2019]; root of context hierarchy 2019-02-28 14:52:16.042 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Unregistering JMX-exposed beans on shutdown 2019-02-28 14:52:16.043 [main] INFO org.springframework.jmx.export.annotation.AnnotationMBeanExporter - Unregistering JMX-exposed beans 2019-02-28 14:52:16.071 [main] INFO com.zaxxer.hikari.pool.HikariPool - exomiser-phenotype-1802 - Close initiated... 2019-02-28 14:52:16.167 [main] INFO com.zaxxer.hikari.pool.HikariPool - exomiser-phenotype-1802 - Closed.

julesjacobsen commented 5 years ago

Part of this is caused by the fact you still have 1474 variants left in a gene according to your log:

Testing gene TTC34 with 1474 passed variants for inheritance mode compatibility!

Have you applied any filtering steps? What is your input yaml file?

The thread starvation warning is simply a warning which is triggered by the high CPU caused by the huge number of variants still left in that gene, it's basically cosmetic.

You could try running it again removing the AUTOSOMAL_RECESSIVE_COMP_HET mode of inheritance.

julesjacobsen commented 5 years ago

@DGMichael I'm closing this now. If you still have an issue, please re-open.

lacek commented 3 months ago

I am encountering similar issue with a WGS trio with Exomiser 12.1.0 and data version 2102 (old version for comparing historic results). The config yaml is nearly identical to the example genomiser config (https://github.com/exomiser/Exomiser/blob/12.1.0/exomiser-cli/src/main/resources/examples/test-analysis-genome.yml), except that I am using hg38 for genomeAssembly and using only VCF for outputFormats.

The command used was:

java -Xmx64g -XX:+UseG1GC -jar exomiser-cli.jar --analysis config.yml --spring.config.location=application.genomiser.properties

It worked for a dozen of cases except for one trio, each taking less than 10 minutes. However, for the specific trio it keep giving logs of "Thread starvation or clock leap detected" for over 4 hours without completing. It was eventually killed because it reached the cloud platform's timeout.

I tried removing AUTOSOMAL_RECESSIVE_COMP_HET from the config.yml and it makes no difference:

...
2024-07-11 17:06:12.225  INFO 17 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : Loaded 6400000 variants - 38352 passed variant filters...           2024-07-11 17:06:15.439  INFO 17 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : Loaded 6500000 variants - 38897 passed variant filters...
2024-07-11 17:06:17.058  INFO 17 --- [           main] o.m.e.core.genome.VariantFactoryImpl     : Processed 6287750 variant records into 6547762 single allele variants, 56 are missing annotations, most likely due to non-numeric chromosome designations
2024-07-11 17:06:17.058  INFO 17 --- [           main] o.m.e.core.genome.VariantFactoryImpl     : Variant annotation finished in 4m 13s 491ms (253491 ms)             2024-07-11 17:06:17.058  INFO 17 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : Loaded 6547818 variants - 39231 passed variant filters
2024-07-11 17:06:17.089  INFO 17 --- [           main] o.m.e.c.analysis.AbstractAnalysisRunner  : Checking inheritance mode compatibility with [AUTOSOMAL_DOMINANT, X_RECESSIVE, X_DOMINANT, MITOCHONDRIAL] for genes which passed filters
2024-07-11 17:21:30.186  INFO 17 --- [           main] o.m.e.c.a.util.InheritanceModeAnalyser   : Warning - this will take a while. Testing gene KCNJ18 with 1163 passed variants for inheritance mode compatibility!
2024-07-11 17:27:29.213  WARN 17 --- [102 housekeeper] com.zaxxer.hikari.pool.HikariPool        : exomiser-phenotype-2102 - Thread starvation or clock leap detected (housekeeper delta=1m29s698ms55µs34ns).
2024-07-11 17:27:29.213  WARN 17 --- [102 housekeeper] com.zaxxer.hikari.pool.HikariPool        : exomiser-genome-hg38-2102 - Thread starvation or clock leap detected (housekeeper delta=1m32s625ms283µs859ns).
...

The java process comsunes ~69G RAM (RES) and 100% CPU only (i.e. 1 core only, even the server comes with 48 cores). It momentarily use more cores, but overall CPU usage stayed very low.

I tried another server with ≥ 200GB RAM by running the command as java -Xmx192g -XX:+UseG1GC ... to triple the allowed memory. It grew to use ~198GB memory without the "Thread starvation" warning message, but still made no progress after > 2.5 hours. I had to stop the trial to free up the server for other purpose.

I wonder if there are other parameters that can be tuned to make exomiser completes in a sensible runtime (e.g. 30 minutes in my case) with less than 90GB RAM.

Edit:

Below is the trace log when there is no thread starvation warning. Apparently no actual work is being done except for waiting via futex() and getting SIGSEGV signal.

Trace of process 2269942 - java -Xmx128g -XX:+UseG1GC -jar exomiser-cli.jar --analysis config.yml

strace: Process 2269942 attached
15:00:23.974436 futex(0x7fd6e8492778, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000014>
15:00:23.974578 futex(0x7fd6e8492728, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000008>
15:00:23.974604 futex(0x7fd6e800c778, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0 <0.009017>
15:00:23.983646 futex(0x7fd6e800c728, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000007>
15:00:30.854635 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7fd6efcf4000} ---
15:00:30.854715 rt_sigreturn({mask=[QUIT]}) = 140539460647464 <0.000009>
15:00:30.854808 futex(0x7fd6e849277c, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000009>
15:00:30.854835 futex(0x7fd6e8492728, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000008>
15:00:30.854857 futex(0x7fd6e800c77c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0 <0.207138>
15:00:31.062057 futex(0x7fd6e800c728, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000008>
15:00:31.062112 futex(0x7fd6e800c778, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0 <0.020015>
15:00:31.082187 futex(0x7fd6e800c728, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000008>
15:00:43.159459 futex(0x7fd6e8492778, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000014>
15:00:43.159537 futex(0x7fd6e8492728, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000008>
15:00:43.159587 futex(0x7fd6e800c77c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0 <0.008007>
15:00:43.167618 futex(0x7fd6e800c728, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000007>