PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

gridss 2.13.2 samtools sort error #616

Closed jdekanter closed 1 year ago

jdekanter commented 1 year ago

Dear all,

Thank you for this great tool! Recently, when running gridss v2.13.2 as part of hmftools (https://github.com/hartwigmedical/hmftools), gridss stops at the very beginning, in the CollectGridssMetricsAndExtractSVReads step, due to an error in samtools sort: "samtools sort: can't open "/dev/stdin": Exec format error" (see the full output below). In a previous issue you suggested that this can be caused by running out of memory, but the error occurs 4 seconds after this step starts and I have now given it 180GB, so this seems unlikely to me. You also said it could be the samtools version, but I have now tried samtools 1.15.1, 1.16 and 1.17, and they all give the same error.

In a previous pipeline, which ran gridss v2.9.4, we did not have this issue.

Do you have any additional ideas what the problem could be? Thank you for your input. If you need more information, please let me know.

    [Thu Feb 23 16:42:46 CET 2023] CollectGridssMetricsAndExtractSVReads MIN_CLIP_LENGTH=5 READ_PAIR_CONCORDANT_PERCENT=0.995 INSERT_SIZE_METRICS=[path] UNMAPPED_READS=false INCLUDE_DUPLICATES=true SV_OUTPUT=/dev/stdout GRIDSS_PROGRAM=[CollectCigarMetrics, CollectMapqMetrics, CollectTagMetrics, CollectIdsvMetrics, ReportThresholdCoverage] THRESHOLD_COVERAGE=50000 INPUT=[to/to/bam] ASSUME_SORTED=true OUTPUT=[/path/to/working/bam] FILE_EXTENSION=null PROGRAM=[CollectInsertSizeMetrics] TMP_DIR=[/path/to/working/dir] COMPRESSION_LEVEL=0 REFERENCE_SEQUENCE=[/path/to/Homo_sapiens_assembly38.fasta] MIN_INDEL_SIZE=1 CLIPPED=true INDELS=true SPLIT=true SINGLE_MAPPED_PAIRED=true DISCORDANT_READ_PAIRS=true STOP_AFTER=0 METRIC_ACCUMULATION_LEVEL=[ALL_READS] INCLUDE_UNPAIRED=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Thu Feb 23 16:42:46 CET 2023] Executing as [user]@n0068.compute.hpc on Linux 3.10.0-1160.81.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 18.0.2+9-61; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.13.2-gridss
    INFO 2023-02-23 16:42:47 SAMFileWriterFactory Unknown file extension, assuming BAM format when writing file: file:///dev/stdout
    [E::hts_hopen] Failed to open file /dev/stdin
    [E::hts_open_format] Failed to open file "/dev/stdin" : Exec format error
    samtools sort: can't open "/dev/stdin": Exec format error
    [Thu Feb 23 16:42:51 CET 2023] gridss.CollectGridssMetricsAndExtractSVReads done. Elapsed time: 0.09 minutes.
    Runtime.totalMemory()=2017460224
    Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Exception when running gridss.cmdline.ByReadNameSinglePassSamProgram$WrappedSinglePassSamProgram
        at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:253)
        at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:134)
        at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:126)
        at picard.analysis.CollectMultipleMetrics.doWork(CollectMultipleMetrics.java:598)
        at gridss.analysis.CollectGridssMetrics.doWork(CollectGridssMetrics.java:78)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:196)
        at gridss.CollectGridssMetricsAndExtractSVReads.main(CollectGridssMetricsAndExtractSVReads.java:56)
    Caused by: java.lang.RuntimeException: Exception when running gridss.cmdline.ByReadNameSinglePassSamProgram$WrappedSinglePassSamProgram
        at picard.analysis.SinglePassSamProgram.raiseAsyncException(SinglePassSamProgram.java:282)
        at picard.analysis.SinglePassSamProgram.asyncAcceptRead(SinglePassSamProgram.java:273)
        at picard.analysis.SinglePassSamProgram.asyncAcceptReads(SinglePassSamProgram.java:263)
        at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:216)
        ... 7 more
    Caused by: htsjdk.samtools.util.RuntimeIOException: Write error; BinaryCodec in writemode; streamed file (filename not available)
        at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:222)
        at htsjdk.samtools.util.BlockCompressedOutputStream.writeGzipBlock(BlockCompressedOutputStream.java:451)
        at htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock(BlockCompressedOutputStream.java:415)
        at htsjdk.samtools.util.BlockCompressedOutputStream.write(BlockCompressedOutputStream.java:305)
        at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
        at htsjdk.samtools.util.BinaryCodec.writeByteBuffer(BinaryCodec.java:188)
        at htsjdk.samtools.util.BinaryCodec.writeInt(BinaryCodec.java:234)
        at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:162)
        at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:144)
        at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:185)
        at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyWrite(AsyncSAMFileWriter.java:36)
        at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyWrite(AsyncSAMFileWriter.java:16)
        at htsjdk.samtools.util.AbstractAsyncWriter$WriterRunnable.run(AbstractAsyncWriter.java:123)
        at java.base/java.lang.Thread.run(Thread.java:833)
    Caused by: java.io.IOException: Broken pipe
        at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at java.base/sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:62)
        at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:137)
        at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:102)
        at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:72)
        at java.base/sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:288)
        at java.base/sun.nio.ch.ChannelOutputStream.writeFullyImpl(ChannelOutputStream.java:60)
        at java.base/sun.nio.ch.ChannelOutputStream.writeFully(ChannelOutputStream.java:82)
        at java.base/sun.nio.ch.ChannelOutputStream.write(ChannelOutputStream.java:122)
        at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
        at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127)
        at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
        ... 13 more

d-cameron commented 1 year ago

> Caused by: java.io.IOException: Broken pipe

Looks like the CollectGridssMetricsAndExtractSVReads step got killed.

> I now gave it 180GB

The memory for that particular process is set by the --otherjvmheap command line parameter, not --jvmheap. If you're giving your job 180g, then you should set --jvmheap to 175g and --otherjvmheap to ~140g. Both need to be less than 180g because these parameters set only the JVM heap size, and your job also needs memory for things like the JVM stack and (for --otherjvmheap) for running bwa and samtools at the same time.

    --jvmheap: size of JVM heap for the high-memory component of assembly and
        variant calling. (Default: 30g)
    --otherjvmheap: size of JVM heap for everything else. Useful to prevent
        java out of memory errors when using large (>4Gb) reference genomes.
        Note that some parts of assembly and variant calling use this heap
        size. (Default: 4g)
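
As a rough illustration of that arithmetic, here is a minimal Python sketch that derives the two heap flags from a job's memory allocation. The helper name heap_flags and the two headroom defaults (5g and 40g) are assumptions, back-calculated from the 180g example above; they are not GRIDSS defaults, and the right headroom will depend on your system.

```python
def heap_flags(job_gb, jvm_overhead_gb=5, other_tools_gb=40):
    """Build --jvmheap/--otherjvmheap arguments that fit inside a job of job_gb gigabytes.

    jvm_overhead_gb: assumed headroom for non-heap JVM memory (stack etc.).
    other_tools_gb:  assumed headroom for bwa and samtools running alongside the JVM.
    Both defaults are illustrative guesses, not values taken from GRIDSS.
    """
    jvmheap = job_gb - jvm_overhead_gb
    otherjvmheap = job_gb - other_tools_gb
    if jvmheap <= 0 or otherjvmheap <= 0:
        raise ValueError("job memory is too small for the requested headroom")
    return ["--jvmheap", f"{jvmheap}g", "--otherjvmheap", f"{otherjvmheap}g"]


# For a 180g job this reproduces the values suggested above.
print(" ".join(heap_flags(180)))  # --jvmheap 175g --otherjvmheap 140g
```

The returned list can be appended to whatever gridss command line the pipeline already builds; both values stay below the job allocation by construction.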

jdekanter commented 1 year ago

Hi, thank you for the very quick answer. I have made the adjustments you suggested, but the job is still killed within 7 seconds of the start of the CollectGridssMetricsAndExtractSVReads step. Is it really possible for the job to run out of that much memory, and so quickly? Could anything other than memory be the limiting factor? I'm just using the human reference (~3G) and BAM files of 130G and 40G. Thanks for taking the time!