faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Picard Dependency Issue - CleanSam truncates .bam files #228

Closed BirdmanRidesAgain closed 3 years ago

BirdmanRidesAgain commented 3 years ago

I am running a pipeline, which works under Phyluce 1.6.3, to extract full mitogenome sequences. Part of that pipeline runs CleanSam followed by AddOrReplaceReadGroups to clean a .bam file for further processing. When I run CleanSam, a reasonably sized output file is produced, but the run also reports "A fatal error has been detected by the Java Runtime Environment". When I then run AddOrReplaceReadGroups on that file, the program exits early and produces an empty output file because the .bam file is truncated. This does not happen when I run AddOrReplaceReadGroups on the raw .bam file.
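A minimal sketch for confirming the truncation described above, assuming a POSIX shell with tail and od available. Per the SAM/BAM format specification, a complete BAM file ends with a fixed 28-byte empty BGZF block (the end-of-file marker), so a file whose last 28 bytes differ from that block was most likely cut off mid-write. The function name has_bgzf_eof is hypothetical and not part of phyluce or Picard. A missing marker strongly suggests truncation; a present marker does not prove that every record is intact.

```shell
# Hypothetical helper: succeed (exit status 0) only if the file ends with
# the 28-byte empty BGZF block that the SAM/BAM specification defines as
# the BAM end-of-file marker. A BAM cut off mid-write normally lacks it.
has_bgzf_eof() {
    local expected actual
    # Expected marker as one lowercase hex string (spaces stripped).
    expected=$(printf '%s' '1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00' | tr -d ' ')
    # Last 28 bytes of the file, dumped by od as hex and joined the same way.
    actual=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$actual" = "$expected" ]
}
```

For example, has_bgzf_eof pluvialis-dominica_JJW362-mtDNA-contig_CL.bam || echo "CleanSam output looks truncated". If samtools is installed, samtools quickcheck performs a similar end-of-file check for BAM files.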

I've included the input and output from both commands, as well as the log file generated by the CleanSam run. Any help you could give in figuring out why this is happening would be appreciated.

CleanSam Command

picard CleanSam \
    -I pluvialis-dominica_JJW362-mtDNA-contig.bam \
    -O pluvialis-dominica_JJW362-mtDNA-contig_CL.bam

17:37:19.263 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/melospiza/miniconda3/envs/phyluce-1.7.1/share/picard-2.25.5-0/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
[Wed May 19 17:37:19 AKDT 2021] CleanSam --INPUT pluvialis-dominica_JJW362-mtDNA-contig.bam --OUTPUT pluvialis-dominica_JJW362-mtDNA-contig_CL.bam --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Wed May 19 17:37:19 AKDT 2021] Executing as melospiza@Colliers-iMac.local on Mac OS X 10.16 x86_64; OpenJDK 64-Bit Server VM 11.0.8+10-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.25.5
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000010d5c9ea7, pid=26277, tid=8195
#
# JRE version: OpenJDK Runtime Environment Zulu11.41+23-CA (11.0.8+10) (build 11.0.8+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM Zulu11.41+23-CA (11.0.8+10-LTS, mixed mode, tiered, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# C [libgkl_compression1027032708436590096.dylib+0x6ea7] deflate_medium+0x867
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/melospiza/Documents/pluvialis-df-mtDNA-analysis/pluvialis-df-mtDNA-SNP-calling-pipeline/pluvialis-df-mtDNA-bwa-alignments/hs_err_pid26277.log
#
# If you would like to submit a bug report, please visit:
# http://www.azulsystems.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
/Users/melospiza/miniconda3/envs/phyluce-1.7.1/bin/picard: line 66: 26277 Abort trap: 6 /Users/melospiza/miniconda3/envs/phyluce-1.7.1/bin/java -Xms512m -Xmx2g -jar /Users/melospiza/miniconda3/envs/phyluce-1.7.1/share/picard-2.25.5-0/picard.jar CleanSam "-I" "pluvialis-dominica_JJW362-mtDNA-contig.bam" "-O" "pluvialis-dominica_JJW362-mtDNA-contig_CL.bam"

(phyluce-1.7.1) melospiza@Colliers-iMac pluvialis-df-mtDNA-bwa-alignments % picard AddOrReplaceReadGroups \
    I=pluvialis-dominica_JJW362-mtDNA-contig_CL.bam \
    O=pluvialis-dominica-JJW362-mtDNA-contig_CL_RG.bam \
    SORT_ORDER=coordinate \
    RGPL=illumina \
    RGPU=TextXX \
    RGLB=Lib1 \
    RGID=pluvialis-dominica-JJW362-mtDNA-contig \
    RGSM=pluvialis-dominica-JJW362-mtDNA-contig \
    VALIDATION_STRINGENCY=LENIENT
INFO 2021-05-19 17:21:04 AddOrReplaceReadGroups

** NOTE: Picard's command line syntax is changing.
**
** For more information, please see:
** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**
** The command line looks like this in the new syntax:
**
** AddOrReplaceReadGroups -I pluvialis-dominica_JJW362-mtDNA-contig_CL.bam -O pluvialis-dominica-JJW362-mtDNA-contig_CL_RG.bam -SORT_ORDER coordinate -RGPL illumina -RGPU TextXX -RGLB Lib1 -RGID pluvialis-dominica-JJW362-mtDNA-contig -RGSM pluvialis-dominica-JJW362-mtDNA-contig -VALIDATION_STRINGENCY LENIENT


17:21:10.136 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/melospiza/miniconda3/envs/phyluce-1.7.1/share/picard-2.25.5-0/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
[Wed May 19 17:21:10 AKDT 2021] AddOrReplaceReadGroups INPUT=pluvialis-dominica_JJW362-mtDNA-contig_CL.bam OUTPUT=pluvialis-dominica-JJW362-mtDNA-contig_CL_RG.bam SORT_ORDER=coordinate RGID=pluvialis-dominica-JJW362-mtDNA-contig RGLB=Lib1 RGPL=illumina RGPU=TextXX RGSM=pluvialis-dominica-JJW362-mtDNA-contig VALIDATION_STRINGENCY=LENIENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed May 19 17:21:10 AKDT 2021] Executing as melospiza@Colliers-iMac.local on Mac OS X 10.16 x86_64; OpenJDK 64-Bit Server VM 11.0.8+10-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.25.5
INFO 2021-05-19 17:21:10 AddOrReplaceReadGroups Created read-group ID=pluvialis-dominica-JJW362-mtDNA-contig PL=illumina LB=Lib1 SM=pluvialis-dominica-JJW362-mtDNA-contig

[Wed May 19 17:21:10 AKDT 2021] picard.sam.AddOrReplaceReadGroups done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=536870912
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.FileTruncatedException: Premature end of file: /Users/melospiza/Documents/pluvialis-df-mtDNA-analysis/pluvialis-df-mtDNA-SNP-calling-pipeline/pluvialis-df-mtDNA-bwa-alignments/pluvialis-dominica_JJW362-mtDNA-contigCL.bam
    at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:530)
    at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
    at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
    at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:421)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:394)
    at htsjdk.samtools.util.BinaryCodec.readByteBuffer(BinaryCodec.java:507)
    at htsjdk.samtools.util.BinaryCodec.readUShort(BinaryCodec.java:587)
    at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:274)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:866)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:840)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:834)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:802)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:591)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:570)
    at picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:182)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

hs_err_pid26277.log

AddOrReplaceReadGroups Command

(phyluce-1.7.1) melospiza@Colliers-iMac pluvialis-df-mtDNA-bwa-alignments % picard AddOrReplaceReadGroups \
    I=pluvialis-dominica_JJW362-mtDNA-contig_CL.bam \
    O=pluvialis-dominica_JJW362-mtDNA-contig_CL_RG.bam \
    SORT_ORDER=coordinate \
    RGPL=illumina \
    RGPU=TestXX \
    RGLB=Lib1 \
    RGID=pluvialis-dominica_JJW362-mtDNA-contig \
    RGSM=pluvialis-dominica_JJW362-mtDNA-contig \
    VALIDATION_STRINGENCY=LENIENT
INFO 2021-05-19 17:51:55 AddOrReplaceReadGroups

** NOTE: Picard's command line syntax is changing.
**
** For more information, please see:
** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**
** The command line looks like this in the new syntax:
**
** AddOrReplaceReadGroups -I pluvialis-dominica_JJW362-mtDNA-contig_CL.bam -O pluvialis-dominica_JJW362-mtDNA-contig_CL_RG.bam -SORT_ORDER coordinate -RGPL illumina -RGPU TestXX -RGLB Lib1 -RGID pluvialis-dominica_JJW362-mtDNA-contig -RGSM pluvialis-dominica_JJW362-mtDNA-contig -VALIDATION_STRINGENCY LENIENT


17:52:01.180 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/melospiza/miniconda3/envs/phyluce-1.7.1/share/picard-2.25.5-0/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
[Wed May 19 17:52:01 AKDT 2021] AddOrReplaceReadGroups INPUT=pluvialis-dominica_JJW362-mtDNA-contig_CL.bam OUTPUT=pluvialis-dominica_JJW362-mtDNA-contig_CL_RG.bam SORT_ORDER=coordinate RGID=pluvialis-dominica_JJW362-mtDNA-contig RGLB=Lib1 RGPL=illumina RGPU=TestXX RGSM=pluvialis-dominica_JJW362-mtDNA-contig VALIDATION_STRINGENCY=LENIENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed May 19 17:52:01 AKDT 2021] Executing as melospiza@Colliers-iMac.local on Mac OS X 10.16 x86_64; OpenJDK 64-Bit Server VM 11.0.8+10-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.25.5
INFO 2021-05-19 17:52:01 AddOrReplaceReadGroups Created read-group ID=pluvialis-dominica_JJW362-mtDNA-contig PL=illumina LB=Lib1 SM=pluvialis-dominica_JJW362-mtDNA-contig

[Wed May 19 17:52:01 AKDT 2021] picard.sam.AddOrReplaceReadGroups done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=536870912
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.FileTruncatedException: Premature end of file: /Users/melospiza/Documents/pluvialis-df-mtDNA-analysis/pluvialis-df-mtDNA-SNP-calling-pipeline/pluvialis-df-mtDNA-bwa-alignments/pluvialis-dominica_JJW362-mtDNA-contigCL.bam
    at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:530)
    at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
    at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
    at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:421)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:394)
    at htsjdk.samtools.util.BinaryCodec.readByteBuffer(BinaryCodec.java:507)
    at htsjdk.samtools.util.BinaryCodec.readUShort(BinaryCodec.java:587)
    at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:274)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:866)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:840)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:834)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:802)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:591)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:570)
    at picard.sam.AddOrReplaceReadGroups.doWork(AddOrReplaceReadGroups.java:182)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
(phyluce-1.7.1) melospiza@Colliers-iMac pluvialis-df-mtDNA-bwa-alignments %

brantfaircloth commented 3 years ago

This looks like an issue in Picard rather than something having to do with phyluce.

That said, looking at the error output, AddOrReplaceReadGroups is failing because the BAM being passed to it has a "Premature end of file". This is likely because CleanSam aborts before it finishes writing a valid BAM, which you are then passing to AddOrReplaceReadGroups. When you run CleanSam, the crash is in native compression code: the problematic frame is deflate_medium in libgkl_compression, the Intel compression library, so it happens while the output BAM is being compressed. Tracking that back (e.g. see this thread) suggests that you might try to run CleanSam by adding:

USE_JDK_DEFLATER=true USE_JDK_INFLATER=true

to the parameters you are passing to CleanSam (see the thread for more info). These parameters tell Picard to use the JDK's built-in compression/decompression instead of the native Intel routines, which appear to be the cause of the crash.
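A sketch of the two pipeline steps with that workaround applied, written as a hypothetical shell function (clean_and_add_read_groups is not part of phyluce or Picard). It uses the new-style Picard syntax from the original CleanSam command, so the options are spelled --USE_JDK_DEFLATER true and --USE_JDK_INFLATER true rather than KEY=VALUE. Passing them to AddOrReplaceReadGroups as well is an assumption, since that step also writes a compressed BAM; the read-group values are copied from the commands in the original post.

```shell
# Hypothetical wrapper around the two Picard steps from this thread.
# Both steps use the JDK deflater/inflater instead of the native Intel
# (GKL) library whose deflate_medium frame crashed in the CleanSam log.
clean_and_add_read_groups() {
    local in_bam="$1"          # e.g. pluvialis-dominica_JJW362-mtDNA-contig.bam
    local sample="$2"          # used for both the read-group ID and sample name
    local base="${in_bam%.bam}"

    picard CleanSam \
        -I "$in_bam" \
        -O "${base}_CL.bam" \
        --USE_JDK_DEFLATER true \
        --USE_JDK_INFLATER true || return 1
    # "|| return 1" stops here if CleanSam exits non-zero, so a
    # half-written _CL.bam is not passed on to AddOrReplaceReadGroups.

    picard AddOrReplaceReadGroups \
        -I "${base}_CL.bam" \
        -O "${base}_CL_RG.bam" \
        --SORT_ORDER coordinate \
        --RGPL illumina \
        --RGPU TestXX \
        --RGLB Lib1 \
        --RGID "$sample" \
        --RGSM "$sample" \
        --VALIDATION_STRINGENCY LENIENT \
        --USE_JDK_DEFLATER true \
        --USE_JDK_INFLATER true
}
```

For example: clean_and_add_read_groups pluvialis-dominica_JJW362-mtDNA-contig.bam pluvialis-dominica_JJW362-mtDNA-contig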

BirdmanRidesAgain commented 3 years ago

Okay, thanks for the input. Sorry about posting this to the wrong place; I wasn't sure whether I should treat this as a Phyluce or a Picard issue.

BirdmanRidesAgain commented 3 years ago

That fix worked!

brantfaircloth commented 3 years ago

No worries - I just wasn't sure if I knew what I was talking about. Good to see it works (and that I guessed correctly). I'm gonna close for now, but feel free to post more if you hit another problem.