Hoohm / dropSeqPipe

A SingleCell RNASeq pre-processing snakemake workflow
Creative Commons Attribution Share Alike 4.0 International

Premature end of file error #8

Closed · abmmki closed this issue 7 years ago

abmmki commented 7 years ago

Hi,

I got the following error in the pre-processing step (using the latest version, 0.24), specifically at the MergeBamAlignment step:

```
[Tue Oct 24 11:01:49 CDT 2017] MergeBamAlignment UNMAPPED_BAM=Secondary_tagged_unmapped.bam ALIGNED_BAM=[Secondary_Aligned_sorted.sam] OUTPUT=/dev/stdout PAIRED_RUN=false INCLUDE_SECONDARY_ALIGNMENTS=false COMPRESSION_LEVEL=0 REFERENCE_SEQUENCE=/home/grcm38/mm38.fa CLIP_ADAPTERS=true IS_BISULFITE_SEQUENCE=false ALIGNED_READS_ONLY=false MAX_INSERTIONS_OR_DELETIONS=1 ATTRIBUTES_TO_REVERSE=[OQ, U2] ATTRIBUTES_TO_REVERSE_COMPLEMENT=[E2, SQ] READ1_TRIM=0 READ2_TRIM=0 ALIGNER_PROPER_PAIR_FLAGS=false SORT_ORDER=coordinate PRIMARY_ALIGNMENT_STRATEGY=BestMapq CLIP_OVERLAPPING_READS=true ADD_MATE_CIGAR=true UNMAP_CONTAMINANT_READS=false MIN_UNCLIPPED_BASES=32 MATCHING_DICTIONARY_TAGS=[M5, LN] UNMAPPED_READ_STRATEGY=DO_NOT_CHANGE ADD_PG_TAG_TO_READS=true VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

Linux 4.8.13-100.fc23.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_111-b16; Deflater: Intel; Inflater: Intel; Picard version: 2.12.1-SNAPSHOT
```

Here is the error:

```
INFO 2017-10-24 11:34:56 TagReadWithGeneExon Processed 62,000,000 records. Elapsed time: 00:33:07s. Time for last 1,000,000: 17s. Last read position: 11:50,385,810
[Tue Oct 24 11:40:37 CDT 2017] org.broadinstitute.dropseqrna.metrics.TagReadWithGeneExon done. Elapsed time: 38.80 minutes. Runtime.totalMemory()=1540358144
Exception in thread "main" htsjdk.samtools.FileTruncatedException: Premature end of file
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:382)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:127)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:252)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:404)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:366)
    at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:199)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:661)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:635)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:629)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:599)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:544)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:518)
    at org.broadinstitute.dropseqrna.metrics.TagReadWithGeneExon.doWork(TagReadWithGeneExon.java:94)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at org.broadinstitute.dropseqrna.cmdline.DropSeqMain.main(DropSeqMain.java:29)
[Tue Oct 24 11:40:37 2017] Error in job stage3 while creating output file Secondary_gene_exon_tagged.bam.
[Tue Oct 24 11:40:37 2017] RuleException:
CalledProcessError in line 42 of /usr/lib/python3.4/site-packages/dropSeqPipe/Snakefiles/singleCell/post_align.snake:
Command 'java -Djava.io.tmpdir=/home/tmp -Xmx90000m -jar /home/bin/picard/picard.jar MergeBamAlignment REFERENCE_SEQUENCE=/home/bin/DropSeqMetaData/mm38.fa UNMAPPED_BAM=Secondary_tagged_unmapped.bam ALIGNED_BAM=Secondary_Aligned_sorted.sam INCLUDE_SECONDARY_ALIGNMENTS=false PAIRED_RUN=false OUTPUT=/dev/stdout COMPRESSION_LEVEL=0| /home/bin/Drop-seq_tools-1.12/TagReadWithGeneExon OUTPUT=Secondary_gene_exon_tagged.bam INPUT=/dev/stdin ANNOTATIONS_FILE=/home/bin/DropSeqMetaData/mm38.refFlat TAG=GE CREATE_INDEX=true ' returned non-zero exit status 1
  File "/usr/lib64/python3.4/concurrent/futures/thread.py", line 54, in run
[Tue Oct 24 11:40:37 2017] Removing output files of failed job stage3 since they might be corrupted: Secondary_gene_exon_tagged.bam
[Tue Oct 24 11:40:37 2017] Will exit after finishing currently running jobs.
[Tue Oct 24 11:40:37 2017] Exiting because a job execution failed.
Look above for error message
Traceback (most recent call last):
  File "/bin/dropSeqPipe", line 9, in <module>
    load_entry_point('dropSeqPipe==0.23a0', 'console_scripts', 'dropSeqPipe')()
  File "/usr/lib/python3.4/site-packages/dropSeqPipe/main.py", line 152, in main
    shell(post_align)
  File "/usr/lib/python3.4/site-packages/snakemake-3.10.1-py3.4.egg/snakemake/shell.py", line 80, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /usr/lib/python3.4/site-packages/dropSeqPipe/Snakefiles/singleCell/post_align.snake --cores 16 -pT -d /PROJECTS/MOUSE --configfile /home/bin/DropSeqPipe24/dropSeqPipe/local.yaml ' returned non-zero exit status 1
```
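For anyone hitting the same `htsjdk.samtools.FileTruncatedException`: the message means the reader ran off the end of a BGZF-compressed stream before the expected terminator, which here happens because the upstream `java` process in the pipe died (see the SIGSEGV report in the comment below). A quick way to check whether a BAM file on disk is actually truncated is to look for the standard 28-byte BGZF EOF block at its end, the same test that `samtools quickcheck` performs. Below is a minimal sketch in Python; the helper name and the example path are illustrative, not part of dropSeqPipe:

```python
# Hypothetical helper (not part of dropSeqPipe): check whether a BAM
# file ends with the standard 28-byte BGZF EOF block defined in the
# SAM spec. A BAM cut off mid-stream, like the one in the traceback
# above, will fail this test.
BGZF_EOF = bytes.fromhex(
    "1f8b08040000000000ff0600424302001b0003000000000000000000"
)

def bam_has_eof(path):
    with open(path, "rb") as fh:
        fh.seek(0, 2)                # jump to the end of the file
        if fh.tell() < len(BGZF_EOF):
            return False             # too small to be a valid BAM
        fh.seek(-len(BGZF_EOF), 2)   # back up 28 bytes from the end
        return fh.read() == BGZF_EOF

# Example, using a file name taken from the log above:
print(bam_has_eof("Secondary_tagged_unmapped.bam"))
```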


How can I solve this?

Thanks,

abmmki commented 7 years ago

One more thing: I also see an error file generated, which says the following (even though I enabled core dumps before running). What am I missing?

```
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ff95cde802e, pid=31281, tid=0x00007ff95ab05700
#
# JRE version: OpenJDK Runtime Environment (8.0_111-b16) (build 1.8.0_111-b16)
# Java VM: OpenJDK 64-Bit Server VM (25.111-b16 mixed mode linux-amd64 )
# Problematic frame:
# V  [libjvm.so+0x61f02e]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
```

abmmki commented 7 years ago

OK, finally: running the following commands in the active shell (the terminal used to launch the pipeline) and increasing the Xmx memory (default 500 MB) in the post_align.snake file solved the problem:

```
ulimit -H -c unlimited
ulimit -H unlimited
ulimit -c unlimited
```
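For reference, the same soft-limit change can also be made from inside Python, e.g. at the top of a wrapper script that launches the pipeline, using the standard `resource` module. This only raises the soft limit up to the current hard limit; raising the hard limit itself still needs the shell commands above (or root). A minimal sketch:

```python
import resource

# Raise the soft core-dump limit up to the current hard limit.
# A non-root process may not raise the hard limit itself.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
print("core dump limit raised from", soft, "to", hard)
```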

I increased the memory to 90G because my file was huge. So, basically, the default RAM allocation in the snake file was not sufficient to run my job.
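For illustration, here is roughly what that edit looks like in post_align.snake. This is a sketch, not the exact dropSeqPipe source: the rule name, file names, and tool paths are taken from the failing command logged above, the rule layout is assumed, and the only real change is the `-Xmx` value.

```python
# Sketch of the post_align.snake edit (rule layout assumed; names and
# paths copied from the failing command in the log). The -Xmx value is
# raised from the small default to 90G to fit a very large BAM.
rule stage3:
    input:
        unmapped = "Secondary_tagged_unmapped.bam",
        aligned = "Secondary_Aligned_sorted.sam"
    output:
        "Secondary_gene_exon_tagged.bam"
    shell:
        """
        java -Djava.io.tmpdir=/home/tmp -Xmx90000m \
            -jar /home/bin/picard/picard.jar MergeBamAlignment \
            REFERENCE_SEQUENCE=/home/bin/DropSeqMetaData/mm38.fa \
            UNMAPPED_BAM={input.unmapped} ALIGNED_BAM={input.aligned} \
            INCLUDE_SECONDARY_ALIGNMENTS=false PAIRED_RUN=false \
            OUTPUT=/dev/stdout COMPRESSION_LEVEL=0 \
        | /home/bin/Drop-seq_tools-1.12/TagReadWithGeneExon \
            OUTPUT={output} INPUT=/dev/stdin \
            ANNOTATIONS_FILE=/home/bin/DropSeqMetaData/mm38.refFlat \
            TAG=GE CREATE_INDEX=true
        """
```

The pipe also explains the original error pattern: when the upstream java step crashes, TagReadWithGeneExon sees a truncated BGZF stream on /dev/stdin and reports "Premature end of file".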