broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

Pilon crashing due to NegativeArraySizeException #36

Closed ctxchris closed 7 years ago

ctxchris commented 7 years ago

Hi,

I ran multiple instances of Pilon in parallel on different chunks of the genome like this:

java -Xmx450G -jar pilon-1.21.jar --outdir pilon --genome consensus.fasta --frags mapping.bam --output consensus.pilon --vcf --tracks --changes --fix indels,local,breaks,novel --targets chunk_80k-100k --verbose >chunk_80k-100k_extended.pilon.log 

The log file contains the following messages

Pilon version 1.21 Fri Dec 9 16:44:44 2016 -0500
Warning: experimental fix option breaks
Warning: experimental fix option novel
Genome: consensus.fasta
target file: chunk_80k-100k
Target: Consensus_Contig80251:1-45698
Target: Consensus_Contig80252:1-3083
...
Scanning BAMs
mapping.bam: 2075457502 reads, 0 filtered, 2034497216 mapped, 1910832293 proper, 29612050 stray, FR 100% 216+/-28, max 1298
Assembling novel sequence: graphing genome...Consensus_Contig15459...Consensus_Contig16023

until this error is thrown:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.simontuffs.onejar.Boot.run(Boot.java:340)
    at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: java.lang.NegativeArraySizeException
    at scala.collection.mutable.HashTable$class.resize(HashTable.scala:251)
    at scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$addEntry0(HashTable.scala:154)
    at scala.collection.mutable.HashTable$class.findOrAddEntry(HashTable.scala:166)
    at scala.collection.mutable.HashMap.findOrAddEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.put(HashMap.scala:76)
    at scala.collection.mutable.HashMap.update(HashMap.scala:81)
    at org.broadinstitute.pilon.Assembler.addLink(Assembler.scala:130)
    at org.broadinstitute.pilon.Assembler$$anonfun$graphSeq$1.apply$mcVI$sp(Assembler.scala:124)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at org.broadinstitute.pilon.Assembler.graphSeq(Assembler.scala:117)
    at org.broadinstitute.pilon.Assembler.addGraphSeq(Assembler.scala:106)
    at org.broadinstitute.pilon.GenomeFile$$anonfun$assembleNovel$1.apply(GenomeFile.scala:203)
    at org.broadinstitute.pilon.GenomeFile$$anonfun$assembleNovel$1.apply(GenomeFile.scala:202)
    at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:108)
    at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:108)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:108)
    at org.broadinstitute.pilon.GenomeFile.assembleNovel(GenomeFile.scala:202)
    at org.broadinstitute.pilon.GenomeFile.processRegions(GenomeFile.scala:101)
    at org.broadinstitute.pilon.Pilon$.main(Pilon.scala:101)
    at org.broadinstitute.pilon.Pilon.main(Pilon.scala)

The BAM file contains the mapping of an Illumina paired-end library to the reference genome using BWA mem.

Do you have an idea what could be the cause of the problem?

Thanks Chris

ctxchris commented 7 years ago

This error is a real pain, especially with a big genome on which Pilon had already been running for over a week. I split the genome into very small chunks, but it still takes a lot of time. Is it possible to keep the information that has already been processed and write the polished contigs to disk? Or to save the change information in a temporary file that can be used to create the consensus sequence in a separate step? In some chunks only one or two contigs remained when Pilon crashed, and everything was lost.

Thanks Chris
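
One way to limit the loss from a crash like this, sketched here as a workaround and not as a Pilon feature: launch one Pilon process per target and skip targets whose output already exists, so a crash costs only the contig being processed. The helper names (`pilon_command`, `pending_targets`, `run_all`), the one-prefix-per-target naming, the assumed `<outdir>/<prefix>.fasta` output name, and the `16G` heap size are all hypothetical; the flags are the ones used in the command above, and passing a single contig name to `--targets` is assumed to work. Each process rescans the BAM, so this trades total runtime for restartability.

```python
import subprocess
from pathlib import Path


def pilon_command(target, outdir="pilon", heap="16G"):
    """Build the argument list for a Pilon run restricted to one target."""
    return [
        "java", f"-Xmx{heap}", "-jar", "pilon-1.21.jar",
        "--genome", "consensus.fasta",
        "--frags", "mapping.bam",
        "--outdir", outdir,
        "--output", target,       # hypothetical naming: one output prefix per target
        "--changes",
        "--fix", "indels,local",
        "--targets", target,      # assumption: a contig name is accepted here
    ]


def pending_targets(targets, outdir="pilon"):
    """Return the targets that have no <outdir>/<target>.fasta yet (assumed output name)."""
    return [t for t in targets if not (Path(outdir) / f"{t}.fasta").exists()]


def run_all(targets, outdir="pilon"):
    """Run Pilon once per unfinished target; return the targets that failed."""
    failed = []
    for target in pending_targets(targets, outdir):
        result = subprocess.run(pilon_command(target, outdir))
        if result.returncode != 0:
            failed.append(target)  # keep going; finished outputs stay on disk
    return failed
```

Calling `run_all` again after a crash would then pick up only the targets that are still missing, for example with the names read from the `chunk_80k-100k` target file.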

w1bw commented 7 years ago

Hi Christian,

I've had very little time for Pilon maintenance lately, but I'll try to look into this soon. Sorry for your trouble!

--bruce


ctxchris commented 7 years ago

Hi Bruce,

Part of the reason for this error might have been a corrupted BAM file. I repeated the mapping, and many more contigs were processed without error (some still failed). After I limited the fix options to indels and local instead of indels,local,breaks,novel, even more contigs were processed successfully (just 1 out of 50k failed).

Thanks Chris

w1bw commented 7 years ago

Digging into the error trace above, I'm guessing this is some kind of hash overflow that the Scala libraries aren't handling correctly. Sorry, I should really put all kinds of warnings about "--fix novel" in the documentation... I have only tried it on bacterial-sized genomes. Using it on larger genomes like this one, with so many reads, will undoubtedly lead to memory issues. Sorry about that!
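
The overflow guess can be illustrated with a little arithmetic. The stack trace ends in `HashTable.resize`, and JVM array lengths are 32-bit signed integers, so a table that grows by doubling cannot go past 2^30 slots: the next doubling wraps to a negative number, and requesting an array of negative length raises `NegativeArraySizeException`. The sketch below simulates that wraparound in Python. `to_int32` and `doubled_capacity` are hypothetical helpers, and the doubling policy is an assumption about the Scala library, not a copy of its code.

```python
def to_int32(n):
    """Wrap an integer to a signed 32-bit value, as JVM int arithmetic does."""
    n &= 0xFFFFFFFF
    return n - 0x1_0000_0000 if n >= 0x8000_0000 else n


def doubled_capacity(capacity):
    """Capacity after one doubling step, computed in 32-bit signed arithmetic."""
    return to_int32(capacity * 2)


# Keep doubling a small table until the computed size is no longer positive.
capacity = 16
while capacity > 0:
    previous, capacity = capacity, doubled_capacity(capacity)

print(previous, "->", capacity)  # 1073741824 -> -2147483648
```

Under that assumption, the crash would appear once the k-mer graph built for `--fix novel` needs more than about a billion hash buckets, which fits a run with roughly two billion reads better than a bacterial-sized one.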