broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0
341 stars 60 forks

unknown crash #12

Closed gconcepcion closed 7 years ago

gconcepcion commented 8 years ago

I'm using Pilon to correct a PacBio genome assembly contig by contig, using raw PacBio reads aligned to a consensus sequence. The total genome size is roughly 130 Mb, split among 47 contigs.

Here is an example of one such command:

java -Xmx32G -jar pilon-1.18.jar --genome /lustre/hpcprod/gconcepcion/160223XXXXXX/000003F/cns-000003F.fasta --unpaired /lustre/hpcprod/gconcepcion/160223XXXXXX/000003F/cns-aln-000003F.sorted.bam --output /lustre/hpcprod/gconcepcion/160613pilon/pilon/pilon-000003F --changes --variant --tracks
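A minimal sketch of the contig-by-contig workflow described above, assuming a hypothetical directory layout that mirrors the example paths (one directory per contig containing `cns-<contig>.fasta` and `cns-aln-<contig>.sorted.bam`). It reuses only the flags from the command above and records which contigs fail, keeping the `Caused by:` line of the Java stack trace. The helper names are hypothetical, and it assumes an uncaught exception makes the JVM exit with a non-zero status.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pilon_command(base_dir: Path, contig: str, out_dir: Path,
                  jar: str = "pilon-1.18.jar", heap: str = "32G") -> list[str]:
    """Build one per-contig Pilon command with the same flags as the example.

    Hypothetical layout: base_dir/<contig>/cns-<contig>.fasta and
    base_dir/<contig>/cns-aln-<contig>.sorted.bam.
    """
    contig_dir = base_dir / contig
    return [
        "java", f"-Xmx{heap}", "-jar", jar,
        "--genome", str(contig_dir / f"cns-{contig}.fasta"),
        "--unpaired", str(contig_dir / f"cns-aln-{contig}.sorted.bam"),
        "--output", str(out_dir / f"pilon-{contig}"),
        "--changes", "--variant", "--tracks",
    ]


def summarize_failure(stderr: str) -> str:
    """Pick the root-cause line ('Caused by: ...') from a Java stack trace, if any."""
    caused = [line for line in stderr.splitlines() if line.startswith("Caused by:")]
    return caused[-1] if caused else stderr.strip()


def run_per_contig(base_dir: Path, contigs: list[str], out_dir: Path) -> dict[str, str]:
    """Run Pilon once per contig; return {contig: root-cause line} for failures."""
    failures = {}
    for contig in contigs:
        result = subprocess.run(pilon_command(base_dir, contig, out_dir),
                                capture_output=True, text=True)
        if result.returncode != 0:  # assumes a crash yields a non-zero exit status
            failures[contig] = summarize_failure(result.stderr)
    return failures
```

Running contigs separately like this keeps one crash from stopping the whole batch and makes it easy to see which contigs (here, the 2nd to 4th largest) hit the same exception.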

Pilon completes successfully for the first and largest contig (14 Mb), fails for the next 3 contigs (sorted by size), and then succeeds for the rest of the dataset. The error message I get is:

Pilon version 1.18 Sun Jun 12 01:01:53 2016 -0400
Genome: /lustre/hpcprod/gconcepcion/160223XXXXXX/000003F/cns-000003F.fasta
Fixing bases, gaps, local, breaks
Input genome size: 11568076
Scanning BAMs
Processing 000003F|quiver:1-5784038
unpaired /lustre/hpcprod/gconcepcion/160223XXXXXX/000003F/cns-aln-000003F.sorted.bam: coverage 40
Total Reads: 14967, Coverage: 40, minDepth: 5
Confirmed 5652677 of 5784038 bases (97.73%)
Corrected 0 snps; 0 ambiguous bases; 15 small insertions totaling 164 bases; 0 small deletions totaling 0 bases
Large collapsed region: 000003F|quiver:2122220-2135326 size 13107
Large collapsed region: 000003F|quiver:5043311-5056562 size 13252
Large collapsed region: 000003F|quiver:5322099-5334947 size 12849
Large collapsed region: 000003F|quiver:5356778-5373125 size 16348
# Attempting to fix local continuity breaks
# fix break: 000003F|quiver:244085-244093 0 -0 +0 NoSolution
# fix break: 000003F|quiver:246710 0 -0 +0 NoSolution
# fix break: 000003F|quiver:247018-261080 0 -0 +0 NoSolution
# fix break: 000003F|quiver:977066-977073 0 -0 +0 NoSolution
# fix break: 000003F|quiver:977497-977670 0 -0 +0 NoSolution
# fix break: 000003F|quiver:977918-978000 0 -0 +0 NoSolution
# fix break: 000003F|quiver:2002563 0 -0 +0 NoSolution
# fix break: 000003F|quiver:2002798-2002848 0 -0 +0 NoSolution
# fix break: 000003F|quiver:2695769-2695783 0 -0 +0 NoSolution
# fix break: 000003F|quiver:2696138-2696542 0 -0 +0 NoSolution
# fix break: 000003F|quiver:2708804-2712360 0 -0 +0 NoSolution
# fix break: 000003F|quiver:4531643 0 -0 +0 NoSolution
# fix break: 000003F|quiver:4531904-4531912 0 -0 +0 NoSolution
# fix break: 000003F|quiver:4532175-4532185 0 -0 +0 NoSolution
# fix break: 000003F|quiver:4533076-4533174 0 -0 +0 NoSolution
# fix break: 000003F|quiver:4533385-4533765 0 -0 +0 NoSolution TandemRepeat 2
000003F|quiver:1-5784038 log:
Finished processing 000003F|quiver:1-5784038
Processing 000003F|quiver:5784039-11568076
unpaired /lustre/hpcprod/gconcepcion/160223XXXXXX/000003F/cns-aln-000003F.sorted.bam: Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.simontuffs.onejar.Boot.run(Boot.java:340)
    at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.broadinstitute.pilon.PileUpRegion$$anonfun$addRead$1.apply(PileUpRegion.scala:142)
    at org.broadinstitute.pilon.PileUpRegion$$anonfun$addRead$1.apply(PileUpRegion.scala:127)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at org.broadinstitute.pilon.PileUpRegion.addRead(PileUpRegion.scala:127)
    at org.broadinstitute.pilon.BamFile$$anonfun$process$1.apply(BamFile.scala:121)
    at org.broadinstitute.pilon.BamFile$$anonfun$process$1.apply(BamFile.scala:114)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.broadinstitute.pilon.BamFile.process(BamFile.scala:114)
    at org.broadinstitute.pilon.GenomeRegion.processBam(GenomeRegion.scala:279)
    at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$5$$anonfun$apply$2.apply(GenomeFile.scala:113)
    at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$5$$anonfun$apply$2.apply(GenomeFile.scala:113)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$5.apply(GenomeFile.scala:113)
    at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$5.apply(GenomeFile.scala:110)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
    at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
    at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
    at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Any idea what's happening? Thanks!

w1bw commented 8 years ago

My guess is that it's a bug in Pilon's handling of the CIGAR strings in PacBio read alignments. I can try to find it from the stack trace, but it might take an example to track down.
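A minimal sketch of how walking a CIGAR string can produce an array index of -1. This is not Pilon's `PileUpRegion.addRead`, whose code isn't shown here, and all names are hypothetical. The log shows the 11,568,076 bp contig was processed as two regions (1-5784038 and 5784039-11568076) and the crash came while reading the BAM for the second one; if a read's reference position falls one base before a region's start and the region-local offset is not bounds-checked, the index is -1. That is one plausible mechanism, not a confirmed diagnosis.

```python
import re

# Matches one CIGAR element: a length followed by an operation code (SAM spec).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '3M2I2D3M' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def ref_indices(read_start, cigar, region_start):
    """Map each reference base covered by M/=/X/D to an index into a
    region-local pile-up array. Coordinates are 1-based, as in SAM POS."""
    ref_pos = read_start
    indices = []
    for length, op in parse_cigar(cigar):
        if op in "M=XD":
            for _ in range(length):
                indices.append(ref_pos - region_start)
                ref_pos += 1
        elif op == "N":
            ref_pos += length  # skipped region: advances reference, no pile-up entry
        # I, S, H and P consume no reference bases, so ref_pos is unchanged
    return indices


def add_read(depth, read_start, cigar, region_start, strict=True):
    """Increment per-base depth for one read.

    Python lists silently wrap negative indices, so the bounds check is explicit;
    a JVM array would instead throw ArrayIndexOutOfBoundsException.
    """
    for idx in ref_indices(read_start, cigar, region_start):
        if 0 <= idx < len(depth):
            depth[idx] += 1
        elif strict:
            raise IndexError(f"pile-up index {idx} outside region of length {len(depth)}")
```

For example, `ref_indices(5784038, "3M", 5784039)` gives `[-1, 0, 1]`: the first aligned base lands one position before the second region's start.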

However, I have recommended against using raw PacBio reads as the majority of Pilon's input, because Pilon isn't specifically aware of the PacBio error model. There will likely be spurious indel corrections, especially in homopolymer runs. Generally, people use Pilon to correct PacBio assemblies with Illumina data, or use error-corrected or circular consensus PacBio reads.
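To gauge how much the homopolymer issue affects a run, one could check whether each indel correction (for example, one reported with `--changes`) falls in a homopolymer. A minimal sketch with hypothetical helpers: it takes a sequence and a 0-based position directly and does not parse Pilon's output files, and the `min_run` threshold of 4 is an arbitrary assumption.

```python
def homopolymer_run(seq: str, pos: int) -> int:
    """Length of the single-base run that contains 0-based position pos."""
    base = seq[pos]
    left = pos
    while left > 0 and seq[left - 1] == base:
        left -= 1
    right = pos
    while right + 1 < len(seq) and seq[right + 1] == base:
        right += 1
    return right - left + 1


def indel_in_homopolymer(seq: str, pos: int, min_run: int = 4) -> bool:
    """True if an indel at 0-based pos touches a run of >= min_run identical bases.

    Both the base at pos and the base before it are checked, because an
    insertion is placed between two bases. min_run=4 is an arbitrary threshold.
    """
    seq = seq.upper()
    neighbours = [p for p in (pos - 1, pos) if 0 <= p < len(seq)]
    return any(homopolymer_run(seq, p) >= min_run for p in neighbours)
```

A high fraction of indel corrections flagged this way would be consistent with the raw-read error model concern above.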

Someday I may try to add specific PacBio awareness so that it does a better job with raw reads.

gconcepcion commented 8 years ago

Thanks for the insight: I got much better results (and no run-time errors) when I mapped the corrected reads back to the consensus sequence.

If I come up with a small test-case example with data I can share, I'll send it your way.

aaronphillips7493 commented 4 years ago

Hey, I received a similar error when using Illumina short reads to polish a genome assembly (965 Mbp, 1956 contigs) generated from Nanopore long reads. The cause:

Caused by: java.lang.ArrayIndexOutOfBoundsException: -110

What does this mean, please?
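The thread does not say what triggers the -110 index, and answering that would need Pilon's source. As one diagnostic, assuming (unconfirmed) that a malformed or out-of-range alignment record could be involved, the BAM records can be checked against consistency rules from the SAM specification: the read-consuming CIGAR operations (M, I, S, =, X) must sum to the SEQ length, a mapped read's POS must be at least 1, and the alignment should not run past the end of its contig. A minimal sketch; `check_sam_line` is a hypothetical helper.

```python
import re

CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")   # operations that consume read bases (SAM spec)
REF_OPS = set("MDN=X")     # operations that consume reference bases (SAM spec)


def check_sam_line(line: str, ref_lengths: dict) -> list:
    """Return a list of problems found in one SAM alignment line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    rname, pos, cigar, seq = fields[2], int(fields[3]), fields[5], fields[9]
    if rname == "*" or cigar == "*":
        return []  # unmapped or no alignment: nothing to check
    ops = [(int(n), op) for n, op in CIGAR_ELEMENT.findall(cigar)]
    problems = []
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        problems.append(f"unparseable CIGAR {cigar!r}")
    query_len = sum(n for n, op in ops if op in QUERY_OPS)
    if seq != "*" and query_len != len(seq):
        problems.append(f"CIGAR query length {query_len} != SEQ length {len(seq)}")
    if pos < 1:
        problems.append(f"POS {pos} < 1")
    ref_end = pos + sum(n for n, op in ops if op in REF_OPS) - 1
    contig_len = ref_lengths.get(rname)
    if contig_len is not None and ref_end > contig_len:
        problems.append(f"alignment end {ref_end} beyond {rname} length {contig_len}")
    return problems
```

The input lines can come from `samtools view aln.bam`, which prints records as SAM text without the header; contig lengths can be taken from the `@SQ` lines (`SN:` and `LN:` tags) printed by `samtools view -H aln.bam`. A clean result does not rule out a bug in Pilon itself.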