broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

Crash correcting single contig #54

Closed: TomHarrop closed this issue 5 years ago

TomHarrop commented 7 years ago

I'm trying to run pilon on a single contig I pulled from a canu assembly of nanopore reads. I expect plenty of errors in the contig because nanopore coverage was low. In fact, it's supposed to be the mitochondrial genome, but you can see from the contig size (36803 bases in the log below) that it's way off.

I used bwa mem to map two paired-end Illumina libraries (2x150 bp and 2x100 bp) against the contig. pilon seems to crash during the "# Attempting to fix local continuity breaks" step.
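For readers reproducing this setup, the mapping step might look roughly like the sketch below. These are not the exact commands used in this report: the helper name `map_pe_library` and all read and BAM file names in the usage comment are hypothetical placeholders. It assumes `bwa` and `samtools` are on the PATH.

```shell
# Hypothetical helper: map one paired-end library to a reference and
# write a coordinate-sorted BAM. Arguments: reference, R1, R2, output BAM.
map_pe_library() {
    ref=$1 r1=$2 r2=$3 out_bam=$4
    # bwa mem emits SAM on stdout; samtools sort reads it from stdin ("-").
    bwa mem "$ref" "$r1" "$r2" | samtools sort -o "$out_bam" -
}

# Example usage (placeholder file names; assumes the reference was
# indexed first with `bwa index test/tig00000001.fa`):
#   map_pe_library test/tig00000001.fa lib150_R1.fq.gz lib150_R2.fq.gz test/lib150.bam
#   map_pe_library test/tig00000001.fa lib100_R1.fq.gz lib100_R2.fq.gz test/lib100.bam
#   samtools merge test/pe_mapped.bam test/lib150.bam test/lib100.bam
#   samtools index test/pe_mapped.bam
```

Merging the two libraries into one sorted, indexed BAM matches the single `--frags test/pe_mapped.bam` argument in the command further down.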

Here is the log:

Genome: test/tig00000001.fa
Fixing snps, indels, gaps, local
Input genome size: 36803
Scanning BAMs
test/pe_mapped.bam: 1218578161 reads, 0 filtered, 4555595 mapped, 4495939 proper, 17226 stray, FR 100% 384+/-111, max 719
Processing tig00000001:1-36803
frags test/pe_mapped.bam: coverage 12626
Total Reads: 4596059, Coverage: 12626, minDepth: 1263
Confirmed 31603 of 36803 bases (85.87%)
Corrected 18 snps; 0 ambiguous bases; corrected 298 small insertions totaling 376 bases, 5 small deletions totaling 5 bases
# Attempting to fix local continuity breaks
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.simontuffs.onejar.Boot.run(Boot.java:340)
    at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: java.lang.NullPointerException
    at htsjdk.samtools.SAMFileHeader.addReadGroup(SAMFileHeader.java:191)
    at org.broadinstitute.pilon.GapFiller$$anonfun$writeBam$1.apply(GapFiller.scala:401)
    at org.broadinstitute.pilon.GapFiller$$anonfun$writeBam$1.apply(GapFiller.scala:398)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.broadinstitute.pilon.GapFiller.writeBam(GapFiller.scala:398)
    at org.broadinstitute.pilon.GapFiller.assembleIntoBreak(GapFiller.scala:125)
    at org.broadinstitute.pilon.GapFiller.assembleAcrossBreak(GapFiller.scala:52)
    at org.broadinstitute.pilon.GapFiller.fixBreak(GapFiller.scala:45)
    at org.broadinstitute.pilon.GenomeRegion$$anonfun$identifyAndFixIssues$4.apply(GenomeRegion.scala:383)
    at org.broadinstitute.pilon.GenomeRegion$$anonfun$identifyAndFixIssues$4.apply(GenomeRegion.scala:381)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.broadinstitute.pilon.GenomeRegion.identifyAndFixIssues(GenomeRegion.scala:381)
    at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$4.apply(GenomeFile.scala:119)
    at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$4.apply(GenomeFile.scala:108)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
    at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
    at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
    at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Here's the command I used:

java -Xmx500G -jar bin/pilon/pilon-1.22.jar \
    --genome test/tig00000001.fa \
    --frags test/pe_mapped.bam \
    --output tig00000001 \
    --outdir test/pilon \
    --tracks \
    --dumpreads

Here's the java version:

java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b31)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b31, mixed mode)

This is a box with 1 TB of RAM running RHEL 7.3.

Any troubleshooting ideas?

Cheers!

w1bw commented 7 years ago

Apparently --dumpreads isn't working correctly. The intent of that option was to gather the reads pilon uses for a local reassembly, either for debugging or for experimenting with external assemblers. I clearly haven't used it in a long time. Please try it without that option and see how things go!
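Concretely, that suggestion amounts to rerunning the reported command with the `--dumpreads` flag removed and everything else unchanged. A minimal sketch follows; the wrapper function name `run_pilon_without_dumpreads` is only an illustrative convenience, and the paths are the ones from the original report.

```shell
# Same invocation as in the report, minus --dumpreads.
# The function wrapper is hypothetical; call it to start the run.
run_pilon_without_dumpreads() {
    java -Xmx500G -jar bin/pilon/pilon-1.22.jar \
        --genome test/tig00000001.fa \
        --frags test/pe_mapped.bam \
        --output tig00000001 \
        --outdir test/pilon \
        --tracks
}

# Example usage (not run here):
#   run_pilon_without_dumpreads
```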

TomHarrop commented 7 years ago

It worked without --dumpreads. I do want to reassemble the short reads and compare to the nanopore contig, but I can easily grab them from the BAM. I was just being lazy with --dumpreads. Thanks!
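One possible way to grab the mapped reads from the BAM is sketched below with samtools. This is an assumption about tooling, not what was actually run in this thread. The helper name `extract_mapped_pairs` and the output file names are hypothetical.

```shell
# Hypothetical helper: pull mapped reads out of a BAM as paired FASTQ.
# Arguments: input BAM, output prefix.
extract_mapped_pairs() {
    bam=$1
    prefix=$2
    # Keep only mapped reads (-F 4 drops records flagged as unmapped).
    samtools view -b -F 4 -o "${prefix}.mapped.bam" "$bam" &&
    # Group mates together, then write R1/R2; reads whose mate was
    # dropped go to the singleton file.
    samtools collate -u -O "${prefix}.mapped.bam" |
        samtools fastq -1 "${prefix}_R1.fq" -2 "${prefix}_R2.fq" \
            -0 /dev/null -s "${prefix}_single.fq" -n -
}

# Example usage (not run here; the output prefix is a placeholder):
#   extract_mapped_pairs test/pe_mapped.bam test/mapped
```

Filtering before collating keeps the intermediate file small here, since the log shows only about 4.5 million of the reads in the BAM mapped to the contig.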