broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0
340 stars 60 forks source link

Inappropriate call if not paired read? #51

Open jordur opened 7 years ago

jordur commented 7 years ago

Hi all / @broadinstitute, Related with #2, I tried to use Pilon v1.22 in order to polish some canu-assembled nanopore reads with the alignment of Illumina reads to those contigs. I had a set of BACs and most of them were corrected successfully. However a couple of them returned the following error:

$ java -Xmx$12G -jar pilon-1.22.jar --threads 5 --genome Canu_assembly_Bjararaca_NB07.contigs.fasta --bam BAC7.sorted

.bam --outdir polish_NB07 --output pilon.BAC7.contigs

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.simontuffs.onejar.Boot.run(Boot.java:340)
    at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: java.lang.IllegalStateException: Inappropriate call if not paired read
    at htsjdk.samtools.SAMRecord.requireReadPaired(SAMRecord.java:648)
    at htsjdk.samtools.SAMRecord.getFirstOfPairFlag(SAMRecord.java:706)
    at org.broadinstitute.pilon.BamFile$MateMap.addRead(BamFile.scala:224)
    at org.broadinstitute.pilon.BamFile$MateMap$$anonfun$addReads$1.apply(BamFile.scala:220)
    at org.broadinstitute.pilon.BamFile$MateMap$$anonfun$addReads$1.apply(BamFile.scala:220)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.broadinstitute.pilon.BamFile$MateMap.addReads(BamFile.scala:220)
    at org.broadinstitute.pilon.BamFile$MateMap.<init>(BamFile.scala:218)
    at org.broadinstitute.pilon.BamFile.recruitFlankReads(BamFile.scala:339)
    at org.broadinstitute.pilon.GapFiller$$anonfun$recruitReadsOfType$1.apply(GapFiller.scala:367)
    at org.broadinstitute.pilon.GapFiller$$anonfun$recruitReadsOfType$1.apply(GapFiller.scala:366)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.broadinstitute.pilon.GapFiller.recruitReadsOfType(GapFiller.scala:366)
    at org.broadinstitute.pilon.GapFiller.recruitFrags(GapFiller.scala:375)
    at org.broadinstitute.pilon.GapFiller.recruitLocalReads(GapFiller.scala:389)
    at org.broadinstitute.pilon.GapFiller.recruitReads(GapFiller.scala:391)
    at org.broadinstitute.pilon.GapFiller.assembleAcrossBreak(GapFiller.scala:51)
    at org.broadinstitute.pilon.GapFiller.fixBreak(GapFiller.scala:45)
    at org.broadinstitute.pilon.GenomeRegion$$anonfun$identifyAndFixIssues$4.apply(GenomeRegion.scala:383)
    at org.broadinstitute.pilon.GenomeRegion$$anonfun$identifyAndFixIssues$4.apply(GenomeRegion.scala:381)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.broadinstitute.pilon.GenomeRegion.identifyAndFixIssues(GenomeRegion.scala:381)
    at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$4.apply(GenomeFile.scala:119)
    at org.broadinstitute.pilon.GenomeFile$$anonfun$processRegions$4.apply(GenomeFile.scala:108)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:972)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
    at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
    at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:969)
    at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:152)
    at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
    at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

The samtools flagstat of the bam file returned:

1637356 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
1632224 + 0 mapped (99.69%:-nan%)
1290858 + 0 paired in sequencing
645429 + 0 read1
645429 + 0 read2
750594 + 0 properly paired (58.15%:-nan%)
1285622 + 0 with itself and mate mapped
1293 + 0 singletons (0.10%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

So it seems to be a properly paired file! Moreover, attached you will find files.tar.gz the Canu-assembled contig file. What I am doing wrong? Thank you in advance.

w1bw commented 7 years ago

Hi Jordur,

Your "samtools flagstat" output indicates only 1290858 of the 1637356 reads in the file are paired (have mates). I do agree Pilon should be more graceful about this, and I'll look into that. Was this bam created by only keeping the aligned reads, discarding the unmapped mates? Ideally, you would keep both reads of a pair when only one of the reads is aligned, since Pilon can use that information in local reassembly (if you're doing that). Otherwise, if you are just polishing bases, perhaps you could create two bams, one with the mapped pairs and feed in the rest through the --unpaired bam argument.

jordur commented 7 years ago

Hi @w1bw . Thank you for your suggestions. I was trying to polish canu assemblies with an Ilumina mapping to these resulting contigs. I mean, the --bam argument resulted from:

bowtie2 -x NB01.bowtie --very-sensitive-local -S BAC1.sam -1 Bac1High_S11_L001_R1_001.fastq.gz_good.fastq_clean.fastq-common.fq.notCombined_1.fastq -2 Bac1High_S11_L001_R1_001.fastq.gz_good.fastq_clean.fastq-common.fq.notCombined_2.fastq -U Bac1High_singles.fastq

I was wondering if "Inappropriate call if not paired read" meant that no paired reads were detected at all but as you saw, I got ~80% of paired reads and remarkably 99.69% of read mapped to contigs. Thus, finally I got the polished pilon fasta file with the --unpaired option, which ended without error. However, I am not sure about the assumptions I did with that option. How can I improve the polishing? Thank you in advance for your help

w1bw commented 7 years ago

The best thing would be to filter out the unpaired reads from the input bam and use only the paired reads (or create a separate --unpaired bam with the singletons). However, I will try to do a future enhancement for Pilon to be able to handle a mixed paired/unpaired BAM.

orangeSi commented 6 years ago

hi w1bw, the bam file need to be run picard-tools-1.54/MarkDuplicates.jar before run pilon ? or just bwa mem and sort ?