broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

bam doesn't have sequence #111

Open emmannaemeka opened 4 years ago

emmannaemeka commented 4 years ago

Hi,

I tried running:

java -jar /Users/emmannadi/pilon/pilon-1.23.jar --genome /Users/emmannadi/Documents/C_afrcana_download/CBS11016/Sequences/CBS11016_Unordered_spades.fasta --bam /Users/emmannadi/Documents/C_afrcana_download/CBS11016/Bowtie2/SC5314.bam --threads 30 --fix all --output /Users/emmannadi/Documents/C_afrcana_download/CBS11016/Bowtie2

The output:

Pilon version 1.23 Mon Nov 26 16:04:05 2018 -0500
Genome: /Users/emmannadi/Documents/C_afrcana_download/CBS11016/Sequences/CBS11016_Unordered_spades.fasta
Fixing snps, indels, gaps, local
Input genome size: 19207539
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: /Users/emmannadi/Documents/C_afrcana_download/CBS11016/Bowtie2/SC5314.bam doesn't have sequence for any of NODE_18440_length_111_cov_51.250000, NODE_4412_length_647_cov_93.707770, NODE_28905_length_88_cov_39.030303,
    at scala.Predef$.require(Predef.scala:277)
    at org.broadinstitute.pilon.BamFile.validateSeqs(BamFile.scala:78)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$processRegions$1(GenomeFile.scala:87)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$processRegions$1$adapted(GenomeFile.scala:87)
    at scala.collection.immutable.List.foreach(List.scala:388)
    at org.broadinstitute.pilon.GenomeFile.processRegions(GenomeFile.scala:87)
    at org.broadinstitute.pilon.Pilon$.main(Pilon.scala:105)
    at org.broadinstitute.pilon.Pilon.main(Pilon.scala)

What could I be getting wrong?

SergejN commented 4 years ago

Hi, did you check the BAM header with samtools view -H <YOUR_BAM>? My naive guess would be that the FASTA file you are passing to --genome is slightly different (at least in terms of sequence names) from the one used to build the Bowtie2 index.
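To make that check concrete, here is a minimal sketch that compares the sequence names in a FASTA file with the @SQ entries of a BAM header. The function names are hypothetical, and it assumes a sequence's name is the text after ">" up to the first whitespace; whether Pilon and the aligner apply exactly that rule to your files is an assumption worth verifying.

```python
def fasta_names(fasta_lines):
    """Return sequence names from FASTA lines.

    Assumption: the name is the first whitespace-delimited token
    after '>' on each header line.
    """
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def bam_header_names(header_lines):
    """Return the SN values of @SQ lines from `samtools view -H` output."""
    names = []
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def compare_names(fasta, bam):
    """Report names present on one side only, preserving input order."""
    fasta_set, bam_set = set(fasta), set(bam)
    return {
        "only_in_fasta": [n for n in fasta if n not in bam_set],
        "only_in_bam": [n for n in bam if n not in fasta_set],
    }
```

One way to use it: save the header with samtools view -H <YOUR_BAM> > header.txt, then pass the open header file and the open FASTA file to the two parsers and hand both lists to compare_names. If the two lists share no names at all, the BAM was most likely aligned against a different reference.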

spock commented 2 years ago

I've run into the same exception in a rather peculiar corner case: my dataset has very low coverage of 2x (this is part of a tools evaluation work package). With such low coverage it is conceivable that some of the contigs of the large plant genome received no reads at all.

In my case this is clearly a data issue, and not a Pilon issue 🙂 (Pilon could skip contigs without sequences, but I'd actually prefer to have an exception - like it is now - so that any problems are highlighted.)

Maybe the issue could be closed, as neither of the reported cases points to any code problem in Pilon.
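For the low-coverage case described above, a rough way to see which contigs received no reads is to parse the output of samtools idxstats, which prints one tab-separated line per reference: name, length, mapped count, unmapped count. The sketch below assumes that output was produced from a sorted, indexed BAM and saved to a text file; the function name is hypothetical. It only reports coverage and does not by itself confirm that empty contigs are what triggers the exception.

```python
def contigs_without_reads(idxstats_lines):
    """Return names of references with zero mapped reads.

    Expects lines in `samtools idxstats` format:
    name <TAB> length <TAB> mapped <TAB> unmapped
    The trailing '*' line (reads with no reference) is skipped.
    """
    empty = []
    for line in idxstats_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] == "*":
            continue
        name, mapped = fields[0], int(fields[2])
        if mapped == 0:
            empty.append(name)
    return empty
```

For example, after running samtools idxstats <YOUR_BAM> > idxstats.txt, calling contigs_without_reads(open("idxstats.txt")) returns the list of contigs to inspect.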