NGSEP / NGSEPcore

NGSEP is an integrated framework for the analysis of high-throughput sequencing (HTS) reads. The main functionality of NGSEP is the variants detector, which performs integrated discovery and genotyping of Single Nucleotide Variants (SNVs), insertions, deletions, and genomic regions with copy number variation (CNVs).
GNU General Public License v3.0

error sequence dictionary and index #40

Closed danessel closed 2 years ago

danessel commented 2 years ago

With the latest version I'm getting this error, although I couldn't find any difference in the number of contigs. An older version, NGSEPcore_4.0.1.jar, works on the same data.

Oct 13, 2021 8:24:53 AM ngsep.main.OptionValuesDecoder loadGenomeWithLowerCase
INFO: Loading genome from: ../reference/Leek_44_CPMT.fa
Oct 13, 2021 8:42:01 AM ngsep.main.OptionValuesDecoder loadGenomeWithLowerCase
INFO: Loaded genome with: 59337 sequences. Total length: 38889402655 from file: ../reference/Leek_44_CPMT.fa
Oct 13, 2021 8:42:01 AM ngsep.discovery.MultisampleVariantsDetector logParameters
INFO: Input files: [2021-01.bam2]
Oct 13, 2021 8:42:01 AM ngsep.discovery.MultisampleVariantsDetector logParameters
INFO: Loaded reference genome from: ../reference/Leek_44_CPMT.fa
Output file: Leek.raw.NGSP.vcf
Ignore variants in lower case reference positions: false
Maximum number of alignments starting at the same position: 5
Minimum mapping quality to consider an alignment unique: 2
Process non unique primary alignments: false
Process secondary alignments: false
Base pairs to ignore from the 5' end of each read: 0
Base pairs to ignore from the 3' end of each read: 0
Prior heterozygosity rate: 0.001
Maximum base quality score (PHRED): 100
Minimum variant quality score (PHRED): 1
Call SNVs within STRs: false
Normal ploidy: 4
Print header with sample ploidy in the vcf file: false

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at ngsep.NGSEPcore.main(NGSEPcore.java:66)
Caused by: htsjdk.samtools.SAMException: Sequence dictionary and index contain different numbers of contigs
    at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.sanityCheckDictionaryAgainstIndex(AbstractIndexedFastaSequenceFile.java:107)
    at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.<init>(AbstractIndexedFastaSequenceFile.java:68)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:80)
    at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:98)
    at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:139)
    at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:122)
    at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:111)
    at htsjdk.samtools.cram.ref.ReferenceSource.<init>(ReferenceSource.java:65)
    at htsjdk.samtools.cram.ref.ReferenceSource.<init>(ReferenceSource.java:61)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.referenceSequence(SamReaderFactory.java:259)
    at ngsep.alignments.io.ReadAlignmentFileReader.init(ReadAlignmentFileReader.java:164)

jduitama commented 2 years ago

Hi

Thanks for your interest in NGSEP. From version 4.0.1 to 4.0.2 (and beyond) we updated the version of htsjdk, which is the base library that we use to read and write BAM files. The new version seems to validate BAM file headers that the old version did not validate. More importantly, it could be that your BAM file was not generated with the same reference used to run variant detection. First, please run samtools faidx on the reference genome. Then run samtools view -H on the BAM file and check whether the sequence dictionary headers (the @SQ lines) match the first two columns of the .fai file generated by samtools faidx.
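With 59337 sequences, comparing the two listings by eye is impractical, so the check described above can be sketched as a small Python script. This is only an illustrative sketch and not part of NGSEP: the function names (`parse_sq_lines`, `parse_fai`, `compare_contigs`, `check`) are hypothetical, and it assumes that samtools is on the PATH and that the .fai file was already produced with samtools faidx.

```python
import subprocess
from typing import List, Tuple

Contigs = List[Tuple[str, int]]


def parse_sq_lines(header_text: str) -> Contigs:
    """Extract (name, length) pairs from the @SQ lines of a SAM/BAM header."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs, e.g. SN:chr1 LN:100
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((fields["SN"], int(fields["LN"])))
    return contigs


def parse_fai(fai_text: str) -> Contigs:
    """Extract (name, length) pairs from the first two columns of a .fai file."""
    contigs = []
    for line in fai_text.splitlines():
        if line.strip():
            cols = line.split("\t")
            contigs.append((cols[0], int(cols[1])))
    return contigs


def compare_contigs(bam_contigs: Contigs, fai_contigs: Contigs) -> List[str]:
    """Return human-readable differences; an empty list means they agree."""
    problems = []
    if len(bam_contigs) != len(fai_contigs):
        problems.append(
            f"contig count differs: {len(bam_contigs)} in BAM header "
            f"vs {len(fai_contigs)} in .fai"
        )
    bam_lengths = dict(bam_contigs)
    fai_lengths = dict(fai_contigs)
    for name, length in bam_contigs:
        if name not in fai_lengths:
            problems.append(f"{name}: in BAM header but not in .fai")
        elif fai_lengths[name] != length:
            problems.append(
                f"{name}: length {length} in BAM header vs {fai_lengths[name]} in .fai"
            )
    for name, _ in fai_contigs:
        if name not in bam_lengths:
            problems.append(f"{name}: in .fai but not in BAM header")
    return problems


def check(bam_path: str, fai_path: str) -> List[str]:
    """Run `samtools view -H` on the BAM file and compare it with the .fai file."""
    header = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        check=True, capture_output=True, text=True,
    ).stdout
    with open(fai_path) as handle:
        return compare_contigs(parse_sq_lines(header), parse_fai(handle.read()))
```

Calling `check` with the path of the BAM file and the path of the .fai file returns an empty list when names, lengths, and contig counts agree. If a `.dict` file sits next to the reference, `parse_sq_lines` should be able to read it as well, since such files use the SAM header format; that is worth checking because the failing call in the stack trace is in the indexed FASTA reader.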

Let me know how things go.