DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0
124 stars 42 forks source link

BWA error #42

Open Leon-Bichmann opened 4 years ago

Leon-Bichmann commented 4 years ago

Hi,

I have been trying to use HLA-LA for illumina standard WES data. Unfortunately, I'm running into a BWA error after the edge path generation:

[ Wed Jun 24 19:10:29 2020 ] extensionAligner::extensionAligner(..): Indexing of edge paths; have 16804 completed edge paths.

$PATH/HLALA/bin/bwa mem -t15 -M -a /$PATH/HLALA/opt/hla-la/src/../graphs/PRG_MHC_GRCh38_withIMGT/extendedReferenceGenome/extendedReferenceGenome.fa /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_1.fastq /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_2.fastq | $PATH/HLALA/bin/samtools view -Sb - > /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/remapped_with_a.bam.unsorted
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 776266 sequences (110622870 bp)...
[E::bns_fetch_seq] begin=4104638566, mid=4104638567, end=3150488291, len=954150275, seq=0x7ff48c834010, rid=1141, far_beg=3150476765, far_end=3150488291
bwa: bntseq.c:444: bns_fetch_seq: Assertion `seq && *end - *beg == len' failed.

$PATH/HLALA/bin/samtools sort -o /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/remapped_with_a.bam /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/remapped_with_a.bam.unsorted
$PATH/HLALA/bin/samtools index /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/remapped_with_a.bam
 [ Wed Jun 24 19:10:56 2020 ] Remapping done.

processBAM::initBAM(..): Will examine 914 PRG-mapped intervals.
processBAM::extractSeeds(): examined 0 reads, transformed into 0 seeds.
processBAM::estimateInsertSize(): Deal 0 total proto-seeds (i.e. read pairs), of which 0 are incomplete.
HLA-LA: mapper/processBAM.cpp:1043: std::pair<double, double> mapper::processBAM::calculateInsertSizeFromHistogram(const std::map<int, double>&, bool): Assertion `set_weighted_80 && set_weighted_20 && set_median' failed.

HLA-LA execution not successful. Command was ../bin/HLA-LA --action HLA --maxThreads 15 --sampleID I19D040a04 --outputDirectory /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04 --PRG_graph_dir /$PATH/HLALA/opt/hla-la/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQU /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_U.fastq.splitLongReads --FASTQ1 /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_1.fastq --FASTQ2 /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_2.fastq --bwa_bin $PATH/HLALA/bin/bwa --samtools_bin $PATH/HLALA/bin/samtools --mapAgainstCompleteGenome 1 --longReads 0

Thanks and best, Leon

AlexanderDilthey commented 4 years ago

Hi @Leon-Bichmann,

Can you reproduce the error by just executing the BWA command on the command line? I.e.

$PATH/HLALA/bin/bwa mem -t15 -M -a /$PATH/HLALA/opt/hla-la/src/../graphs/PRG_MHC_GRCh38_withIMGT/extendedReferenceGenome/extendedReferenceGenome.fa /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_1.fastq /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_2.fastq | $PATH/HLALA/bin/samtools view -Sb - > /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/remapped_with_a.bam.unsorted

If you look at the FASTQ and FASTA files, do they look OK?

Leon-Bichmann commented 4 years ago

Hi,

thanks for getting back on this.

sorry, have to correct this post - it is failing as well:

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 776266 sequences (110622870 bp)...
[E::bns_fetch_seq] begin=4104638581, mid=4104638582, end=3150488291, len=954150290, seq=0x7fb66c275010, rid=1141, far_beg=3150476765, far_end=3150488291
bwa: bntseq.c:444: bns_fetch_seq: Assertion `seq && *end - *beg == len' failed.

`as far as I can tell as a non-expert the FASTQs and FASTAs look alright

Best, Leon

AlexanderDilthey commented 4 years ago

Hmm, weird!

Could you try to execute 'bwa index' on the reference genome, i.e. /$PATH/HLALA/opt/hla-la/src/../graphs/PRG_MHC_GRCh38_withIMGT/extendedReferenceGenome/extendedReferenceGenome.fa, and see whether there's still the same error?

I assume you have some other paired-end sequencing datasets from other projects - could you try mapping one of these datasets against the reference genome used above and see whether you get the same error?

This should tell us whether the error is coming from the reference or from the reads...

Leon-Bichmann commented 4 years ago

Hey thanks for the help, I'll have to postpone further work on this for a while so for now I'll close this issue.

boyangzhao commented 4 years ago

I'm actually getting the same error, about calculateInsertSizeFromHistogram assertion failed. I can run the example cram, but with my own bam, this is the error. I have added a new reference to the database in knownReferences per instructions.

 [ Sat Aug 29 20:15:49 2020 ] Remapping done.
processBAM::initBAM(..): Will examine 914 PRG-mapped intervals.
processBAM::extractSeeds(): examined 0 reads, transformed into 0 seeds.
processBAM::estimateInsertSize(): Deal 0 total proto-seeds (i.e. read pairs), of which 0 are incomplete.
HLA-LA: mapper/processBAM.cpp:1043: std::pair<double, double> mapper::processBAM::calculateInsertSizeFromHistogram(const std::map<int, double>&, bool): Assertion `set_weighted_80 && set_weighted_20 && set_median' failed.
HLA-LA execution not successful. Command was ../bin/HLA-LA --action HLA --maxThreads 4 --sampleID test --outputDirectory /opt/HLA-LA/working/test --PRG_graph_dir /opt/HLA-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT_indexed --FASTQU /opt/HLA-LA/working/test/R_U.fastq.splitLongReads --FASTQ1 /opt/HLA-LA/working/test/R_1.fastq --FASTQ2 /opt/HLA-LA/working/test/R_2.fastq --bwa_bin /opt/bwa/bwa --samtools_bin /opt/samtools/bin/samtools --mapAgainstCompleteGenome 1 --longReads 0

@Leon-Bichmann Perhaps it's useful to keep this issue open, as others also seem to get this error.