DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

Error in matching of headers in FASTA and BAM #96

Closed jotech closed 4 years ago

jotech commented 4 years ago

I'm having problems creating the coverage file. After creating a mapping with bbmap, I tried to use it with blobtools but I get this error:

blobtools map2cov -i JUb134_2a/JUb134.contigs.fasta -b JUb134_mapping_sorted.bam
[+] Parsing FASTA - JUb134_2a/JUb134.contigs.fasta
[+] Parsing bam0 - /work_beegfs/sukem066/cembio/raw/assemblies/JUb134_mapping_sorted.bam
[X] Headers in FASTA and BAM don't seem to match

But the headers seem to be fine:

cat JUb134_2a/JUb134.contigs.fasta | grep ">"
>tig00000001 len=3421598 reads=6559 covStat=2111.80 gappedBases=no class=contig suggestRepeat=no suggestCircular=yes
>tig00000088 len=189412 reads=99 covStat=264.14 gappedBases=no class=contig suggestRepeat=no suggestCircular=yes
>tig00000093 len=76753 reads=32 covStat=94.92 gappedBases=no class=contig suggestRepeat=no suggestCircular=yes
>tig00000099 len=15682 reads=6 covStat=-3.34 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00001073 len=430451 reads=423 covStat=520.69 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
>tig00001074 len=20212 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no
samtools view -H JUb134_mapping_sorted.bam
@HD     VN:1.4  SO:coordinate
@SQ     SN:tig00000001 len=3421598 reads=6559 covStat=2111.80 gappedBases=no class=contig suggestRepeat=no suggestCircular=yes  LN:3421598
@SQ     SN:tig00000088 len=189412 reads=99 covStat=264.14 gappedBases=no class=contig suggestRepeat=no suggestCircular=yes      LN:189412
@SQ     SN:tig00000093 len=76753 reads=32 covStat=94.92 gappedBases=no class=contig suggestRepeat=no suggestCircular=yes        LN:76753
@SQ     SN:tig00000099 len=15682 reads=6 covStat=-3.34 gappedBases=no class=contig suggestRepeat=no suggestCircular=no  LN:15682
@SQ     SN:tig00001073 len=430451 reads=423 covStat=520.69 gappedBases=no class=contig suggestRepeat=no suggestCircular=no      LN:430451
@SQ     SN:tig00001074 len=20212 reads=1 covStat=0.00 gappedBases=no class=contig suggestRepeat=no suggestCircular=no   LN:20212
@PG     ID:BBMap        PN:BBMap        VN:38.69        CL:java -ea -Xmx172515m -Xms172515m align2.BBMap build=1 overwrite=true fastareadlen=500 ref=JUb134_2a/JUb134.contigs.fasta in=H07478-L1_S6_L001_R1_001_filtered.fastq in2=H07478-L1_S6_L001_R2_001_filtered.fastq out=JUb134_mapping.sam bamscript=bs.sh

I appreciate any help.

DRL commented 4 years ago

Hi jotech,

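Blobtools splits FASTA headers at the first whitespace (see the README), so it only knows your contigs as tig00000001, tig00000088, and so on.

I did not know that BBMap does not do the same: the @SQ names in your BAM header contain the full header line, which is why they don't match. The easiest/fastest solution is probably to remap with bwa mem.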

cheers,

dom
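
To make the mismatch concrete, here is a minimal Python sketch that compares FASTA IDs (cut at the first whitespace, as described above) with the SN values of the @SQ lines. It is not blobtools' own code; the function names `fasta_ids` and `bam_sq_names` are hypothetical, the sample headers are shortened copies of the ones posted in this issue, and the SAM header fields are assumed to be tab-separated.

```python
def fasta_ids(header_lines):
    """Return the ID of each FASTA header: the text before the first whitespace."""
    ids = []
    for line in header_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            ids.append(parts[0] if parts else "")
    return ids


def bam_sq_names(sam_header_lines):
    """Return the SN value of every @SQ line (fields assumed tab-separated)."""
    names = []
    for line in sam_header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


# Shortened sample data modelled on the output posted above.
fasta_headers = [
    ">tig00000001 len=3421598 reads=6559",
    ">tig00000088 len=189412 reads=99",
]
sam_header = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@SQ\tSN:tig00000001 len=3421598 reads=6559\tLN:3421598",
    "@SQ\tSN:tig00000088 len=189412 reads=99\tLN:189412",
]

ids = fasta_ids(fasta_headers)
sq_names = bam_sq_names(sam_header)

# IDs that have no identical @SQ name: every contig, because the BAM
# names still carry the description after the first space.
unmatched = set(ids) - set(sq_names)

# Cutting the BAM names at the first whitespace would make them match.
trimmed_sq_names = [name.split()[0] for name in sq_names]

print("unmatched:", sorted(unmatched))
print("match after trimming:", trimmed_sq_names == ids)
```

With real files, the header lines could come from `grep ">"` on the FASTA and from `samtools view -H` on the BAM, as in the original post.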

jotech commented 4 years ago

OK, this worked, thanks!
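
For anyone who would rather keep their existing mapper, one possible alternative (not tested in this thread) is to trim the FASTA headers to their IDs before mapping, so that the reference names in the BAM cannot include the description. The sketch below is an assumption-laden illustration: `trim_fasta_headers` is a hypothetical helper, and the sequences in the demo are made up.

```python
import io


def trim_fasta_headers(src, dst):
    """Copy FASTA text from src to dst, keeping only the ID on header lines.

    The ID is the text between '>' and the first whitespace; sequence
    lines are copied unchanged.
    """
    for line in src:
        if line.startswith(">"):
            parts = line[1:].split()
            line = ">" + (parts[0] if parts else "") + "\n"
        dst.write(line)


# Demo on in-memory data; the sequences are invented placeholders.
src = io.StringIO(
    ">tig00000099 len=15682 reads=6 covStat=-3.34\n"
    "ACGT\n"
    ">tig00001074 len=20212 reads=1 covStat=0.00\n"
    "GGCC\n"
)
dst = io.StringIO()
trim_fasta_headers(src, dst)
print(dst.getvalue(), end="")
```

In practice `src` and `dst` would be file handles opened with `open(...)`, and the trimmed file would then be used both as the mapping reference and as the `-i` input to blobtools.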