bajicv / coverage_plots

Scripts to plot coverage across entire genome or around loci of interest
1 stars 0 forks source link

Fehler in .io_bam(.count_bamfile, file, param = param) : seqlevels(param) not in BAM header #1

Open hoelzer opened 2 weeks ago

hoelzer commented 2 weeks ago

Hey,

I tried the script on my data set (which is a fragmented bacterial genome extracted from a metagenome).

Rscript --vanilla plot_BAM_density.R -f ../GCA_012799365.1_ASM1279936v1_genomic.fna -b ../GCA_012799365.1_ASM1279936v1_genomic.illumina.sorted.bam -o ../GCA_012799365-baja-coverage

but I get:

Fehler in .io_bam(.count_bamfile, file, param = param) :
  seqlevels(param) not in BAM header:
    seqlevels: 'JAAZLJ010000085.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 100867_AS23, whole genome shotgun sequence', 'JAAZLJ010000016.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 10111_AS23, whole genome shotgun sequence', 'JAAZLJ010000086.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 103774_AS23, whole genome shotgun sequence', 'JAAZLJ010000087.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 104241_AS23, whole genome shotgun sequence', 'JAAZLJ010000089.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 105954_AS23, whole genome shotgun sequence', 'JAAZLJ010000090.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 108066_AS23, whole genome shotgun sequence', 'JAAZLJ010000017.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 11411_AS23, whole genome shotgun sequence', 'JAAZLJ010000099.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 123082_AS23, whole
Ruft auf: plot_BAM_density ... kpPlotBAMDensity -> <Anonymous> -> <Anonymous> -> .io_bam
Ausführung angehalten

I should probably mention that I am trying on a MacBook...

I am unsure, but maybe the R script does not extract the FASTA headers correctly and then they don't match between the FASTA and BAM.

❯ samtools view -H GCA_012799365.1_ASM1279936v1_genomic.nanopore.sorted.bam
@HD VN:1.6  SO:coordinate
@SQ SN:JAAZLJ010000085.1    LN:27319
@SQ SN:JAAZLJ010000016.1    LN:14213
@SQ SN:JAAZLJ010000086.1    LN:12008
@SQ SN:JAAZLJ010000087.1    LN:64549
@SQ SN:JAAZLJ010000088.1    LN:9320
@SQ SN:JAAZLJ010000089.1    LN:51768
...
❯ grep ">" GCA_012799365.1_ASM1279936v1_genomic.fna
>JAAZLJ010000085.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 100867_AS23, whole genome shotgun sequence
>JAAZLJ010000016.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 10111_AS23, whole genome shotgun sequence
>JAAZLJ010000086.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 103774_AS23, whole genome shotgun sequence
>JAAZLJ010000087.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 104241_AS23, whole genome shotgun sequence
>JAAZLJ010000088.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 105800_AS23, whole genome shotgun sequence
>JAAZLJ010000089.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 105954_AS23, whole genome shotgun sequence
>JAAZLJ010000234.1 MAG: Syntrophomonadaceae bacterium isolate AS23ysBPME_21 107784_AS23, whole genome shotgun sequence
hoelzer commented 2 weeks ago

Works when I do

awk '{print $1}' GCA_012799365.1_ASM1279936v1_genomic.fna > GCA_012799365.1_ASM1279936v1_genomic.fna.renamed

and use this as reference.

However, I think a user might often end up in a difference between the IDs of the ref and the BAM depending on the used mapper. It might be good to implement ID mapping always only until the first white space in the FASTA ID...