genome / bam-readcount

Count bases in BAM/CRAM files
MIT License
298 stars 95 forks source link

Reported depth not matching #101

Closed jcgrenier closed 2 years ago

jcgrenier commented 2 years ago

Hello!

We are presently working with samples that were processed with an amplicons based approach, so the coverage of the specific regions are quite high. However, some steps in our pipeline are trimming reads that are starting from the primer used, so we are not biaised toward the primer sequences. However, this is creating some effects in the sorted alignment which seems to create some problems for bam-readcount. Some reads are starting before some others, but the following ones are covering more bases, but are now ignored by the program.

Can you help us out? Thanks a lot.

chrisamiller commented 2 years ago

Can you provide a small example that reproduces the problem?

jcgrenier commented 2 years ago

Sure! Here's one! (Sorry for the file type, just gunzip it, otherwise it was not compatible with the way git is handling the attached files).

The reference will be this one : https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2?report=fasta&log$=seqview

Thanks! example.bam.gz

jcgrenier commented 2 years ago

Hello @chrisamiller, Any updates concerning the issue? Thanks a lot!

Have a great day!

apldx commented 2 years ago

Thanks for test case! Looking into this now.

apldx commented 2 years ago

@jcgrenier, I was able to replicate the issue. Fixing it is going to take a while longer, but there is a simple workaround for your case: just specify the entire reference as a region to bam-readcount, eg

bam-readcount -w1 -f NC_045512.2.fa example.bam NC_045512.2:1-29903

The problem is that if a region or site list is not specified, bam-readcount uses a wrapper function from samtools-1.10/sam.c sampileup() instead of doing its own iteration (where it can set the depth), and the passed-in config doesn't change the depth in that function.

Sorry you have to use a workaround, but it should work for now, and we'll work on fixing this! Thank you for your test case.

jcgrenier commented 2 years ago

Hello @apldx, Thanks for finding a workaround for that issue! It's kind of a specific one given the design of the experiments (amplicon based-sequencing).

Have a great day! Jean-Christophe