38 / d4-format

The D4 Quantitative Data Format
MIT License

Mapping quality filter not functioning #57

Closed: toddrichmond closed this issue 1 year ago

toddrichmond commented 1 year ago

I am using "D4 Utilities Program 0.3.7", installed with conda.

I created 3 d4 files using different mapping quality filters:

d4tools create --mapping-qual 0 S01_HyperPlus_Program1_HyperExomev1_Rep1_S1_sorted_dupsrm.bam MQ0.d4
d4tools create --mapping-qual 20 S01_HyperPlus_Program1_HyperExomev1_Rep1_S1_sorted_dupsrm.bam MQ20.d4
d4tools create --mapping-qual 60 S01_HyperPlus_Program1_HyperExomev1_Rep1_S1_sorted_dupsrm.bam MQ60.d4

I then looked at coverage depths for SMN1, a known problematic gene:

d4tools show -H MQ0.d4 chr5:70924941-70966375 > SMN1.MQ0.txt
d4tools show -H MQ20.d4 chr5:70924941-70966375 > SMN1.MQ20.txt
d4tools show -H MQ60.d4 chr5:70924941-70966375 > SMN1.MQ60.txt

All three files are identical: running 'diff' shows no differences between them. Counting reads in the same region of the same input BAM with samtools 1.15, however, shows a very clear difference:
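'diff' only reports whether the text dumps match byte for byte. A hypothetical helper like the one below compares depths position by position instead, so two dumps that split the same depth into different intervals still count as equal. It assumes each data line of the `d4tools show` output is tab-separated chrom, start, end, depth with 0-based half-open coordinates, and it skips anything else; that format is an assumption, not something stated in this thread, and the helper is not part of d4tools.

```python
"""Hypothetical helper: compare two bedGraph-style depth dumps per base.

Assumption: data lines are tab-separated chrom, start, end, depth using
0-based half-open intervals. Lines starting with '#' or not matching that
shape are skipped.
"""


def expand_depths(lines):
    """Map (chrom, pos) -> depth for every base covered by the intervals."""
    depths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) != 4:
            continue  # header or unexpected line shape
        chrom, start, end, value = fields
        try:
            start_i, end_i, depth = int(start), int(end), float(value)
        except ValueError:
            continue  # non-numeric columns, e.g. a column-name header
        for pos in range(start_i, end_i):
            depths[(chrom, pos)] = depth
    return depths


def count_differing_positions(lines_a, lines_b):
    """Number of bases whose depth differs; missing bases count as depth 0."""
    a, b = expand_depths(lines_a), expand_depths(lines_b)
    return sum(1 for key in set(a) | set(b) if a.get(key, 0.0) != b.get(key, 0.0))


# Example usage with the files from this report:
# with open("SMN1.MQ0.txt") as fa, open("SMN1.MQ20.txt") as fb:
#     print(count_differing_positions(fa, fb))
```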

samtools view -c -q 0 S01_HyperPlus_Program1_HyperExomev1_Rep1_S1_sorted_dupsrm.bam chr5:70924941-70966375 => 1914
samtools view -c -q 20 S01_HyperPlus_Program1_HyperExomev1_Rep1_S1_sorted_dupsrm.bam chr5:70924941-70966375 => 230
samtools view -c -q 60 S01_HyperPlus_Program1_HyperExomev1_Rep1_S1_sorted_dupsrm.bam chr5:70924941-70966375 => 33
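For reference, here is a minimal toy model of what a mapping-quality threshold is expected to do to per-base depth. It assumes the threshold is inclusive (reads with MAPQ >= the threshold are kept), which matches samtools' -q ("skip alignments with MAPQ smaller than INT"); that d4tools --mapping-qual follows the same convention is an assumption. The reads are made up for illustration.

```python
"""Toy model of MAPQ-threshold filtering on per-base depth.

Assumption: reads with MAPQ >= min_mapq are kept (samtools -q semantics);
whether d4tools --mapping-qual uses exactly this convention is assumed.
"""
from typing import NamedTuple


class Read(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int


def depth_profile(reads, region_start, region_end, min_mapq):
    """Per-base depth over [region_start, region_end), counting only reads with MAPQ >= min_mapq."""
    depth = [0] * (region_end - region_start)
    for r in reads:
        if r.mapq < min_mapq:
            continue  # filtered out, as samtools -q would skip it
        lo = max(r.start, region_start)
        hi = min(r.end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


# Hypothetical reads: MAPQ 0 mimics multi-mapping reads from near-identical paralogs.
READS = [
    Read(0, 10, 0),
    Read(2, 12, 0),
    Read(4, 14, 20),
    Read(5, 15, 60),
]

if __name__ == "__main__":
    for q in (0, 20, 60):
        profile = depth_profile(READS, 0, 15, q)
        print(q, sum(profile), profile)
```

With real data the same pattern should hold: depth at MQ20 or MQ60 can never exceed depth at MQ0 at any position, so identical outputs for a region where samtools counts 1914, 230 and 33 reads suggest the threshold is not being applied.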

Perhaps I don't understand how the --mapping-qual switch is supposed to work, but it's clearly not working as I expect.

Attachment: SMN1.zip

petersudmant commented 1 year ago

This is a known issue related to https://github.com/38/d4-format/issues/56. Can you try downloading the latest code from GitHub and see whether it fixes the problem? I had the same issue, and my test example worked with the latest code.

toddrichmond commented 1 year ago

Thanks, this fixes the issue.