biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

Difference of depth/mean coverage calculation among sambamba, bedtools, and mosdepth in high coverage panel data #510

Open ipstone opened 1 year ago

ipstone commented 1 year ago

Hello,

We have some high-coverage panel sequencing data, but checking the depth of the target regions with mosdepth, bedtools, and sambamba gives quite a range of results (obtained by running the commands below through a Snakemake file).

These tools were run with their default settings. What might cause such a huge difference in the depth calculations? Thanks in advance!

sambamba: 
"sambamba depth region -L bed/study_genes.bed {input} > coverage/study_sambamba/interval_coverage/{wildcards.sample}_interval_coverage.txt"

# chrom chromStart  chromEnd    F3  readCount   meanCoverage    sampleName
1   36349022    36349047    NM_001317122_cds_0_0_chr1_36349023_f    502 400 Sample-10
1   36354027    36354211    NM_001317122_cds_1_0_chr1_36354028_f    958 372.63  Sample-10
1   36358157    36358278    NM_001317122_cds_2_0_chr1_36358158_f    593 309.859 Sample-10
1   36358697    36358879    NM_001317122_cds_3_0_chr1_36358698_f    777 323.709 Sample-10
1   36359274    36359411    NM_001317122_cds_4_0_chr1_36359275_f    834 374.27  Sample-10
1   36359637    36359772    NM_001317122_cds_5_0_chr1_36359638_f    686 315.548 Sample-10
1   36359915    36360003    NM_001317122_cds_6_0_chr1_36359916_f    669 357.909 
...

bedtools:
"bedtools coverage -mean -a bed/study_genes.bed -b {input} > coverage/study/interval_coverage/{wildcards.sample}_interval_coverage.txt"

chr start   end gene    coverage    sample
1   36349022    36349047    NM_001317122_cds_0_0_chr1_36349023_f    43840.9609375   Sample-10
1   36354027    36354211    NM_001317122_cds_1_0_chr1_36354028_f    43905.3320312   Sample-10
1   36358157    36358278    NM_001317122_cds_2_0_chr1_36358158_f    26675.0253906   Sample-10
1   36358697    36358879    NM_001317122_cds_3_0_chr1_36358698_f    32416.6210938   Sample-10
1   36359274    36359411    NM_001317122_cds_4_0_chr1_36359275_f    54923.9648438   Sample-10
1   36359637    36359772    NM_001317122_cds_5_0_chr1_36359638_f    35807.59375 Sample-10
1   36359915    36360003    NM_001317122_cds_6_0_chr1_36359916_f    29420.7265625   Sample-10

mosdepth:
mosdepth -n --by bed/study_genes.bed coverage/study_mosdepth/interval_coverage/{wildcards.sample}-interval {input}
        gzip -dc coverage/study_mosdepth/interval_coverage/{wildcards.sample}-interval.regions.bed.gz > {output.interval_coverage}

chr start   end gene    coverage    sample
1   36349022    36349047    NM_001317122_cds_0_0_chr1_36349023_f    289.08  Sample-10
1   36349022    36349047    NM_012199_cds_0_0_chr1_36349023_f   289.08  Sample-10
1   36354027    36354211    NM_001317122_cds_1_0_chr1_36354028_f    287.42  Sample-10
1   36354027    36354211    NM_012199_cds_1_0_chr1_36354028_f   287.42  Sample-10
1   36358157    36358278    NM_001317122_cds_2_0_chr1_36358158_f    277.69  Sample-10
1   36358157    36358278    NM_012199_cds_2_0_chr1_36358158_f   277.69  Sample-10
1   36358173    36358278    NM_001317123_cds_2_0_chr1_36358174_f    278.95  Sample-10
1   36358697    36358879    NM_001317122_cds_3_0_chr1_36358698_f    283.55  Sample-10
...
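
To make the discrepancy easier to inspect, the three outputs can be joined per region. The following is a minimal Python sketch, not part of the original pipeline: the helper names `parse_rows` and `compare` are hypothetical, the column positions are assumed from the tables above, and the embedded rows are copied from the first three regions shown. It only lines the numbers up side by side; it does not explain why they differ.

```python
# Rows copied from the tables above (first three regions only).
SAMBAMBA = """\
# chrom chromStart chromEnd F3 readCount meanCoverage sampleName
1 36349022 36349047 NM_001317122_cds_0_0_chr1_36349023_f 502 400 Sample-10
1 36354027 36354211 NM_001317122_cds_1_0_chr1_36354028_f 958 372.63 Sample-10
1 36358157 36358278 NM_001317122_cds_2_0_chr1_36358158_f 593 309.859 Sample-10
"""

BEDTOOLS = """\
chr start end gene coverage sample
1 36349022 36349047 NM_001317122_cds_0_0_chr1_36349023_f 43840.9609375 Sample-10
1 36354027 36354211 NM_001317122_cds_1_0_chr1_36354028_f 43905.3320312 Sample-10
1 36358157 36358278 NM_001317122_cds_2_0_chr1_36358158_f 26675.0253906 Sample-10
"""

MOSDEPTH = """\
chr start end gene coverage sample
1 36349022 36349047 NM_001317122_cds_0_0_chr1_36349023_f 289.08 Sample-10
1 36354027 36354211 NM_001317122_cds_1_0_chr1_36354028_f 287.42 Sample-10
1 36358157 36358278 NM_001317122_cds_2_0_chr1_36358158_f 277.69 Sample-10
"""


def parse_rows(text, value_col):
    """Map (chrom, start, end, name) -> float taken from column `value_col`.

    Header lines and short lines are skipped by checking that the second
    field is an integer start coordinate.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) <= value_col or not fields[1].isdigit():
            continue
        key = (fields[0], int(fields[1]), int(fields[2]), fields[3])
        table[key] = float(fields[value_col])
    return table


def compare(sambamba, bedtools, mosdepth):
    """Return one tuple per region present in all three tables.

    Each tuple is (key, sambamba, bedtools, mosdepth, bedtools/mosdepth).
    """
    rows = []
    for key in sorted(sambamba.keys() & bedtools.keys() & mosdepth.keys()):
        s, b, m = sambamba[key], bedtools[key], mosdepth[key]
        ratio = b / m if m else float("nan")
        rows.append((key, s, b, m, ratio))
    return rows


# Assumed column positions: meanCoverage is column 5 in the sambamba output,
# coverage is column 4 in the bedtools and mosdepth tables (0-based).
rows = compare(
    parse_rows(SAMBAMBA, 5),
    parse_rows(BEDTOOLS, 4),
    parse_rows(MOSDEPTH, 4),
)

for (chrom, start, end, name), s, b, m, ratio in rows:
    print(f"{chrom}:{start}-{end}\tsambamba={s}\tbedtools={b}\tmosdepth={m}\tbedtools/mosdepth={ratio:.1f}")
```

Keying on the region name as well as the coordinates keeps entries such as `NM_012199_cds_0_0_chr1_36349023_f`, which share coordinates with another transcript in the mosdepth output, from being merged by accident.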