arq5x / bedtools2

bedtools - the swiss army knife for genome arithmetic

Difference of depth/mean coverage calculation among sambamba, bedtools, and mosdepth in high coverage panel data #1065

Open ipstone opened 1 year ago

ipstone commented 1 year ago

Hello, Aaron and everyone - Thanks for the great tools you guys have created!

We have some high-coverage panel sequencing data, but checking the depth of the target regions with mosdepth, bedtools, and sambamba gives quite a range of results (all three commands are run from a Snakemake file).

All three tools are run with their default settings. What might cause such a huge difference in the depth calculations? Does bedtools apply any read filtering, such as duplicate removal, when computing bedtools coverage -mean?
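To make the filtering question concrete, here is a toy sketch of how a per-interval mean depth can shift depending on which reads are counted. It does not reproduce the internals of any of the three tools. The `Read` class, the `mean_depth` function, and the `min_mapq` / `skip_duplicates` parameters are hypothetical names made up for illustration, and whether each tool applies such filters by default is exactly what is being asked.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical minimal alignment record (not a real BAM record)."""
    start: int                  # 0-based, inclusive
    end: int                    # 0-based, exclusive
    mapq: int = 60
    is_duplicate: bool = False


def mean_depth(reads, region_start, region_end, min_mapq=0, skip_duplicates=False):
    """Mean depth = bases of overlap summed over counted reads / region length."""
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")
    covered_bases = 0
    for read in reads:
        if skip_duplicates and read.is_duplicate:
            continue            # assumed filter: drop reads flagged as duplicates
        if read.mapq < min_mapq:
            continue            # assumed filter: drop low mapping quality reads
        overlap = min(read.end, region_end) - max(read.start, region_start)
        if overlap > 0:
            covered_bases += overlap
    return covered_bases / length


# Toy data: four reads around a 100 bp region [100, 200).
toy_reads = [
    Read(100, 200),
    Read(100, 200, is_duplicate=True),
    Read(50, 150, mapq=0),
    Read(150, 250),
]

unfiltered = mean_depth(toy_reads, 100, 200)
no_dups = mean_depth(toy_reads, 100, 200, skip_duplicates=True)
strict = mean_depth(toy_reads, 100, 200, min_mapq=1, skip_duplicates=True)
print(unfiltered, no_dups, strict)
```

In this toy case the three settings give three different means for the same reads and the same interval, so differing default filters alone could explain some disagreement between tools. Whether they can explain a gap as large as the one reported here is a separate question.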

Thanks in advance!

sambamba: 
"sambamba depth region -L bed/study_genes.bed {input} > coverage/study_sambamba/interval_coverage/{wildcards.sample}_interval_coverage.txt"

# chrom chromStart  chromEnd    F3  readCount   meanCoverage    sampleName
1   36349022    36349047    NM_001317122_cds_0_0_chr1_36349023_f    502 400 Sample-10
1   36354027    36354211    NM_001317122_cds_1_0_chr1_36354028_f    958 372.63  Sample-10
1   36358157    36358278    NM_001317122_cds_2_0_chr1_36358158_f    593 309.859 Sample-10
1   36358697    36358879    NM_001317122_cds_3_0_chr1_36358698_f    777 323.709 Sample-10
1   36359274    36359411    NM_001317122_cds_4_0_chr1_36359275_f    834 374.27  Sample-10
1   36359637    36359772    NM_001317122_cds_5_0_chr1_36359638_f    686 315.548 Sample-10
1   36359915    36360003    NM_001317122_cds_6_0_chr1_36359916_f    669 357.909 
...

bedtools:
"bedtools coverage -mean -a bed/study_genes.bed -b {input} > coverage/study/interval_coverage/{wildcards.sample}_interval_coverage.txt"

chr start   end gene    coverage    sample
1   36349022    36349047    NM_001317122_cds_0_0_chr1_36349023_f    43840.9609375   Sample-10
1   36354027    36354211    NM_001317122_cds_1_0_chr1_36354028_f    43905.3320312   Sample-10
1   36358157    36358278    NM_001317122_cds_2_0_chr1_36358158_f    26675.0253906   Sample-10
1   36358697    36358879    NM_001317122_cds_3_0_chr1_36358698_f    32416.6210938   Sample-10
1   36359274    36359411    NM_001317122_cds_4_0_chr1_36359275_f    54923.9648438   Sample-10
1   36359637    36359772    NM_001317122_cds_5_0_chr1_36359638_f    35807.59375 Sample-10
1   36359915    36360003    NM_001317122_cds_6_0_chr1_36359916_f    29420.7265625   Sample-10

mosdepth:
mosdepth -n --by bed/study_genes.bed coverage/study_mosdepth/interval_coverage/{wildcards.sample}-interval {input}
        gzip -dc coverage/study_mosdepth/interval_coverage/{wildcards.sample}-interval.regions.bed.gz > {output.interval_coverage}

chr start   end gene    coverage    sample
1   36349022    36349047    NM_001317122_cds_0_0_chr1_36349023_f    289.08  Sample-10
1   36349022    36349047    NM_012199_cds_0_0_chr1_36349023_f   289.08  Sample-10
1   36354027    36354211    NM_001317122_cds_1_0_chr1_36354028_f    287.42  Sample-10
1   36354027    36354211    NM_012199_cds_1_0_chr1_36354028_f   287.42  Sample-10
1   36358157    36358278    NM_001317122_cds_2_0_chr1_36358158_f    277.69  Sample-10
1   36358157    36358278    NM_012199_cds_2_0_chr1_36358158_f   277.69  Sample-10
1   36358173    36358278    NM_001317123_cds_2_0_chr1_36358174_f    278.95  Sample-10
1   36358697    36358879    NM_001317122_cds_3_0_chr1_36358698_f    283.55  Sample-10
...
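
The size of the gap is easier to judge as ratios. The sketch below uses only the first four NM_001317122 intervals, with values copied from the three outputs above. The dict layout and the `coverage_ratios` function name are made up for illustration.

```python
# Mean coverage per interval, copied from the outputs above:
# (sambamba meanCoverage, bedtools coverage -mean, mosdepth mean)
coverage = {
    "NM_001317122_cds_0_0_chr1_36349023_f": (400.0, 43840.9609375, 289.08),
    "NM_001317122_cds_1_0_chr1_36354028_f": (372.63, 43905.3320312, 287.42),
    "NM_001317122_cds_2_0_chr1_36358158_f": (309.859, 26675.0253906, 277.69),
    "NM_001317122_cds_3_0_chr1_36358698_f": (323.709, 32416.6210938, 283.55),
}


def coverage_ratios(table):
    """Return pairwise ratios between the three tools for each interval."""
    ratios = {}
    for name, (sambamba, bedtools, mosdepth) in table.items():
        ratios[name] = {
            "bedtools/sambamba": bedtools / sambamba,
            "bedtools/mosdepth": bedtools / mosdepth,
            "sambamba/mosdepth": sambamba / mosdepth,
        }
    return ratios


ratios = coverage_ratios(coverage)
for name, r in ratios.items():
    print(name, {pair: round(value, 2) for pair, value in r.items()})
```

For these four intervals the bedtools values come out on the order of 100 times higher than either of the other two tools, while sambamba and mosdepth stay within a factor of 1.5 of each other.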