brentp / mosdepth

fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing
MIT License

Receiving low distribution values from exome data #129

Open ghosholivia opened 3 years ago

ghosholivia commented 3 years ago

Hi @brentp ,

For 14 exome samples (approx. 3.3 GB BAM each), I am getting lower depth distribution values from mosdepth than from other tools such as Qualimap (on average, 26% lower). I used the command from your example:

mosdepth --by capture.bed sample-output sample.exome.bam

For both tools, I used the same capture.bed (12 MB).

Can you help me understand why the outputs differ so much?

Thanks, Olivia

brentp commented 3 years ago

Hi Olivia, what file are you looking at that's showing low distribution values?

brentp commented 3 years ago

I can likely help if you can start by answering this question ^

ghosholivia commented 3 years ago

Thanks for your reply, @brentp. These are HiSeq 2000 human exome data with ~120 million reads.

Thanks, Olivia

brentp commented 3 years ago

I mean which mosdepth file.

ghosholivia commented 3 years ago

It's the file “output.mosdepth.regions.dist.txt”. I ran the Python script (plot.dist.py) on this file to get the mean coverage.
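For reference, a mean depth can be recovered directly from a cumulative distribution like this one. The minimal sketch below assumes each line holds three whitespace-separated fields (chromosome or "total", a depth level, and the proportion of bases covered at or above that level) and relies on the identity that, for a non-negative integer depth X, the mean equals the sum of P(X >= d) over d >= 1. The function name is hypothetical, and if some depth levels are missing from the file the sum only approximates (and underestimates) the true mean.

```python
def mean_from_cumulative_dist(lines, chrom="total"):
    """Approximate mean depth from (chrom, depth, proportion >= depth) rows.

    Uses E[X] = sum over d >= 1 of P(X >= d) for integer depths X >= 0.
    Depth levels absent from the input contribute nothing, so dropped
    rows make the result an underestimate.
    """
    cumulative = {}
    for line in lines:
        fields = line.split()
        # Skip malformed lines and rows for other chromosomes.
        if len(fields) != 3 or fields[0] != chrom:
            continue
        cumulative[int(fields[1])] = float(fields[2])
    # Depth 0 always has P(X >= 0) = 1 and is not part of the sum.
    return sum(p for d, p in cumulative.items() if d >= 1)


# Toy input: four bases with depths 0, 1, 2 and 3 (true mean 1.5).
example = [
    "total\t3\t0.25",
    "total\t2\t0.50",
    "total\t1\t0.75",
    "total\t0\t1.00",
]
print(mean_from_cumulative_dist(example))  # 1.5
```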

brentp commented 3 years ago

OK, that's the right file. Now, can you clarify what you saw that was unexpected? By that, I mean, can you expand on this:

I am receiving low distribution values of depths from mosdepth tool compared to other tools such as Qualimap. [On average: 26% low value from the other tool]

What value are you looking at from mosdepth, how did you get 26%, and what do you mean by "low"?

ghosholivia commented 3 years ago

Yes. What I meant is that the mean coverage I’m getting from mosdepth is ~26% lower than the value from Qualimap.

By contrast, for WGS data the depth values from the two tools do not differ this much.

I’m attaching a screenshot of the Excel sheet showing the differences mentioned above.

[Screenshot: Excel sheet comparing mean coverage from mosdepth and Qualimap]
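Since the question of how the "~26% lower" figure was computed came up, it may help to make the arithmetic explicit: a percentage like this depends on which value is the reference. The sketch below uses hypothetical helper names and made-up means (not values from the screenshot) to show that "26% lower than Qualimap" corresponds to Qualimap being roughly 35% higher than mosdepth.

```python
def percent_lower(value, reference):
    """How much lower `value` is than `reference`, as a % of `reference`."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (reference - value) * 100.0 / reference


def percent_higher(value, reference):
    """How much higher `value` is than `reference`, as a % of `reference`."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (value - reference) * 100.0 / reference


# Hypothetical means, chosen only to illustrate the arithmetic.
mosdepth_mean, qualimap_mean = 74.0, 100.0
print(percent_lower(mosdepth_mean, qualimap_mean))   # 26.0
# The same gap measured relative to the mosdepth value (about 35.1):
print(percent_higher(qualimap_mean, mosdepth_mean))
```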

brentp commented 3 years ago

Ah, OK: the coverage from mosdepth is 26% lower than from Qualimap. I don't know how Qualimap works, but you could try running mosdepth with --fast-mode and see if the numbers match more closely.

If they do, that means Qualimap does not adjust for overlapping read pairs (mosdepth does by default).
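A toy model may make the effect of the overlap correction concrete. The sketch below assumes gap-free alignments given as hypothetical half-open intervals on the reference (no CIGAR operations, clipping, or quality filters) and compares per-base depth when bases covered by both mates are counted twice versus once. Counting twice is roughly what --fast-mode does, although --fast-mode also skips internal CIGAR operations, so it is not purely an overlap toggle. This illustrates the idea only and is not mosdepth's implementation.

```python
def depth_profile(pairs, length, correct_overlap):
    """Per-base depth over positions [0, length) from read pairs.

    Each pair is ((start1, end1), (start2, end2)) with half-open,
    gap-free alignments. With correct_overlap=True, a base covered by
    both mates of the same pair (one fragment) is counted only once.
    """
    depth = [0] * length
    for (s1, e1), (s2, e2) in pairs:
        mate1 = set(range(s1, e1))
        mate2 = set(range(s2, e2))
        if correct_overlap:
            bases = list(mate1 | mate2)        # union: one count per fragment base
        else:
            bases = list(mate1) + list(mate2)  # overlapping bases appear twice
        for pos in bases:
            if 0 <= pos < length:
                depth[pos] += 1
    return depth


def mean(values):
    return sum(values) / len(values)


region_length = 20
pairs = [
    ((0, 10), (5, 15)),   # mates overlap on positions 5..9
    ((2, 8), (12, 18)),   # mates do not overlap
]
doubled = depth_profile(pairs, region_length, correct_overlap=False)
corrected = depth_profile(pairs, region_length, correct_overlap=True)
print(mean(doubled), mean(corrected))  # 1.6 1.35
```

Only the overlapping pair changes between the two modes; the non-overlapping pair contributes the same depth either way.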

ghosholivia commented 3 years ago

Okay sure. I’ll try that parameter and check. Thanks a lot.

aspitaleri commented 3 years ago

Hi, first of all, great tool! I recently discovered it and it is a real Swiss Army knife! I saw the same behavior when comparing against samtools and pileup.sh (from BBMap), and --fast-mode does indeed "fix" the discrepancy. I'm wondering which value should be considered correct: with or without the overlap correction? Thanks

brentp commented 3 years ago

It depends on what you mean by "correct". If you use --fast-mode, then you are double-counting overlapping reads from the same fragment: you have one piece of information (the fragment), but you count it twice wherever the two reads overlap.
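To put numbers on the "counted twice" point: for a hypothetical pair of equal-length mates read inward from opposite ends of a fragment, the mates overlap by max(0, 2L - F) bases, where L is the read length and F the fragment length. These are assumptions of this sketch, which also ignores clipping and alignment gaps. The helper below, whose name is made up for illustration, returns that overlap and the ratio of double-counted coverage to once-per-fragment coverage over the sequenced bases.

```python
def overlap_inflation(read_length, fragment_length):
    """Overlap size and double-counting ratio for one idealized read pair.

    Assumes both mates have length read_length, align end-to-end without
    gaps, and start at opposite ends of a fragment of fragment_length
    (fragment_length >= read_length).
    """
    if fragment_length < read_length:
        raise ValueError("fragment_length must be >= read_length")
    overlap = max(0, 2 * read_length - fragment_length)
    counted = 2 * read_length              # every sequenced base, mates counted separately
    covered_once = 2 * read_length - overlap  # fragment bases covered by at least one mate
    return overlap, counted / covered_once


print(overlap_inflation(100, 150))  # (50, 1.3333333333333333)
print(overlap_inflation(100, 300))  # (0, 1.0)
```

For example, 100 bp mates on a 150 bp fragment overlap by 50 bp, so counting both mates yields 200 base observations where only 150 fragment bases were sequenced; once the fragment is long enough that the mates no longer overlap, the two counting schemes agree.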