Ecogenomics / BamM

Metagenomics-focused BAM file manipulation
http://ecogenomics.github.io/BamM/
GNU Lesser General Public License v3.0

tpmean gives somewhat unintuitive results when coverage is low #45

Open wwood opened 7 years ago

wwood commented 7 years ago

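Because the pileups contain so many 0 values, given a true coverage of 0.11 and assuming that no read overlaps another, the tpmean coverage will be 0.01 / (0.8*length_of_contig), which is <<0.11 or even <<0.1 depending on contig length. Only 11% of positions have a depth of 1, so trimming the top 10% of positions removes nearly all of the covered bases and leaves only about 1% of positions with non-zero depth.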

This could be fixed by setting the top 10% of bases to the 90th percentile and the bottom 10% to the 10th percentile, instead of chopping the top and bottom 10% off (i.e. winsorizing rather than trimming).
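To make the difference concrete, here is a minimal pure-Python sketch comparing the two approaches on a toy pileup. It is not BamM's implementation: the function names `trimmed_mean` and `winsorized_mean`, the 1000 bp contig, and the per-base averaging over the kept positions are all assumptions made for illustration, and BamM's actual tpmean may normalize differently.

```python
def trimmed_mean(depths, frac=0.1):
    """Mean depth after discarding the lowest and highest `frac` of positions.

    Hypothetical helper for illustration; not BamM's tpmean code.
    """
    ordered = sorted(depths)
    k = int(len(ordered) * frac)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorized_mean(depths, frac=0.1):
    """Mean depth after clamping the tails instead of discarding them.

    The lowest `frac` of positions are raised to the value at the lower
    cut-off, and the highest `frac` are lowered to the value at the upper
    cut-off. Hypothetical helper for illustration.
    """
    ordered = sorted(depths)
    n = len(ordered)
    k = int(n * frac)
    low, high = ordered[k], ordered[n - k - 1]
    clamped = [min(max(d, low), high) for d in ordered]
    return sum(clamped) / n


# Toy pileup (assumed): a 1000 bp contig with 110 bases covered exactly once,
# i.e. a true coverage of 0.11 with no overlapping reads.
depths = [1] * 110 + [0] * 890

print(trimmed_mean(depths))     # far below the true coverage of 0.11
print(winsorized_mean(depths))  # recovers the true coverage in this toy case
```

In this toy case the cut-off values are 0 and 1, so clamping changes nothing and the winsorized mean equals the plain mean, while the trimmed mean keeps only 10 covered bases out of the 800 remaining positions.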