estimate_1n_coverage_1d_subsets averages over two distinct high peaks

About your genome

The tetraploid Saccharomyces (SRR3265401)

The AB/AABB, AAB and AAAB subsets have one major peak each, and their 1n coverage estimates (first column, in that order) come out as

30.4 2236.5
20.2 1431.4
14.6 3077.4

The 20.2 is simply a mess-up, but 30.4 and 14.6 differ because of the denominator: the AB/AABB subset divides the coverage of what is actually the AABB smudge by the AB denominator (thinking it's AB), which doubles the AB/AABB coverage estimate. Halving 30.4 gives 15.2, which agrees with 14.6; the 14.6 is close to the truth because the first peak in that subset really is AAAB.

The weighted mean ends up with an estimate in between the two possible interpretations, ~20, which is a bit unfortunate. Perhaps a weighted median would do better justice, but this should be tested on many genomes.
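To make the comparison concrete, here is a minimal sketch of both summaries on the three numbers above. It assumes the second column is the weight given to each subset estimate; that is a guess from the layout, not something confirmed by the code. The function names `weighted_mean` and `weighted_median` are hypothetical helpers, not part of smudgeplot.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean of the per-subset estimates."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))  # sort by estimate
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value


# Estimates for the AB/AABB, AAB and AAAB subsets (first column above).
estimates = [30.4, 20.2, 14.6]
# ASSUMPTION: the second column above holds the corresponding weights.
weights = [2236.5, 1431.4, 3077.4]

print(round(weighted_mean(estimates, weights), 1))  # 21.0
print(weighted_median(estimates, weights))          # 20.2
```

Under that assumption the weighted mean comes out at about 21, and for this particular genome the weighted median picks 20.2, because the AAAB weight alone is just under half of the total. So whether the median actually helps would still need checking on more genomes.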

KamilSJaron / smudgeplot

estimate_1n_coverage_1d_subsets averages over two distinct high peaks #123