bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Improve distance estimation #130

Open · nickjcroucher opened this issue 3 years ago

nickjcroucher commented 3 years ago

It may be worth testing whether the pairwise distances could be corrected as described in https://gitlab.pasteur.fr/GIPhy/JolyTree, since we are now observing core Hamming distances >0.1. As an initial test, the correction could be applied in the tree visualisation, but it may even help to resolve within-strain and between-strain distances.

johnlees commented 3 years ago

Do you mean comparing equation 3 in the paper with the current Monte Carlo method, and/or using this method to draw the tree?

nickjcroucher commented 3 years ago

Sorry if I was unclear: rather than correcting for false-positive matches, it would be a correction for multiple substitutions occurring at the same site. This should not change small distances, but it would increase larger distances (which are systematically underestimated), and that might help to separate within-strain and between-strain distances where Hamming distances are large. At a simpler level, it might also improve the phylogenies.
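As a minimal sketch of the kind of multiple-substitution correction described above, the snippet below applies the Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), to a proportion p of differing sites. JC69 is used here only as a standard, simple example of such a correction. It is an assumption, not necessarily the model used by JolyTree or one planned for PopPUNK, and the input is assumed to be a per-site proportion (e.g. a core Hamming distance divided by the number of aligned sites). The function name `jukes_cantor` is hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Hypothetical helper: JC69-corrected distance from a proportion p of differing sites.

    Assumes p is a per-site mismatch proportion in [0, 0.75); the correction
    is undefined (diverges) at p >= 0.75, where sites look saturated.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 correction is undefined for p outside [0, 0.75)")
    # log1p keeps precision for small p, where the correction is tiny
    return -0.75 * math.log1p(-4.0 * p / 3.0)


if __name__ == "__main__":
    # Compare uncorrected p with the corrected distance across a range of values
    for p in (0.001, 0.01, 0.05, 0.1, 0.2):
        d = jukes_cantor(p)
        print(f"p = {p:.3f}  d_JC = {d:.4f}  ratio = {d / p:.3f}")
```

The printed ratio d/p stays close to 1 for small p and rises as p approaches and exceeds 0.1, consistent with the expectation that only larger distances are inflated. Because the correction diverges as p approaches 0.75, saturated pairs would need separate handling.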