bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Lineage clustering - how to evaluate results #192

Closed andreaniml closed 2 years ago

andreaniml commented 2 years ago

Hi! I'm sorry if this is a silly question.

I'm dealing with closely related Salmonella (~3 different STs) and I am trying to fit a model to get an idea of the structure in my dataset (and possibly filter out some samples). I am using the lineage mode with --rank 1,2,3,4,5,6 and my lineages seem to make sense (although some are polyphyletic), but I'm unsure how to tell whether I can trust these clusters.

From the documentation I gather that the Score values are not that informative, but mine are really low. Is this a problem?

My scores:

Rank Score
1 0.0071
2 0.0325
3 0.0624
4 0.0906
5 0.1122
6 0.1380

Many thanks!

johnlees commented 2 years ago

The scores really don't mean anything in this mode, so I wouldn't worry about it.

Looking at the clusters on a tree to see if you're happy with them, as it sounds like you are already doing, is a good idea. Minimum spanning trees can also sometimes be useful: https://poppunk.readthedocs.io/en/latest/mst.html
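
As a rough illustration of the tree check suggested above, the sketch below tests whether each lineage forms a single clade on a rooted tree, which is one way to flag the polyphyletic lineages mentioned in the question. It is not part of PopPUNK. The nested-tuple tree, the sample names, the lineage labels and the function names are all hypothetical; in practice the tree would be parsed from a Newick file with a phylogenetics library and the assignments read from the lineage output. The result also depends on how the tree is rooted.

```python
from collections import defaultdict


def leaves(node):
    """Return the set of leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def smallest_clade(node, members):
    """Return the leaf set of the smallest clade containing every member."""
    if not isinstance(node, str):
        for child in node:
            # Descend while a single child still holds all members.
            if members <= leaves(child):
                return smallest_clade(child, members)
    return leaves(node)


def monophyly_report(tree, lineage_of):
    """Map each lineage to True if its samples form exactly one clade."""
    by_lineage = defaultdict(set)
    for sample, lineage in lineage_of.items():
        by_lineage[lineage].add(sample)
    return {
        lineage: smallest_clade(tree, members) == members
        for lineage, members in by_lineage.items()
    }


# Hypothetical toy data: a rooted tree as nested tuples and made-up lineages.
tree = ((("s1", "s2"), "s3"), ("s4", ("s5", "s6")))
lineage_of = {"s1": 1, "s2": 1, "s3": 2, "s4": 2, "s5": 3, "s6": 3}

report = monophyly_report(tree, lineage_of)
for lineage, ok in sorted(report.items()):
    print(lineage, "monophyletic" if ok else "not monophyletic")
```

In this toy example lineage 2 is flagged because its two samples sit on opposite sides of the root. A flagged lineage is a prompt to look at that part of the tree, not proof that the clustering is wrong.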

andreaniml commented 2 years ago

Thanks for the reply! The MST tree was useful too, thank you very much for the tip!