glstott / PMeND

Phylogeny and Metadata Network Database

Cluster Stability Measures #12

Closed glstott closed 2 years ago

glstott commented 2 years ago

"1) the proportion of sequences that moved from a cluster in the preceding week to non-clustered in the current week, 2) the number of clusters defined in the previous week that split in the current week (i.e., any instance where sequences that were in a single cluster in the previous week have moved to different clusters in the current week), and 3) the overall entropy score of the clusters found in the current week (with the lowest score of 0 occurring when all sequences are in a single cluster)." - Sobkowiak et al., medRxiv preprint, doi: https://doi.org/10.1101/2022.03.10.22272213

Assuming clusters are labeled, none of these should be a problem. Option 1 can be done with a quick match-and-filter on tree and clade. Option 2 is similar: we just need to find clusters from one week whose sequences end up in two or more different clusters the following week. Option 3 is unclear, but I suspect they're using a standard entropy measure, à la https://stats.stackexchange.com/questions/338719/calculating-clusters-entropy-python .
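For illustration, here is a minimal Python sketch of how the three measures could be computed. It is not taken from the preprint or from PMeND. It assumes each week is a plain dict mapping sequence ID to cluster label, with `None` meaning unclustered; the function names are hypothetical. It also assumes that the denominator for measure 1 is the set of sequences clustered in the previous week, that a sequence missing from the current week counts as unclustered, that a sequence dropping to unclustered does not by itself count as a split, and that measure 3 is Shannon entropy (natural log) over cluster sizes, which is only a guess at what the authors used.

```python
import math
from collections import Counter, defaultdict

# Assumed input: one dict per week, sequence ID -> cluster label (None = unclustered).

def proportion_unclustered(prev, curr):
    """Measure 1: share of previously clustered sequences that are unclustered now."""
    was_clustered = [s for s, c in prev.items() if c is not None]
    if not was_clustered:
        return 0.0
    # Assumption: a sequence absent from the current week counts as unclustered.
    moved = sum(1 for s in was_clustered if curr.get(s) is None)
    return moved / len(was_clustered)

def count_split_clusters(prev, curr):
    """Measure 2: previous-week clusters whose members now sit in 2+ clusters."""
    destinations = defaultdict(set)
    for s, c in prev.items():
        now = curr.get(s)
        # Assumption: only sequences clustered in both weeks are considered.
        if c is not None and now is not None:
            destinations[c].add(now)
    return sum(1 for d in destinations.values() if len(d) > 1)

def cluster_entropy(curr):
    """Measure 3 (assumed): Shannon entropy of current cluster sizes, natural log."""
    sizes = Counter(c for c in curr.values() if c is not None)
    total = sum(sizes.values())
    if total == 0:
        return 0.0
    # Equals 0 when every clustered sequence is in a single cluster.
    return sum((n / total) * math.log(total / n) for n in sizes.values())

# Toy data, invented for this example.
prev_week = {"a": "c1", "b": "c1", "c": "c1", "d": "c2", "e": None}
curr_week = {"a": "c1", "b": "c3", "c": None, "d": "c2", "e": None}

print(proportion_unclustered(prev_week, curr_week))  # 0.25 (only "c" left a cluster)
print(count_split_clusters(prev_week, curr_week))    # 1 (c1 split into c1 and c3)
print(cluster_entropy(curr_week))                    # log(3): three singleton clusters
```

If cluster labels are not stable from week to week, measures 1 and 2 still work as written, since they only compare membership, not label names.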

glstott commented 2 years ago

Option 1:

glstott commented 2 years ago

Option 2:

glstott commented 2 years ago

Next steps: