blab / cartography

Dimensionality reduction distills complex evolutionary relationships in seasonal influenza and SARS-CoV-2
https://doi.org/10.1101/2024.02.07.579374
MIT License

Run HDBSCAN directly on genetic distances and compare clusters to those from embeddings #119

Closed: huddlej closed this 1 month ago

huddlej commented 1 month ago

Description

Adds rules to all natural flu and SARS-CoV-2 workflows that apply HDBSCAN clustering to the same genetic distance matrix we use to produce the embeddings. We label this clustering method "genetic" and include it in the grid search that finds the optimal distance threshold per method for early H3N2 HA data. This PR also updates tables, figures, and manuscript text to include these genetic distance clusters as a point of comparison to the embedding clusters.

Development checklist

Related issues

Depends on https://github.com/blab/pathogen-embed/pull/33

Closes #99