Compare MDS in flu with 2 vs 4 components in grid search

MDS with 2 components produces noticeably different embeddings and HDBSCAN clusters than MDS with 4 components.

For example, this is the seasonal flu MDS embedding with 2 components:

[Image: seasonal flu MDS embedding with 2 components]

And this is the MDS embedding of the same data with 4 components (only the first two components are shown; note the differences in the clusters):

[Image: seasonal flu MDS embedding with 4 components, first two components shown]
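To illustrate how `n_components` enters MDS, here is a minimal sketch of classical (Torgerson) MDS in NumPy. This is a swapped-in technique for illustration, not necessarily the MDS used in this project; the function name `classical_mds` is hypothetical. Classical MDS solutions are nested: the first two axes of a 4-component embedding are exactly the 2-component embedding. Metric MDS fit by stress minimization (for example, scikit-learn's SMACOF-based `MDS`, if that is what we use) has no such guarantee. That, together with HDBSCAN clustering on all 4 dimensions rather than 2, is one plausible reason the clusters differ between the two figures.

```python
import numpy as np


def classical_mds(distances: np.ndarray, n_components: int) -> np.ndarray:
    """Embed a symmetric distance matrix with classical (Torgerson) MDS."""
    n = distances.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distances ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_components.
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    top = np.argsort(eigenvalues)[::-1][:n_components]
    # Clip negative eigenvalues, which appear when distances are not Euclidean.
    scale = np.sqrt(np.clip(eigenvalues[top], 0.0, None))
    return eigenvectors[:, top] * scale
```

Calling `classical_mds(distances, 2)` and `classical_mds(distances, 4)` on the same matrix yields embeddings whose first two columns agree, so any 2 vs 4 difference under this method would come only from the extra axes used by the clustering step.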

The correlation between genetic distance and embedding distance is much higher for the 4-component embedding (as we would expect), but we don't know whether this embedding produces more accurate clusters.
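A minimal sketch of the distance-correlation check, assuming a precomputed square matrix of pairwise genetic distances and Pearson correlation over each unordered pair of strains (the project may use a different correlation measure). The function name `embedding_distance_correlation` is hypothetical.

```python
import numpy as np


def embedding_distance_correlation(genetic_distances: np.ndarray, embedding: np.ndarray) -> float:
    """Pearson correlation between genetic distances and Euclidean embedding distances."""
    # Pairwise Euclidean distances between embedded strains.
    diffs = embedding[:, None, :] - embedding[None, :, :]
    embedded_distances = np.linalg.norm(diffs, axis=-1)
    # Count each unordered pair once: upper triangle, excluding the diagonal.
    i, j = np.triu_indices(genetic_distances.shape[0], k=1)
    return float(np.corrcoef(genetic_distances[i, j], embedded_distances[i, j])[0, 1])
```

Passing the 2- and 4-component embeddings with the same genetic distance matrix gives the two correlations compared here. A higher value only means pairwise distances are better preserved; it does not directly measure cluster accuracy, which is why the grid search is still needed.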

We should add a column for MDS's n_components to the grid search parameters file for seasonal flu's training data and re-run the grid search with these different values. We should also update the script that summarizes grid search results so that it identifies the optimal number of MDS components from validation MCC, as we already do for the t-SNE and UMAP parameters. Then we should re-run the full MDS embedding with the optimal values and update the manuscript accordingly.
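A minimal sketch of the summary step, assuming the grid search writes one row per run with hypothetical `method`, `n_components`, and `validation_mcc` columns, and that the optimal value is the one with the highest mean validation MCC across runs (for example, across cross-validation folds). The function names are hypothetical and not taken from the existing summary script.

```python
import csv
from collections import defaultdict
from statistics import mean


def load_grid_search_results(path):
    """Read grid search results from a CSV file into a list of row dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def summarize_mds_components(rows, method="MDS"):
    """Pick the n_components with the highest mean validation MCC for one method.

    Assumes hypothetical columns "method", "n_components", and "validation_mcc".
    Returns (best_n_components, {n_components: mean_validation_mcc}).
    """
    mcc_by_components = defaultdict(list)
    for row in rows:
        if row["method"] != method:
            continue
        mcc_by_components[int(row["n_components"])].append(float(row["validation_mcc"]))
    if not mcc_by_components:
        raise ValueError(f"no grid search rows for method {method!r}")
    means = {n: mean(values) for n, values in mcc_by_components.items()}
    # max() keeps the first maximum, so sorting first breaks ties toward fewer components.
    best = max(sorted(means), key=means.get)
    return best, means
```

Breaking ties toward fewer components is a design choice in this sketch: if 2 and 4 components cluster equally well, the 2-component embedding is cheaper and easier to plot.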

blab/cartography#11