Arcadia-Science / 2024-peptigate-evaluation

Evaluating peptide predictions made by the peptigate pipeline using orthogonal data
MIT License

Try clustering (PCA, UMAP, tSNE) with a larger set of reference peptides #2

Open taylorreiter opened 5 months ago

taylorreiter commented 5 months ago

In the human analysis, I tried to cluster peptides based on their characteristics (aliphatic index, length, etc.). Although PCA captures a lot of the variance, the peptides all form one big blob, and tuning the UMAP or t-SNE parameters doesn't change that. Both @elizabethmcd and @ecpierce suggested clustering the peptide predictions together with known, labeled data. The labeled set could be clusters from Peptipedia (e.g., all peptides with the same bioactivity) or all of Peptipedia. This would be fun to try with the human data or with another gold-standard dataset.
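A minimal sketch of the joint-embedding idea. It assumes that the labeled reference peptides (e.g., a Peptipedia subset) and the predicted peptides have already been described by the same numeric feature columns, in the same order. It uses only NumPy and computes PCA through an SVD of the standardized, combined matrix. The function name `joint_pca` and the random feature matrices are hypothetical placeholders. UMAP or t-SNE could be run on the same standardized matrix in place of the SVD step.

```python
import numpy as np

def joint_pca(reference, query, n_components=2):
    """Embed labeled reference peptides and predicted peptides in one PCA space.

    reference, query: 2-D arrays (rows = peptides, columns = features such as
    length or aliphatic index) that share the same columns in the same order.
    """
    reference = np.asarray(reference, dtype=float)
    query = np.asarray(query, dtype=float)
    combined = np.vstack([reference, query])

    # Standardize each feature so that large-valued features (e.g. length)
    # do not dominate features measured on a smaller scale.
    mean = combined.mean(axis=0)
    std = combined.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    z = (combined - mean) / std

    # PCA via SVD of the centered, scaled matrix; rows of vt are the components.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)

    n_ref = reference.shape[0]
    return scores[:n_ref], scores[n_ref:], explained[:n_components]

# Hypothetical feature tables (random numbers, not real peptides).
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 3))
query = rng.normal(size=(20, 3))
ref_xy, query_xy, var_ratio = joint_pca(reference, query)
print(ref_xy.shape, query_xy.shape)  # (50, 2) (20, 2)
```

Fitting the projection on both sets places the predictions and the reference peptides on the same axes. Coloring the reference points by their label (e.g., bioactivity) then shows whether the predictions fall near any labeled cluster.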

taylorreiter commented 1 month ago

As suggested in #8, remember to set the alpha (transparency) on these plots so that overlapping points don't hide one another.
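A minimal matplotlib sketch of that suggestion. The helper name `scatter_with_alpha`, the alpha values, and the random coordinates are illustrative placeholders rather than values from #8. In practice the coordinates would come from the PCA, UMAP, or t-SNE embedding.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

def scatter_with_alpha(ax, xy, label, alpha=0.3, size=8):
    """Draw one group of points semi-transparently so dense regions stay readable."""
    xy = np.asarray(xy, dtype=float)
    return ax.scatter(xy[:, 0], xy[:, 1], s=size, alpha=alpha, label=label)

rng = np.random.default_rng(0)
fig, ax = plt.subplots()
# Hypothetical 2-D embeddings: a large reference set and a smaller set of predictions.
scatter_with_alpha(ax, rng.normal(size=(500, 2)), label="reference", alpha=0.2)
scatter_with_alpha(ax, rng.normal(size=(100, 2)), label="predicted", alpha=0.6)
ax.legend()
```

Giving the larger reference set a lower alpha, and plotting the predictions last so they are drawn on top, keeps the predicted peptides visible inside dense reference clusters.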