In the human analysis, I tried to cluster peptides based on their characteristics (aliphatic index, length, etc.). While PCA explains a lot of the variance, the peptides all form one big blob no matter how I adjust the UMAP or t-SNE parameters. Both @elizabethmcd and @ecpierce had the idea to cluster the peptide predictions together with known, labeled data. The labeled data could be clusters in Peptipedia (e.g. all peptides with the same bioactivity) or all of Peptipedia. This would be fun to try with the human data or with some other gold standard data set.
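A rough sketch of what clustering predictions alongside labeled data could look like, using only NumPy. Everything here is an assumption for illustration: the function names `co_embed` and `nearest_label_votes`, the two features (length, aliphatic index), the labels `activity_A` and `activity_B`, and the toy data, which is randomly generated and not taken from Peptipedia or the human analysis. It swaps in plain PCA plus a k-nearest-neighbor vote in place of UMAP or t-SNE, just to show the shape of the idea.

```python
import numpy as np


def co_embed(pred_features, ref_features, n_components=2):
    """Project predicted and labeled reference peptides into one shared PCA space."""
    combined = np.vstack([pred_features, ref_features]).astype(float)
    # Standardize each feature so large-scale features do not dominate.
    std = combined.std(axis=0)
    std[std == 0] = 1.0
    scaled = (combined - combined.mean(axis=0)) / std
    # PCA via SVD of the standardized matrix.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    coords = scaled @ vt[:n_components].T
    n_pred = len(pred_features)
    return coords[:n_pred], coords[n_pred:]


def nearest_label_votes(pred_coords, ref_coords, ref_labels, k=5):
    """Majority label among the k nearest reference peptides, per predicted peptide."""
    ref_labels = np.asarray(ref_labels)
    # Pairwise Euclidean distances, shape (n_pred, n_ref).
    dists = np.linalg.norm(pred_coords[:, None, :] - ref_coords[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = []
    for row in nearest:
        labels, counts = np.unique(ref_labels[row], return_counts=True)
        votes.append(labels[np.argmax(counts)])
    return np.array(votes)


# Toy data (made up): columns are [length, aliphatic index].
rng = np.random.default_rng(0)
ref_a = rng.normal(loc=[20, 80], scale=[2, 5], size=(30, 2))
ref_b = rng.normal(loc=[40, 120], scale=[2, 5], size=(30, 2))
ref_features = np.vstack([ref_a, ref_b])
ref_labels = ["activity_A"] * 30 + ["activity_B"] * 30
# Hypothetical predictions drawn near the first reference group.
pred_features = rng.normal(loc=[20, 80], scale=[2, 5], size=(10, 2))

pred_xy, ref_xy = co_embed(pred_features, ref_features)
votes = nearest_label_votes(pred_xy, ref_xy, ref_labels, k=5)
print(votes)
```

With real data, the reference rows would come from the labeled set and the vote would be a quick check of whether predictions land near any known bioactivity group or still sit in one blob.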