Closed by xnought 7 months ago
Just to bring this into discussion, why would we need to cluster proteins?
First, it would be helpful to filter by structural similarity. If we had cluster groups, we could filter by them and narrow down the search. For example, if one cluster was a ring structure, we could search by that and find tons of interesting proteins.
Another reason is that we don't know the function of many of our proteins (if not all). If we cluster them into groups that also contain proteins with known function (like those from the Protein Data Bank), we could predict their function.
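To make the function-prediction idea concrete, here is a minimal sketch that assumes a simple majority vote inside each cluster. The vote rule, the function name `predict_functions`, and all protein and cluster names are hypothetical; nothing here is an existing project API.

```python
from collections import Counter, defaultdict


def predict_functions(cluster_of, known_function):
    """Guess a function for each unannotated protein.

    cluster_of: protein name -> cluster id
    known_function: protein name -> function label (annotated proteins only)
    Returns: unannotated protein name -> most common known function in its
    cluster, or None when the cluster has no annotated member.
    """
    # Count the known function labels inside each cluster.
    votes = defaultdict(Counter)
    for protein, cluster in cluster_of.items():
        if protein in known_function:
            votes[cluster][known_function[protein]] += 1

    predictions = {}
    for protein, cluster in cluster_of.items():
        if protein in known_function:
            continue  # already annotated, nothing to predict
        counts = votes.get(cluster)
        predictions[protein] = counts.most_common(1)[0][0] if counts else None
    return predictions


# Hypothetical data: two annotated PDB entries and two of our own proteins.
cluster_of = {"pdb_A": "c1", "pdb_B": "c1", "venome_1": "c1", "venome_2": "c2"}
known_function = {"pdb_A": "hydrolase", "pdb_B": "hydrolase"}

predictions = predict_functions(cluster_of, known_function)
print(predictions)
```

A protein that lands in a cluster with no annotated members gets `None`, so the prediction stays visibly unknown instead of being guessed.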
We could also display a clustering view that helps people find similar proteins spatially. I am thinking of embedding all the proteins in 2D, then coloring by cluster. Then we could even overlay the predicted function on top.
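A rough sketch of the 2D view, assuming each protein already has a fixed-length embedding vector and a cluster label. The thread does not pick a projection method, so PCA (via NumPy's SVD) is used here only as a stand-in; the random embeddings and the labels are placeholders, not real data.

```python
import numpy as np


def project_to_2d(embeddings):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)  # center each feature
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T


# Placeholder input: 6 proteins with 8-dimensional embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 8))
labels = ["c1", "c1", "c2", "c2", "c3", "c3"]

coords = project_to_2d(embeddings)

# One (x, y, cluster) record per protein, ready for a scatter plot
# where the cluster label picks the color.
points = [(float(x), float(y), label) for (x, y), label in zip(coords, labels)]
for point in points:
    print(point)
```

A nonlinear method could replace `project_to_2d` without changing the `(x, y, cluster)` records that the plot consumes.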
Piggybacking on that: we could annotate these large visualizations over time with clusters or edits, so at some point we have a global map of proteins.
That would be a superpower when exploring proteins.
Super good idea. I like the idea of letting people do their own clustering of our proteins versus others. For example, we could filter by foldseek:cluster-name, or by some other clustering method someone came up with, like k-means-alphafold-embeddings:cluster-name. We could also have articles for each cluster and clustering method.
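A minimal sketch of how a `method:cluster-name` filter could be parsed and applied, assuming each protein record stores one cluster label per clustering method. The record layout, the helper names, and the example cluster names are all hypothetical.

```python
def parse_cluster_filter(text):
    """Split 'method:cluster-name' into (method, cluster_name)."""
    method, sep, cluster = text.partition(":")
    if not sep or not method or not cluster:
        raise ValueError(f"expected 'method:cluster-name', got {text!r}")
    return method, cluster


def filter_proteins(proteins, text):
    """Return names of proteins whose label under the given method matches."""
    method, cluster = parse_cluster_filter(text)
    return [p["name"] for p in proteins if p["clusters"].get(method) == cluster]


# Hypothetical records: each protein has a label per clustering method.
proteins = [
    {"name": "protein_a",
     "clusters": {"foldseek": "ring", "k-means-alphafold-embeddings": "7"}},
    {"name": "protein_b",
     "clusters": {"foldseek": "barrel", "k-means-alphafold-embeddings": "7"}},
    {"name": "protein_c", "clusters": {"foldseek": "ring"}},
]

print(filter_proteins(proteins, "foldseek:ring"))
print(filter_proteins(proteins, "k-means-alphafold-embeddings:7"))
```

Keying the labels by method name means a new clustering method only adds a key to each record, and the filter syntax stays the same.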
Hit an issue: the foldseek external databases can only be used for search, so we will need to download all of PDB.
Cluster our proteins with Protein Data Bank or UniProt proteins so we can find similar groups. We can also cluster our proteins with other proteins in our venome. Once we have these clusters, we can search and filter by them too.