Guo-Stone closed 6 months ago
Hi,
I've run into the same error when clustering some specific protein groups. Unfortunately, I couldn't solve it even after trying different combinations of Foldseek-Cluster parameters. Empirically, this is probably because the proteins within these groups are too similar to each other, i.e. their diversity is very low. The error also doesn't seem strongly related to the total length of the proteins. May I ask whether the proteins you want to cluster look similar when you check them manually?
If you still want to cluster these proteins, I suggest using MaxCluster as an alternative. If you just want to look at the similarity among them, you can calculate their pairwise TM-scores and visualize them with something like a heat map.
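In case it helps, here is a minimal sketch of the heat map idea in plain Python. It assumes you already have one TM-score per pair of structures from whatever alignment tool you use (TM-score depends on which length it is normalized by, so pick one convention and stick to it). The function names and the scores in the demo are made-up placeholders, not real results.

```python
# Sketch: turn pairwise TM-scores into a symmetric matrix and print a
# rough text heat map. All names and demo values are hypothetical.

SHADES = " .:-=+*#%@"  # low similarity -> high similarity


def build_matrix(names, pair_scores):
    """Build a symmetric score matrix from {(name_a, name_b): score}."""
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        matrix[i][i] = 1.0  # a structure compared with itself
    for (a, b), score in pair_scores.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = score
    return matrix


def render_heatmap(names, matrix):
    """Render one text row per structure, one shade character per score."""
    lines = []
    for name, row in zip(names, matrix):
        cells = "".join(
            SHADES[min(int(value * len(SHADES)), len(SHADES) - 1)]
            for value in row
        )
        lines.append(f"{name:>8} {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder scores for illustration only.
    demo_names = ["a", "b", "c"]
    demo_scores = {("a", "b"): 0.95, ("a", "c"): 0.25, ("b", "c"): 0.35}
    print(render_heatmap(demo_names, build_matrix(demo_names, demo_scores)))
```

If most off-diagonal cells come out as dark as the diagonal, the set really is low in diversity, which would fit the guess above. The same matrix can be passed to any plotting library for a proper heat map.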
The structures I want to cluster look different from one another. There must be a bug in Foldseek when dealing with certain structures. But I found a trick that makes it work in this case: given a group of structures with length <= 50, you can add one huge protein, which I call a 'big-bro protein', to your structure set. After that, Foldseek works well. Choosing a big-bro protein is easy: just make sure it ends up in a cluster of its own.
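For anyone who wants to script this, below is a rough sketch of the clean-up step after clustering: it checks that the big-bro protein really ended up alone in its cluster and then drops it from the result. It assumes the cluster output is a two-column, tab-separated file with one "representative, member" pair per line, so please check that against your own output. The function names and IDs are hypothetical.

```python
# Sketch: remove the helper 'big-bro' protein from clustering results.
# Assumes a two-column TSV: representative<TAB>member (an assumption,
# verify it against your actual output file).


def parse_cluster_tsv(text):
    """Group members by representative; returns {rep: [members]}."""
    clusters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rep, member = line.split("\t")[:2]
        clusters.setdefault(rep, []).append(member)
    return clusters


def drop_big_bro(clusters, big_bro_id):
    """Return the clusters without the big-bro's singleton cluster.

    Raises ValueError if the big-bro is missing or shares a cluster
    with a real structure, since the trick relies on it being alone.
    """
    owner = None
    for rep, members in clusters.items():
        if big_bro_id in members:
            owner = rep
            break
    if owner is None:
        raise ValueError(f"{big_bro_id} not found in any cluster")
    if clusters[owner] != [big_bro_id]:
        raise ValueError(f"{big_bro_id} is not in a cluster of its own")
    return {rep: m for rep, m in clusters.items() if rep != owner}
```

Usage would be something like `drop_big_bro(parse_cluster_tsv(open(path).read()), "big_bro")`, where `path` and `"big_bro"` stand in for your own file and ID. If it raises, pick a different big-bro protein and rerun.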
Thanks for providing this trick! I've run a test and this strategy indeed works. It should be of great help under certain circumstances.
When I cluster structures shorter than 50 residues, Foldseek fails with an error:
Do you know how to solve it, and what the minimum sequence length is?