Immortals-33 / Scaffold-Lab

A comprehensive benchmark of the performance of multiple protein backbone generative models.
MIT License

Error: Prefilter step 0 died #1

Closed: Guo-Stone closed this issue 6 months ago

Guo-Stone commented 6 months ago

When I cluster structures shorter than 50 residues, Foldseek fails with the following error:

No k-mer could be extracted for the database /home/guotao/code/ProteinMPNN-main/tmp_foldseek//12264118459229945720/clu_tmp/2042991015947385700/input_step_redundancy_ss.
Maybe the sequences length is less than 14 residues.
Error: Prefilter step 0 died
Error: Search died

Do you know how to solve this, and what the minimum sequence length is?
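Since the error message itself mentions a 14-residue floor, one way to narrow the question down is to check which input chains actually fall below it before running Foldseek. The sketch below uses hypothetical helpers (`count_ca_per_chain`, `find_short_structures`; not part of Foldseek or Scaffold-Lab) that count C-alpha atoms per chain in plain PDB files. It assumes one CA per residue and takes the threshold from the error text, not from Foldseek's source.

```python
from collections import Counter
from pathlib import Path
from typing import List, Tuple

# Threshold quoted in the Foldseek error message ("less than 14 residues"); an assumption, not read from Foldseek.
MIN_RESIDUES = 14


def count_ca_per_chain(pdb_text: str) -> Counter:
    """Count C-alpha atoms per chain in PDB-format text, assuming one CA per residue."""
    counts: Counter = Counter()
    for line in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, alternate location 17, chain ID 22.
        # Only blank or "A" alternate locations are counted, so split residues are not double-counted.
        if line.startswith("ATOM") and line[12:16].strip() == "CA" and line[16:17] in (" ", "A"):
            counts[line[21:22]] += 1
    return counts


def find_short_structures(pdb_dir: str, min_residues: int = MIN_RESIDUES) -> List[Tuple[str, str, int]]:
    """Return (file name, chain ID, length) for every chain shorter than min_residues."""
    short = []
    for path in sorted(Path(pdb_dir).glob("*.pdb")):
        for chain, n_residues in count_ca_per_chain(path.read_text()).items():
            if n_residues < min_residues:
                short.append((path.name, chain, n_residues))
    return short
```

Running `find_short_structures("my_pdbs")` (directory name hypothetical) lists every chain under the threshold; if nothing is flagged, length alone does not explain the error.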

Immortals-33 commented 6 months ago

Hi,

I've run into the same error too when clustering some specific protein groups. Unfortunately, I didn't manage to solve it, even after trying different combinations of Foldseek-Cluster parameters. Empirically speaking, this is probably because the proteins inside these groups look too similar to each other, i.e. their diversity is very low. Besides, the error does not seem to be strongly related to the total length of the proteins. May I ask whether the proteins you want to cluster look similar when you check them manually?

If you still want to cluster these proteins, I suggest using MaxCluster as an alternative. If you just want to look at the similarity among them, you can calculate their pairwise TM-scores and visualize them as, for example, a heat map.
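For the heat-map route, the TM-score of one fixed superposition is TM = (1/L_target) Σ 1/(1 + (d_i/d0)²), with d0 = 1.24·(L_target − 15)^(1/3) − 1.8, where d_i are distances between aligned residue pairs. The sketch below only evaluates that formula from distances you already have (e.g. from a TM-align superposition); it does not search for the optimal superposition, so it is no replacement for TM-align or US-align. The 0.5 Å floor on d0 for short chains mirrors the convention of the TM-score program and is stated here as an assumption. `pairwise_matrix` and its `score_fn` argument are hypothetical; because TM-score is normalized by one chain's length it is asymmetric, and this sketch symmetrizes by keeping the larger direction, which is one choice among several.

```python
from typing import Callable, List, Sequence


def tm_d0(target_length: int) -> float:
    """Distance scale d0 (in Angstrom) used by TM-score for a target of the given length."""
    # Assumed convention: fall back to 0.5 A for short chains, where the formula becomes too small or negative.
    if target_length > 21:
        return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return 0.5


def tm_score_from_distances(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition, given distances between aligned residue pairs."""
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalized by the target length, not by the number of aligned pairs.
    return total / target_length


def pairwise_matrix(names: List[str], score_fn: Callable[[str, str], float]) -> List[List[float]]:
    """Symmetric similarity matrix; score_fn(a, b) is a hypothetical wrapper, e.g. around TM-align output."""
    n = len(names)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # TM-score depends on which chain normalizes it, so keep the larger of both directions.
            score = max(score_fn(names[i], names[j]), score_fn(names[j], names[i]))
            matrix[i][j] = matrix[j][i] = score
    return matrix
```

The resulting list of lists can be passed directly to `matplotlib.pyplot.imshow` to draw the heat map.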

Guo-Stone commented 6 months ago

The structures I want to cluster look different from each other. There must be some bug in Foldseek when it deals with specific structures. However, I found a trick that makes it work in this case: given a group of structures with length <= 50, add a huge protein, which I call a 'big-bro protein', to your structure set. After that, Foldseek works well. Choosing a big-bro protein is easy; you only need to ensure that it ends up in a cluster of its own.

Immortals-33 commented 6 months ago

Thanks for providing this trick! I've run a test and this strategy indeed works. It should be of great help under certain circumstances.