dauparas / ProteinMPNN

Code for the ProteinMPNN paper
MIT License

mmseq clustering logic #52

Open OliviaViessmann opened 1 year ago

OliviaViessmann commented 1 year ago

Hi,

Thank you for providing the ProteinMPNN code. I am interested in trying out variations of the clustering thresholds. Would it be possible for you to share the code that produces the train/test/validation clusters used for multi-chain training? In other words, how could I generate my own version of list.csv, train_clusters.txt and test_clusters.txt?

Thanks a lot in advance! Olivia

jadolfbr commented 1 year ago

I would also like to do something similar, but I could not find the shell script or code used for the clustering, nor any detailed information in the supplementary material. Thanks.

universvm commented 11 months ago

I have the same issue. @dauparas I wonder if you could have a look here :)

anar-rzayev commented 3 weeks ago

@OliviaViessmann I was wondering if you figured it out or not.

universvm commented 3 weeks ago

Hi all,

The authors were rather cryptic when I asked them by email; it seems they have not kept any of the code for doing this.

However, I scripted these steps some time ago based on their description. The scripts may need some tweaking, but they are still better than starting from scratch.

MMseqs2 script: https://github.com/wells-wood-research/ProteinMPNN_custom_training/blob/main/dataset/create_mmseqs.sh

Other necessary files: https://github.com/wells-wood-research/ProteinMPNN_custom_training/blob/main/dataset/create_clusters_and_csv.py

Bear in mind that these scripts were written for this publication: https://academic.oup.com/peds/article/doi/10.1093/protein/gzae002/7591701 so if you use them, please cite us!
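
For anyone who just wants to see the general shape of the post-clustering step, here is a minimal sketch. It is not the authors' method and not the code in the linked repository. It assumes the input is a two-column, tab-separated cluster file of the kind MMseqs2 `easy-cluster` writes (representative, then member, one pair per line). The function names, the split fractions and the seed are hypothetical and only there for illustration.

```python
import random


def read_cluster_tsv(lines):
    """Map each member ID to an integer cluster ID.

    Assumes rows of the form 'representative<TAB>member'. Cluster IDs are
    assigned in order of first appearance of each representative.
    """
    rep_to_id = {}
    member_to_cluster = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rep, member = line.split("\t")
        cluster_id = rep_to_id.setdefault(rep, len(rep_to_id))
        member_to_cluster[member] = cluster_id
    return member_to_cluster


def split_clusters(cluster_ids, valid_frac=0.05, test_frac=0.05, seed=0):
    """Randomly split unique cluster IDs into train/valid/test lists.

    The fractions and seed are illustrative defaults, not values taken
    from the ProteinMPNN paper or repository.
    """
    ids = sorted(set(cluster_ids))
    random.Random(seed).shuffle(ids)  # deterministic for a given seed
    n_valid = int(len(ids) * valid_frac)
    n_test = int(len(ids) * test_frac)
    valid = sorted(ids[:n_valid])
    test = sorted(ids[n_valid:n_valid + n_test])
    train = sorted(ids[n_valid + n_test:])
    return train, valid, test
```

The split is done on cluster IDs rather than on individual chains, so that sequences from the same cluster never end up on both sides of a train/test boundary. To try a different clustering threshold, you would rerun the clustering with a different identity cutoff and feed the new cluster file through the same two functions.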