the results of DMS fitness

jiaolifengmi commented 2 months ago

Thank you very much for your excellent work. Regarding "substitution DMS", I noticed that the dataset you provide contains a total of 217 groups of proteins, but some works that use your benchmarks report results on only 63 groups. Their explanation is that the initial version (v0.1) of your benchmarks contained only 63 groups. On which version were the "substitution DMS" performance metrics reported in your paper "ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design" obtained?

pascalnotin commented 2 months ago

Dear @jiaolifengmi,

The results from our paper "ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design" are based on the ProteinGym v1.0 datasets, which include 217 distinct substitution DMS assays (across 187 distinct protein families / UniProt IDs). Differences between versions of the benchmark (including the v0.1 datasets from the Tranception paper) are described in Appendix A.3.1 of the ProteinGym paper (https://papers.nips.cc/paper_files/paper/2023/hash/cac723e5ff29f65e3fcbb0739ae91bee-Abstract-Datasets_and_Benchmarks.html).
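
To check which version a local copy corresponds to, one option is to count the distinct assays and distinct proteins in its reference file. The snippet below is only a minimal sketch: the column names `DMS_id` and `UniProt_ID` are assumptions about the reference file layout, the helper `count_assays_and_proteins` is hypothetical, and the rows are made-up toy data rather than real ProteinGym entries.

```python
import csv
import io


def count_assays_and_proteins(csv_file, assay_col="DMS_id", protein_col="UniProt_ID"):
    """Return (distinct assay count, distinct protein count) for a CSV file object.

    The default column names are assumptions; adjust them to match the
    header of the reference file you actually have.
    """
    assays, proteins = set(), set()
    for row in csv.DictReader(csv_file):
        assays.add(row[assay_col])
        proteins.add(row[protein_col])
    return len(assays), len(proteins)


# Toy in-memory stand-in for a reference file (made-up rows, not real data).
sample = io.StringIO(
    "DMS_id,UniProt_ID\n"
    "assay_A,PROT1_HUMAN\n"
    "assay_B,PROT1_HUMAN\n"
    "assay_C,PROT2_ECOLI\n"
)
n_assays, n_proteins = count_assays_and_proteins(sample)
print(n_assays, n_proteins)
```

With a real file, pass `open(path, newline="")` instead of the in-memory sample. For the v1.0 substitution benchmark the two counts should come out as 217 and 187 if the assumed columns match.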

Kind regards, Pascal

pascalnotin commented 1 month ago

Hi @jiaolifengmi - closing this issue as I believe it is addressed by the above, but feel free to re-open if needed. Best, Pascal

OATML-Markslab / ProteinGym

the results of DMS fitness #45