OATML-Markslab / ProteinGym

Official repository for the ProteinGym benchmarks
MIT License

Possible missing data of benchmarking supervised performance #36

Open Eikor opened 1 month ago

Eikor commented 1 month ago

Hi, thank you for such fundamental work for the computational bio community!

I am interested in evaluating model performance in a supervised setting. I downloaded DMS_supervised_substitutions_scores.csv and ran the script provided in scripts/scoring_DMS_supervised/performance_substitutions.sh, setting --input_scoring_file to DMS_supervised_substitutions_scores.csv and --DMS_reference_file_path to DMS_substitutions.csv. However, an error occurs at https://github.com/OATML-Markslab/ProteinGym/blob/495cc305135767b53478dda1e12039c30d7f82ce/proteingym/performance_DMS_supervised_benchmarks.py#L53. Am I passing the wrong reference file, or does the script need to be updated to work with the current reference file?

Thanks!

pascalnotin commented 3 weeks ago

Hi @Eikor - thank you for your question!

The error you are experiencing is due to a change in naming conventions for some of the files (also discussed here). We were indeed using a different reference file when running this particular script internally (similar to this one, but with an additional mapping to the old IDs). We will fix these issues in a new release as soon as possible. In the meantime, you may use the performance files here, which are the ones you would obtain by running this script.

Kind regards, Pascal
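
For anyone hitting the same error before the new release, a quick way to confirm that the naming change is the cause is to compare the assay IDs in the scoring file against those in the reference file. The snippet below is only a minimal sketch, not part of the repository: the helper names `read_ids` and `find_unmatched_ids` are hypothetical, and it assumes both CSV files identify assays in a column called `DMS_id`, which may not match your copies of the files (pass a different `id_column` if so).

```python
import csv


def read_ids(csv_path, id_column="DMS_id"):
    """Return the set of values found in `id_column` of a CSV file.

    The default column name "DMS_id" is an assumption about the file layout.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or id_column not in reader.fieldnames:
            raise KeyError(f"{csv_path} has no column named {id_column!r}")
        return {row[id_column] for row in reader}


def find_unmatched_ids(scoring_path, reference_path, id_column="DMS_id"):
    """Compare the assay IDs of two CSV files (hypothetical helper).

    Returns two sorted lists: IDs that appear only in the scoring file,
    and IDs that appear only in the reference file.
    """
    scored = read_ids(scoring_path, id_column)
    reference = read_ids(reference_path, id_column)
    return sorted(scored - reference), sorted(reference - scored)


# Example call (file names taken from the question above):
# only_in_scores, only_in_reference = find_unmatched_ids(
#     "DMS_supervised_substitutions_scores.csv", "DMS_substitutions.csv"
# )
```

If the first returned list is non-empty, the scoring file contains IDs that the reference file does not know about, which is consistent with the old-versus-new naming mismatch described above.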