Kuhlman-Lab / ThermoMPNN

GNN trained to predict changes in thermodynamic stability for protein point mutants
MIT License
117 stars, 18 forks

FileNotFoundError: 4_fireprotDB_bestpH.csv #13

Closed: karishmathakrar closed this issue 8 months ago

karishmathakrar commented 8 months ago

I am receiving this error when running train_thermompnn.py. Please advise:

FileNotFoundError: [Errno 2] No such file or directory: '/nas/longleaf/home/dieckhau/protein-stability/enzyme-stability/data/fireprot/4_fireprotDB_bestpH.csv'

hdieckhaus commented 8 months ago

Hello,

What data are you trying to use to train your new model? As described in the README,

The main training script is train_thermompnn.py. To set up a training run, you must write a config.yaml file (example provided) to specify model hyperparameters. You also must provide a local.yaml file to tell ThermoMPNN where to find your data. These files serve as experiment logs as well.

This error is happening because ThermoMPNN is trying to train on a dataset that you do not have installed. ThermoMPNN does not come with any datasets preprocessed for training. You can find these datasets at this location for the Fireprot dataset and this location for the Megascale dataset mentioned in the paper. After downloading this data, you need to modify the config.yaml and local.yaml files to choose your dataset and direct ThermoMPNN to the correct file locations.
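As a rough illustration of the check described above, here is a minimal sketch that walks a nested settings mapping (such as one loaded from local.yaml) and reports any entries whose file does not exist, so a wrong path shows up before training starts. The function name find_missing_paths and the keys data_loc, dataset_a_csv, and dataset_b_csv are placeholders invented for this example, not ThermoMPNN's actual schema, and the sketch assumes every string value in the mapping is a filesystem path.

```python
import os
import tempfile


def find_missing_paths(settings, prefix=""):
    """Return (dotted_key, path) pairs for string values that are not existing paths."""
    missing = []
    for key, value in settings.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted key.
            missing.extend(find_missing_paths(value, prefix=name + "."))
        elif isinstance(value, str) and not os.path.exists(value):
            missing.append((name, value))
    return missing


# Hypothetical layout: these key names are placeholders, not the real local.yaml schema.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "present.csv")
    open(present, "w").close()  # create an empty file that does exist
    local_settings = {
        "data_loc": {
            "dataset_a_csv": present,
            "dataset_b_csv": os.path.join(tmp, "absent.csv"),
        }
    }
    missing = find_missing_paths(local_settings)

for name, path in missing:
    print(f"missing: {name} -> {path}")
```

In practice the mapping would come from parsing your own local.yaml rather than being built by hand. Any entry reported as missing is a path to correct before launching train_thermompnn.py.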

Hope this helps!

karishmathakrar commented 8 months ago

Hi, I was able to retrieve the information from those data sources for all but the following:

For reproducibility purposes, where can I look for that information?

Thank you!

hdieckhaus commented 8 months ago

Sure thing!