Closed karishmathakrar closed 8 months ago
Hello,
What data are you trying to use to train your new model? As described in the README,
The main training script is train_thermompnn.py. To set up a training run, you must write a config.yaml file (example provided) to specify model hyperparameters. You also must provide a local.yaml file to tell ThermoMPNN where to find your data. These files serve as experiment logs as well.
This error is happening because ThermoMPNN is trying to train on a dataset that you do not have installed. ThermoMPNN does not come with any datasets preprocessed for training. You can find these datasets at this location for the FireProt dataset and this location for the Megascale dataset mentioned in the paper. After downloading the data, you need to modify the config.yaml and local.yaml files to choose your dataset and point ThermoMPNN to the correct file locations.
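One way to catch this kind of missing-file error before training starts is to check every configured path up front. Below is a minimal sketch, assuming the data paths have already been read from local.yaml into a plain dictionary. The function name and the dictionary keys are hypothetical and are not ThermoMPNN's actual config keys.

```python
from pathlib import Path


def find_missing_paths(data_paths):
    """Return the (key, path) pairs whose path does not exist on disk."""
    missing = []
    for key, raw_path in data_paths.items():
        # expanduser() lets entries such as "~/data/..." resolve correctly.
        if not Path(raw_path).expanduser().exists():
            missing.append((key, raw_path))
    return missing


if __name__ == "__main__":
    # Hypothetical entries; replace with the values from your own local.yaml.
    data_paths = {
        "fireprot_csv": "data/fireprot/4_fireprotDB_bestpH.csv",
        "megascale_splits": "dataset_splits/mega_splits.pkl",
    }
    for key, raw_path in find_missing_paths(data_paths):
        print(f"{key}: not found at {raw_path}")
```

Running a check like this before launching training should surface any path that still points at someone else's machine.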
Hope this helps!
Hi, I was able to retrieve the information from those data sources for all but the following:
For reproducibility purposes, where can I look for that information?
Thank you!
Sure thing!
- mega_splits.pkl can be found in the ThermoMPNN GitHub repository, in the ThermoMPNN/dataset_splits folder.
- AlphaFold_model_PDBs/ should be in the Megascale dataset at the link I provided, listed as AlphaFold_model_PDBs.zip.
- monomers was the name of my PDB folder for the FireProt dataset. You can just replace this with the path to the pdbs folder in the FireProt repository that I provided.
- enzyme_stability_cache/ doesn't contain anything prior to running training. It is used to cache the file loading operations during training for speed/performance reasons. You can simply create an empty folder for this before starting training, then set this variable in the config file. You can also disable the cache mechanism by calling the parse_PDB function directly instead of using parse_PDB_cached, if desired.
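To illustrate why the cache folder can start out empty, here is a generic sketch of a disk cache wrapped around a file loader. It is not ThermoMPNN's implementation of parse_PDB_cached. The helper names ensure_cache_dir and load_with_cache are hypothetical, and pickle is assumed as the storage format purely for illustration.

```python
import os
import pickle


def ensure_cache_dir(cache_dir):
    """Create the cache folder if it does not exist yet; it may start empty."""
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir


def load_with_cache(path, loader, cache_dir):
    """Call loader(path) once, then reuse the pickled result on later calls."""
    ensure_cache_dir(cache_dir)
    # Hypothetical naming scheme: one pickle per input file name.
    cache_file = os.path.join(cache_dir, os.path.basename(path) + ".pkl")
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as fh:
            return pickle.load(fh)
    result = loader(path)
    with open(cache_file, "wb") as fh:
        pickle.dump(result, fh)
    return result
```

With a pattern like this, the first pass over the data fills the folder and later passes skip the parsing step. Calling the loader directly bypasses the cache entirely.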
I am receiving this error when running the train_thermompnn.py file. Please advise:
FileNotFoundError: [Errno 2] No such file or directory: '/nas/longleaf/home/dieckhau/protein-stability/enzyme-stability/data/fireprot/4_fireprotDB_bestpH.csv'