ESMBind (preprint) is a deep learning and physics-based workflow for predicting metal-binding proteins and modeling their 3D structures with bound metal ions. It combines evolutionary scale modeling (ESM) for residue-level metal binding prediction with physics-based protein-metal modeling to generate detailed 3D structures of protein-metal complexes.
We recommend using mamba to manage dependencies.
ESMBind consists of two main components:
To use the pre-trained model for inference:
python multi_modal_binding/inference.py --config multi_modal_binding/configs/inference.json
We have provided pretrained model weights in the multi_modal_binding/model/trained_weights directory. The inference script will automatically load these weights for prediction.
The inference script saves its predictions as a pickle file in the multi_modal_binding/results/ directory. This file is the input to the 3D modeling step.
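Before passing the predictions file to the 3D modeling step, it can help to check what it contains. The following is a minimal sketch, not part of ESMBind: the helper name summarize_predictions is hypothetical, and the internal layout of the pickle file is not documented here, so the code only reports the top-level type and, for dictionaries, a few keys. If the file stores NumPy arrays or tensors, the corresponding library must be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path


def summarize_predictions(pickle_path):
    """Load a pickle file and describe its top-level object.

    Hypothetical helper: it makes no assumption about what the
    keys or values of the predictions file mean.
    """
    with Path(pickle_path).open("rb") as handle:
        data = pickle.load(handle)  # only unpickle files you trust

    summary = {"type": type(data).__name__}
    if isinstance(data, dict):
        # Report the size and a few keys without interpreting them.
        summary["num_entries"] = len(data)
        summary["sample_keys"] = list(data)[:5]
    elif isinstance(data, (list, tuple)):
        summary["num_entries"] = len(data)
    return summary
```

Calling summarize_predictions on a file in multi_modal_binding/results/ returns a small dictionary that can be printed to confirm the run produced output for the expected number of entries.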
To train the model from scratch:
python multi_modal_binding/train.py --config multi_modal_binding/configs/training.json
Both the inference and training scripts assume that embeddings from ESM and ESM-IF are precomputed. The multi_modal_binding/datasets directory contains a sample dataset that can be used for both training and inference. Labels are not required for inference; any standard FASTA file will work.
We provide two example scripts, get_esm_embedding.py and get_esm_if_embedding.py, to generate embeddings for the input sequences and structures. Please refer to the official ESM repository for more details.
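Since inference accepts any standard FASTA file, a quick sanity check of the input before computing embeddings can catch malformed records early. The sketch below is an illustration only and is not taken from the ESMBind scripts: read_fasta is a hypothetical helper, and it assumes that the first whitespace-delimited token of each header line serves as the record ID.

```python
from pathlib import Path


def read_fasta(path):
    """Return a dict mapping each record ID to its sequence string.

    Hypothetical helper; assumes the record ID is the first
    whitespace-delimited token after '>' on each header line.
    """
    records = {}
    current_id = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header_tokens = line[1:].split()
            if not header_tokens:
                raise ValueError("header line without a record ID")
            current_id = header_tokens[0]
            if current_id in records:
                raise ValueError(f"duplicate record ID: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequences may span several lines; concatenate them.
            records[current_id] += line
    return records
```

The returned dictionary makes it easy to confirm the number of sequences and their lengths before running the embedding scripts.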
After obtaining residue-level predictions from the deep learning model, follow these steps to generate 3D structures with placed metal ions:
Convert probability predictions to binding residues:
python 3D_modeling/src/parse_dl_results.py path/to/predictions_file.pkl ION_TYPE --lower_factor 0.5
Replace path/to/predictions_file.pkl with the path to your predictions file and ION_TYPE with the type of ion you are analyzing (e.g., CA, ZN, or MG), and adjust --lower_factor as needed.
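When several ion types are of interest, the parsing command above can be assembled programmatically. This is a rough sketch rather than part of ESMBind: build_parse_command and parse_for_ions are hypothetical names, the script path is assumed to be relative to the repository root, and whether a single predictions file covers more than one ion type is an assumption that should be checked against your own inference output. With dry_run=True (the default), the commands are only collected and nothing is executed.

```python
import subprocess
import sys


def build_parse_command(predictions_path, ion_type, lower_factor=0.5):
    """Build the argument list for one parse_dl_results.py invocation.

    Mirrors the documented command line; the script path is assumed
    to be relative to the repository root.
    """
    return [
        sys.executable,
        "3D_modeling/src/parse_dl_results.py",
        str(predictions_path),
        ion_type,
        "--lower_factor",
        str(lower_factor),
    ]


def parse_for_ions(predictions_path, ion_types, lower_factor=0.5, dry_run=True):
    """Collect one command per ion type and run them unless dry_run is set."""
    commands = [
        build_parse_command(predictions_path, ion, lower_factor)
        for ion in ion_types
    ]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError at the first failure.
            subprocess.run(command, check=True)
    return commands
```

Printing the result of parse_for_ions("path/to/predictions_file.pkl", ["CA", "ZN"]) shows the exact commands before anything runs; pass dry_run=False to execute them.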
Generate 3D structures:
cd 3D_modeling
bash run-3d-modeling.sh
Before running, edit the run-3d-modeling.sh script to specify the input and output directories. This script processes the parsed deep learning predictions, places metal ions, and performs energy minimization to produce the final 3D structures.
This project uses data from:
This source code is licensed under the CSI approved 3-clause BSD license found in the LICENSE file in the root directory of this source tree.
If you find our work useful, please cite it as:
@article {ESMBind,
author = {Dai, Xin and Henderson, Max and Yoo, Shinjae and Liu, Qun},
title = {Predict metal-binding proteins and structures through integration of evolutionary-scale and physics-based modeling},
elocation-id = {2024.08.09.607368},
year = {2024},
doi = {10.1101/2024.08.09.607368},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2024/08/10/2024.08.09.607368},
journal = {bioRxiv}
}
For questions or issues, please open an issue on GitHub or contact Xin Dai (xdai@bnl.gov).