DDAffinity-network
Description
This repo contains code for Predicting the changes in binding affinity of multiple point mutations using protein three-dimensional structure by Guanglei Yu, Qichang Zhao, Xuehua Bi and Jianxin Wang.
We propose a ProteinMPNN-inspired $\Delta\Delta G$ predictor that takes the 3D structures and 2D sequences of the wild-type ($\mathcal{WT}$) and mutant ($\mathcal{MT}$) protein complexes as input. The mutant structure is generated with the BuildModel and Optimize modules of FoldX 5.0.
- Clipped patches: given $\mathcal{WT}$ and $\mathcal{MT}$, we clip each structure into a patch of 256 residues. These are the 256 nearest neighbors of the mutated residues, ranked by inter-residue $C_{\beta}$ distance, and they include the mutated residues themselves.
- Two-step additive Gaussian noising strategy: to improve the performance and generalization of DDAffinity, we apply additive Gaussian noise to the atomic coordinates of residues in two steps. First, Gaussian noise ($std=0.2\mathring{\mathrm A}$) is added to all input atomic coordinates, which yields perturbed backbone dihedrals $(\phi,\psi,\omega)$ and sidechain dihedrals $(\chi^{(1)},\chi^{(2)},\chi^{(3)},\chi^{(4)})$. Second, inspired by ProteinMPNN, where such noise improves predictive performance and makes the prediction algorithm more robust, we also add Gaussian noise ($std=0.2\mathring{\mathrm A}$) to the coordinates of the protein backbone atom set $\boldsymbol{A}=\{N,C_{\alpha},C,O,C_{\beta}\}$. Importantly, this second perturbation does not update the backbone or sidechain dihedrals. The two-step noising strategy is applied during training only.
- Construction of the $k$-nearest neighbor graph: each residue is connected to three types of neighbors. (1) Spatial distance $k_1$. A residue is connected to its $k_1$ nearest neighbors by spatial Euclidean distance, which keeps the spatial densities of different proteins comparable. (2) Sequential distance $k_2$. Residue $r_i$ is connected to its sequence neighbors whose sequential distance from $r_i$ is no more than $(k_2-1)/2$. (3) Long-range distance $k_3$. To capture dependencies that are long-range in sequence but local in 3D Euclidean space, the neighbors of residue $r_i$ are ranked in ascending order of Euclidean distance, and those whose sequential distance is not greater than $(k_2-1)/2$ are discarded. We then select the $k_3$ nearest neighbors from the remaining ordered list. In total, $k=k_1+k_2+k_3$.
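The three strategies above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names (`clip_patch`, `two_step_noise`, `neighbor_lists`) and the caller-supplied `dihedral_fn` are hypothetical, the second noise step is assumed to be applied on top of the step-one coordinates, the sequential window is assumed to include the residue itself, chain boundaries are ignored, and duplicates between the three neighbor lists are not removed.

```python
import math
import random

BACKBONE = {"N", "CA", "C", "O", "CB"}  # atom set A from the text


def clip_patch(cb, mutated, patch_size=256):
    """Indices of the patch_size residues nearest to any mutated residue."""
    # C-beta distance from each residue to its closest mutated residue
    # (0 for the mutated residues, so they are always kept).
    nearest = [min(math.dist(x, cb[m]) for m in mutated) for x in cb]
    ranked = sorted(range(len(cb)), key=lambda i: nearest[i])
    return sorted(ranked[:patch_size])


def _jitter(xyz, std, rng):
    """Add independent Gaussian noise to each coordinate of one atom."""
    return tuple(c + rng.gauss(0.0, std) for c in xyz)


def two_step_noise(residues, dihedral_fn, std=0.2, rng=None):
    """residues: list of {atom_name: (x, y, z)}. dihedral_fn is hypothetical."""
    rng = rng or random.Random()
    # Step 1: perturb every atom, then derive dihedrals from the result.
    step1 = [{a: _jitter(p, std, rng) for a, p in res.items()}
             for res in residues]
    dihedrals = dihedral_fn(step1)
    # Step 2: perturb backbone atoms again; dihedrals are NOT recomputed.
    step2 = [{a: _jitter(p, std, rng) if a in BACKBONE else p
              for a, p in res.items()} for res in step1]
    return step2, dihedrals


def neighbor_lists(coords, seq_idx, k1, k2, k3):
    """Per residue: (spatial, sequential, long-range) neighbor indices."""
    half = (k2 - 1) // 2
    n = len(coords)
    out = []
    for i in range(n):
        gap = [abs(seq_idx[j] - seq_idx[i]) for j in range(n)]
        # All other residues in ascending order of Euclidean distance.
        by_dist = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(coords[i], coords[j]))
        spatial = by_dist[:k1]
        # Sequence window of width k2 centred on i (assumed to include i).
        sequential = [j for j in range(n) if gap[j] <= half]
        # Spatially close residues that lie outside the sequence window.
        long_range = [j for j in by_dist if gap[j] > half][:k3]
        out.append((spatial, sequential, long_range))
    return out


# Ten residues on a straight line, 1 unit apart.
line = [(float(i), 0.0, 0.0) for i in range(10)]
print(neighbor_lists(line, list(range(10)), k1=2, k2=3, k3=2)[5])
# → ([4, 6], [4, 5, 6], [3, 7])
```

A real pipeline would do this with batched tensors rather than Python lists; the lists are used here only to keep the sketch dependency-free.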
Overview of our DDAffinity architecture is shown below.
Install
DDAffinity Environment
conda env create -f env.yml -n DDAffinity
conda activate DDAffinity
The default PyTorch version is 1.12.1 and the default cudatoolkit version is 11.3. Both can be changed in env.yml.
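After activating the environment, a quick check like the following can confirm which PyTorch build was installed and whether a GPU is visible. This is an optional sketch, not part of the repository; `torch_summary` is a hypothetical helper name.

```python
import importlib.util


def torch_summary():
    """Describe the installed PyTorch build, or report that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch
    return (f"torch {torch.__version__}, "
            f"CUDA build {torch.version.cuda}, "
            f"GPU available: {torch.cuda.is_available()}")


print(torch_summary())
```

If the reported versions differ from those in env.yml, the environment was probably created from a modified file or resolved to a different build.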
Preparation of processed dataset
We generate the wild-type and mutant complex PDB files from the PDB files in data/SKEMPI2/PDBs, using rde/datasets/PDB_generate.py, data/SKEMPI2/SKEMPI2.csv, and the FoldX tool. We then use rde/datasets/skempi_parallel.py to transform the wild-type and mutant PDB files into the processed dataset SKEMPI2_cache.
python PDB_generate.py
python skempi_parallel.py --reset
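For orientation, the sketch below shows how a comma-separated mutation string could be turned into a FoldX `individual_list.txt` line and a BuildModel command line. It is an assumption-laden illustration and may differ from what PDB_generate.py actually does: the helper names, the `foldx` executable name, the `complex.pdb` file name, and the mutation pattern (wild-type residue, chain ID, position, mutant residue, no insertion codes) are all hypothetical choices made here.

```python
import re

# Assumed pattern, e.g. LI38G: wild type L, chain I, position 38, mutant G.
MUTATION = re.compile(r"^[A-Z][A-Za-z0-9]-?\d+[A-Z]$")


def individual_list_line(mutations):
    """Turn 'LI38G, YI40A' into the individual_list line 'LI38G,YI40A;'."""
    items = [m.strip() for m in mutations.split(",") if m.strip()]
    if not items:
        raise ValueError("no mutations given")
    for m in items:
        if not MUTATION.match(m):
            raise ValueError(f"unsupported mutation string: {m!r}")
    # FoldX separates mutations of one mutant with commas and ends the line with ';'.
    return ",".join(items) + ";"


def build_model_command(pdb_name, mutant_file="individual_list.txt", exe="foldx"):
    """Argument list for a FoldX BuildModel run (executable name is an assumption)."""
    return [exe, "--command=BuildModel",
            f"--pdb={pdb_name}", f"--mutant-file={mutant_file}"]


print(individual_list_line("LI38G, YI40A"))
print(" ".join(build_model_command("complex.pdb")))
```

The argument list can be passed to `subprocess.run` once the FoldX executable is on the PATH under the name you choose for `exe`.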
Datasets
Trained Weights
The weights trained on the overall SKEMPI2 dataset are located in:
DDAffinity
The weights trained on M1340 are located in:
M1340
Usage
Evaluate DDAffinity
python test_DDAffinity.py ./configs/train/mpnn_ddg.yml --device cuda:0
Blind testing: non-redundant blind testing on the multiple point mutation dataset M595
python case_study.py ./configs/inference/blind_testing.yml --device cuda:0
Case Study 1: Predict Mutation Effects for SARS-CoV-2 RBD
python case_study.py ./configs/inference/case_study_1.yml --device cuda:0
Case Study 2: Human Antibody Optimization
python case_study.py ./configs/inference/case_study_2.yml --device cuda:0
Train DDAffinity
python train_DDAffinity.py ./configs/train/mpnn_ddg.yml --num_cvfolds 10 --device cuda:0
Acknowledgements
We acknowledge that parts of our code are adapted from Rotamer Density Estimator (RDE). Thanks to the authors for sharing their code.