ak422/DDAffinity

DDAffinity-network

Description

This repo contains the code for the paper "Predicting the changes in binding affinity of multiple point mutations using protein three-dimensional structure" by Guanglei Yu, Qichang Zhao, Xuehua Bi and Jianxin Wang.

We propose a ProteinMPNN-inspired $\Delta\Delta G$ predictor that takes as input the 3D structures and sequences of the wild-type ($\mathcal{WT}$) and mutant ($\mathcal{MT}$) protein complexes. Mutant structures are generated with the BuildModel and Optimize modules of FoldX 5.0.
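
For reference, the predicted quantity is the change in binding free energy upon mutation. Under the convention commonly used with SKEMPI-derived data (an assumption here, since this README does not state its sign convention), it is defined from the dissociation constants $K_D$ of the two complexes as follows, so a positive $\Delta\Delta G$ means the mutant binds more weakly:

$$\Delta\Delta G = \Delta G_{\mathcal{MT}} - \Delta G_{\mathcal{WT}}, \qquad \Delta G = RT\ln K_D$$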

Clipped patches: given $\mathcal{WT}$ and $\mathcal{MT}$, we clip each of them into a patch of 256 residues, namely the 256 nearest neighbors of the mutated residues according to inter-residue $C_{\beta}$ distances, including the mutated residues themselves.
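
A minimal NumPy sketch of the patch clipping. It assumes per-residue $C_{\beta}$ coordinates are already available as an (L, 3) array, and that for multiple point mutations a residue's distance to the mutation sites is its minimum $C_{\beta}$ distance to any mutated residue. The function `clip_patch` and its arguments are hypothetical and are not the repository's API.

```python
import numpy as np


def clip_patch(cb_coords: np.ndarray, mut_mask: np.ndarray, patch_size: int = 256) -> np.ndarray:
    """Return sorted indices of the `patch_size` residues nearest to the mutation sites.

    cb_coords: (L, 3) C-beta coordinates (hypothetical input layout).
    mut_mask:  (L,) boolean array, True at mutated positions.
    Assumption: a residue's distance to a multi-point mutation is its minimum
    C-beta distance to any mutated residue, so mutated residues themselves score 0.
    """
    mut_cb = cb_coords[mut_mask]                                          # (M, 3)
    # Distance from every residue to every mutated residue: (L, M)
    dist = np.linalg.norm(cb_coords[:, None, :] - mut_cb[None, :, :], axis=-1)
    min_dist = dist.min(axis=1)                                           # (L,)
    k = min(patch_size, len(cb_coords))
    nearest = np.argsort(min_dist, kind="stable")[:k]                     # k closest residues
    return np.sort(nearest)                                               # restore sequence order


# Toy example: 10 residues on a line, residue 5 mutated, patch of 3 residues.
cb = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)
mask = np.zeros(10, dtype=bool)
mask[5] = True
print(clip_patch(cb, mask, patch_size=3))  # [4 5 6]
```

Because a point substitution keeps the residue count unchanged, one possible design (an assumption in this sketch) is to compute the indices once and use them to clip both $\mathcal{WT}$ and $\mathcal{MT}$, so the two patches stay residue-aligned.
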
Two-step additive Gaussian noising strategy: to improve the performance and generalization of DDAffinity, we perturb the atomic coordinates of residues in two steps. First, additive Gaussian noise ($std=0.2\,\mathring{\mathrm A}$) is applied to all input atomic coordinates, and the backbone dihedrals $(\phi,\psi,\omega)$ and sidechain dihedrals $(\chi^{(1)},\chi^{(2)},\chi^{(3)},\chi^{(4)})$ are computed from these perturbed coordinates. Second, following ProteinMPNN, where backbone noise was used to improve predictive performance and make the model more robust, we add Gaussian noise ($std=0.2\,\mathring{\mathrm A}$) to the coordinates of the backbone atom set $\boldsymbol{A}=\{N, C_{\alpha}, C, O, C_{\beta}\}$. Importantly, this second perturbation does not update the backbone or sidechain dihedrals. Both steps are applied only during training.
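
The two noising steps can be sketched in NumPy as below. This is an illustration under stated assumptions, not the repository's implementation: coordinates are an (L, A, 3) array in Å, atom slots 0 to 4 hold N, Cα, C, O, Cβ (a hypothetical layout), only the backbone dihedrals are computed (sidechain χ angles need per-residue topology and are omitted), and the names `dihedral`, `backbone_dihedrals` and `two_step_noise` are hypothetical.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians for points of shape (..., 3)."""
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1, axis=-1, keepdims=True)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = b0 - np.sum(b0 * b1, axis=-1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=-1, keepdims=True) * b1
    x = np.sum(v * w, axis=-1)
    y = np.sum(np.cross(b1, v) * w, axis=-1)
    return np.arctan2(y, x)


def backbone_dihedrals(pos):
    """phi, psi, omega from (L, A, 3) coordinates with N, CA, C in atom slots 0, 1, 2."""
    n, ca, c = pos[:, 0], pos[:, 1], pos[:, 2]
    phi = dihedral(c[:-1], n[1:], ca[1:], c[1:])      # residues 1..L-1
    psi = dihedral(n[:-1], ca[:-1], c[:-1], n[1:])    # residues 0..L-2
    omega = dihedral(ca[:-1], c[:-1], n[1:], ca[1:])  # residues 0..L-2
    return phi, psi, omega


def two_step_noise(pos, backbone_slots=(0, 1, 2, 3, 4), std=0.2, training=True, rng=None):
    """Two-step additive Gaussian noise; returns model coordinates and backbone dihedrals."""
    rng = np.random.default_rng() if rng is None else rng
    if training:
        # Step 1: perturb ALL atoms; dihedrals are derived from these coordinates.
        pos = pos + rng.normal(0.0, std, size=pos.shape)
    dihedrals = backbone_dihedrals(pos)
    if training:
        # Step 2: perturb backbone atoms again WITHOUT recomputing the dihedrals.
        slots = list(backbone_slots)
        pos[:, slots] += rng.normal(0.0, std, size=pos[:, slots].shape)
    return pos, dihedrals
```

With `training=False` both steps are skipped and the clean coordinates and dihedrals are returned, matching the training-only use described above. Step 1 builds a new array, so the caller's input is never modified in place.
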
Constructing the $k$-nearest neighbor graph: we combine three kinds of neighbors. (1) Spatial neighbors $k_1$: each residue is connected to its $k_1$ nearest neighbors by spatial Euclidean distance, which keeps the spatial densities of different proteins comparable. (2) Sequential neighbors $k_2$: residue $r_i$ is connected to its sequence neighbors whose sequential distance from $r_i$ is at most $(k_2-1)/2$. (3) Long-range neighbors $k_3$: to efficiently capture dependencies that are long-range in sequence but local in 3D Euclidean space, the neighbors of residue $r_i$ are ranked in ascending order of Euclidean distance, those whose sequential distance is at most $(k_2-1)/2$ are discarded, and the $k_3$ nearest of the remaining residues are selected. In total, $k=k_1+k_2+k_3$.
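
A minimal sketch of the three neighbor sets for a single chain, assuming one representative atom per residue (for example Cα; the README does not say which atom is used) and residue indices that follow sequence order. The function `knn_neighbors` is hypothetical. Multi-chain complexes would need extra handling of sequential distance across chains (for example treating residues on different chains as infinitely far apart in sequence), which this sketch omits.

```python
import numpy as np


def knn_neighbors(coords, k1, k2, k3):
    """Per-residue neighbor sets: spatial (k1), sequential (k2), long-range (k3).

    coords: (L, 3) one representative atom per residue (assumption, e.g. C-alpha).
    Residue indices are assumed to follow sequence order on a single chain.
    """
    L = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # (L, L)
    idx = np.arange(L)
    seq_dist = np.abs(idx[:, None] - idx[None, :])                          # (L, L)
    half = (k2 - 1) // 2
    neighbors = []
    for i in range(L):
        order = np.argsort(dist[i], kind="stable")     # ascending Euclidean distance
        spatial = order[:k1]                           # (1) k1 nearest in space (includes i itself)
        sequential = idx[seq_dist[i] <= half]          # (2) |i - j| <= (k2 - 1) / 2
        far = order[seq_dist[i, order] > half][:k3]    # (3) drop sequence-near residues, keep k3 nearest
        neighbors.append((spatial, sequential, far))
    return neighbors
```

The three sets are kept separate here, so they may overlap (the spatial and sequential sets often share residues). Near chain ends the sequential set has fewer than $k_2$ members, so $k=k_1+k_2+k_3$ holds exactly only for interior residues in this sketch.
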

An overview of the DDAffinity architecture is shown below.

Install

DDAffinity Environment

conda env create -f env.yml -n DDAffinity
conda activate DDAffinity

The default PyTorch version is 1.12.1 and the default cudatoolkit version is 11.3; both can be changed in env.yml.

Preparation of processed dataset

We generated the PDB files of all mutant and wild-type complexes from the structures in data/SKEMPI2/PDBs and the mutation list in data/SKEMPI2/SKEMPI2.csv, using rde/datasets/PDB_generate.py and the FoldX tool. We then use rde/datasets/skempi_parallel.py to convert the wild-type and mutant PDB files into the processed dataset SKEMPI2_cache.
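
As an illustration of the FoldX input step, the sketch below groups SKEMPI2 mutations by PDB entry into lines in FoldX's individual-list style (mutations separated by commas, each mutant terminated by ";", each mutation written as wild-type residue, chain, residue number, mutant residue). It assumes the CSV is ";"-delimited with the SKEMPI v2 columns `#Pdb` and `Mutation(s)_cleaned`; the function `foldx_mutant_lines` is hypothetical, and rde/datasets/PDB_generate.py may prepare its FoldX inputs differently.

```python
import csv
import io
from collections import defaultdict


def foldx_mutant_lines(skempi_csv_text):
    """Group SKEMPI2 mutations by PDB entry as FoldX individual-list lines.

    Assumptions: ';'-delimited CSV with '#Pdb' (e.g. '1CSE_E_I') and
    'Mutation(s)_cleaned' (e.g. 'LI38G,KI55A') columns, as in SKEMPI v2.
    """
    per_pdb = defaultdict(list)
    for row in csv.DictReader(io.StringIO(skempi_csv_text), delimiter=";"):
        pdb_code = row["#Pdb"].split("_")[0]             # '1CSE_E_I' -> '1CSE'
        line = row["Mutation(s)_cleaned"].strip() + ";"  # comma-separated, ';'-terminated
        if line not in per_pdb[pdb_code]:                # SKEMPI repeats measured mutants
            per_pdb[pdb_code].append(line)
    return dict(per_pdb)


sample = (
    "#Pdb;Mutation(s)_cleaned;Affinity_mut_parsed\n"
    "1CSE_E_I;LI38G;1e-9\n"
    "1CSE_E_I;LI38G;2e-9\n"
    "1CSE_E_I;LI38G,KI55A;1e-8\n"
)
print(foldx_mutant_lines(sample))  # {'1CSE': ['LI38G;', 'LI38G,KI55A;']}
```

The rows above are made-up toy values used only to show the grouping and de-duplication.
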

python PDB_generate.py 
python skempi_parallel.py --reset

Datasets

Dataset	Download Script	Processed Dataset
SKEMPI v2	`data/get_skempi_v2.sh`	`data/SKEMPI2/SKEMPI2_cache`
SKEMPI2.csv	—	SKEMPI2_cache
M1707.csv	—	M1707_cache
S1131.csv	—	S1131_cache
M1340.csv	—	M1340_cache
M595.csv	—	M595_cache
S494.csv	—	S494_cache
S285.csv	—	S285_cache
Ssys.csv	—	Ssys_cache

Trained Weights

The weights trained on the full SKEMPI2 dataset are located in: DDAffinity

The weights trained on M1340 are located in: M1340

Usage

Evaluate DDAffinity

python test_DDAffinity.py ./configs/train/mpnn_ddg.yml --device cuda:0

Blind testing: non-redundant blind testing on the multiple point mutation dataset M595

python case_study.py ./configs/inference/blind_testing.yml --device cuda:0

Case Study 1: Predict Mutation Effects for SARS-CoV-2 RBD

python case_study.py ./configs/inference/case_study_1.yml --device cuda:0

Case Study 2: Human Antibody Optimization

python case_study.py ./configs/inference/case_study_2.yml --device cuda:0

Train DDAffinity

python train_DDAffinity.py ./configs/train/mpnn_ddg.yml --num_cvfolds 10 --device cuda:0

Acknowledgements

Parts of our code are adapted from Rotamer Density Estimator (RDE). We thank the authors for sharing their code.