To set up your environment to run the code, install the following packages:
Install Python 3.8.16 using conda.
Then install with conda:
pytorch==2.2 (see the installation instructions on the official PyTorch website)
pymol-open-source==2.5.0
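Install GVP-GNN. First install the PyTorch Geometric packages (the wheel index below targets PyTorch 2.2.0 with CUDA 12.1; adjust the URL if your build differs):
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv torch_geometric -f https://data.pyg.org/whl/torch-2.2.0+cu121.html
Then install gvp-pytorch from its repository: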
git clone https://github.com/drorlab/gvp-pytorch.git
cd gvp-pytorch
pip install -e .
Finally, install the remaining dependencies (copy them into a requirements.txt file and run pip install -r requirements.txt):
matplotlib==3.7.2
networkx==3.1
numpy==1.24.3
pandas==2.0.3
rdkit==2022.09.5
scikit_learn==1.3.0
scipy==1.5.2
seaborn==0.12.2
tqdm==4.63.0
biopython==1.78
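To confirm that the environment matches the pinned versions above, a small check along these lines may help. This is a minimal sketch using only the standard library; the helper names `parse_pins` and `check_pins` are hypothetical and not part of this repository.

```python
from importlib import metadata  # in the standard library since Python 3.8


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins, get_version=metadata.version):
    """Return {name: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

For example, `check_pins(parse_pins(open("requirements.txt").read()))` returns an empty dict when every pinned package is installed at the pinned version.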
We provide a toy dataset to demonstrate the training and testing of our model.
Dataset structure:
data/
    toy_set/
        ligand/
            ligand_1.sdf
            ligand_2.sdf
            ...
        protein/
            protein_1.pdb
            protein_2.pdb
            ...
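Before preprocessing, it can be useful to check that every ligand file has a matching protein file. The sketch below assumes that files pair up by the suffix after `ligand_` and `protein_`, as the placeholder names in the layout suggest; that pairing rule and the function name `summarize_toy_set` are assumptions, so adapt them to your actual file names.

```python
from pathlib import Path


def _strip_prefix(name, prefix):
    """Drop a leading prefix if present (Python 3.8 has no str.removeprefix)."""
    return name[len(prefix):] if name.startswith(prefix) else name


def summarize_toy_set(root):
    """Compare identifiers found under root/ligand (*.sdf) and root/protein (*.pdb)."""
    root = Path(root)
    ligands = {_strip_prefix(p.stem, "ligand_") for p in (root / "ligand").glob("*.sdf")}
    proteins = {_strip_prefix(p.stem, "protein_") for p in (root / "protein").glob("*.pdb")}
    return {
        "paired": sorted(ligands & proteins),
        "ligand_only": sorted(ligands - proteins),
        "protein_only": sorted(proteins - ligands),
    }
```

Calling `summarize_toy_set("data/toy_set")` returns three sorted lists; non-empty `ligand_only` or `protein_only` entries point to files without a partner.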
CSV file format:
pdb,affinity
3uri,9
4m0z,5.19
4kz6,3.1
4jxs,4.74
2r9w,5.1
...
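The CSV format above can be read and validated with the standard library. The following is a minimal sketch, not the loader used by the repository scripts; the function name `read_affinities` is hypothetical.

```python
import csv
import io


def read_affinities(handle):
    """Read (pdb, affinity) rows from an open CSV file into a dict of floats."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != ["pdb", "affinity"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    affinities = {}
    for row in reader:
        affinities[row["pdb"].strip()] = float(row["affinity"])
    return affinities


# Inline sample using rows from the format shown above.
sample = io.StringIO("pdb,affinity\n3uri,9\n4m0z,5.19\n4kz6,3.1\n")
print(read_affinities(sample))
```

With a real file, pass an open handle instead, for example `open(path, newline="")`. A malformed header or a non-numeric affinity raises `ValueError` early rather than failing later in preprocessing.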
Preprocessing steps:
Run the preprocessing script: python preprocessing.py
Prepare the dataset: python dataset_ConBAP.py
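The two preprocessing commands must run in the order given. One possible way to chain them so that a failure in the first step stops the second is sketched below; the helper names `build_commands` and `run_in_order` are hypothetical, and the script names are the ones listed above.

```python
import subprocess
import sys

# Script names taken from the preprocessing steps above, in execution order.
PREPROCESSING_SCRIPTS = ["preprocessing.py", "dataset_ConBAP.py"]


def build_commands(scripts, python=sys.executable):
    """Build one argument list per script, using the current interpreter by default."""
    return [[python, script] for script in scripts]


def run_in_order(commands, runner=subprocess.run):
    """Run each command in turn; check=True raises CalledProcessError on a non-zero exit."""
    for command in commands:
        runner(command, check=True)
```

Running `run_in_order(build_commands(PREPROCESSING_SCRIPTS))` from the repository root executes both steps with the interpreter of the active conda environment.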
Contrastive Learning with Redocked 2020 Dataset:
python pretrain.py
Fine-Tuning with PDBbind Dataset:
./unsupervised/model
python train_ConBAP.py
(The processed datasets are available here.)
(Note: Modify file paths based on your directory structure.)
./supervised/model
python predict_single.py
python casf_docking_single.py
python casf_screening_single.py
python predict.py
(Note: Modify file paths based on your directory structure.)