Which PDB files are used in the CASP-CAPRI test dataset?

BioinfoMachineLearning / DeepInteract

A geometric deep learning pipeline for predicting protein interface contacts. (ICLR 2022)

https://zenodo.org/record/6671582

GNU General Public License v3.0


Which PDB files are used in the CASP-CAPRI test dataset? #8

Closed: onlyonewater closed this issue 2 years ago

onlyonewater commented 2 years ago

Hi, can you provide the original 19 PDB files for the CASP-CAPRI 13 & 14 dataset?

amorehead commented 2 years ago

Hi, @onlyonewater. Absolutely. I believe the ZIP archive below should contain the chain splits we ultimately used in this work. If you have any other questions, let me know. Thanks for expressing interest in our work!

CASP-CAPRI13-14_Chain_Splits.zip

onlyonewater commented 2 years ago

Hi, @amorehead, I get it, thanks!

onlyonewater commented 2 years ago

What is the difference between the 5w6l_r_u.pdb and 5w6l_l_u.pdb files?

amorehead commented 2 years ago

Hi, @onlyonewater. These files simply follow a naming scheme similar to that of the Docking Benchmark 5 (DB5) dataset. For example, 5w6l_r_u.pdb represents the structure of the receptor, while 5w6l_l_u.pdb represents the structure of the ligand. The _u in each filename indicates that the structure is the unbound version (a bound structure would use _b instead).
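A small sketch of how that naming convention could be decoded and used to pair receptor and ligand files, assuming every file follows the `<pdb_id>_<r|l>_<u|b>.pdb` pattern. `parse_db5_name` and `pair_complexes` are hypothetical helpers written for illustration; they are not part of DeepInteract.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# Assumed DB5-style pattern: <4-char PDB ID>_<r|l>_<u|b>.pdb, e.g. "5w6l_r_u.pdb".
DB5_NAME = re.compile(r"^(?P<pdb_id>[0-9A-Za-z]{4})_(?P<role>[rl])_(?P<state>[ub])\.pdb$")

ROLES = {"r": "receptor", "l": "ligand"}
STATES = {"u": "unbound", "b": "bound"}


class ChainFile(NamedTuple):
    pdb_id: str
    role: str
    state: str


def parse_db5_name(filename: str) -> ChainFile:
    """Decode a DB5-style filename into its PDB ID, role, and state."""
    match = DB5_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a DB5-style filename: {filename!r}")
    return ChainFile(
        pdb_id=match["pdb_id"].lower(),  # normalize case (DB5 often uses uppercase IDs)
        role=ROLES[match["role"]],
        state=STATES[match["state"]],
    )


def pair_complexes(filenames):
    """Group files as {(pdb_id, state): {"receptor": name, "ligand": name}}."""
    pairs = defaultdict(dict)
    for name in filenames:
        info = parse_db5_name(name)
        pairs[(info.pdb_id, info.state)][info.role] = name
    return dict(pairs)


print(parse_db5_name("5w6l_r_u.pdb"))
# ChainFile(pdb_id='5w6l', role='receptor', state='unbound')
print(pair_complexes(["5w6l_r_u.pdb", "5w6l_l_u.pdb"]))
# {('5w6l', 'unbound'): {'receptor': '5w6l_r_u.pdb', 'ligand': '5w6l_l_u.pdb'}}
```

Grouping by `(pdb_id, state)` keeps an unbound pair and a bound pair of the same complex separate, should both be present in a directory.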

onlyonewater commented 2 years ago

So, should I calculate a contact map for each of these two PDB files separately?

amorehead commented 2 years ago

@onlyonewater, these two PDB files are meant to be used together: the intended use case is to predict a single contact map representing the inter-chain interactions between their residues. Our inference pipeline should support specifying two separate PDB files as input, and the output contact map (a NumPy array) will contain the predicted inter-chain contact probability for each pair of residues, one from each file.
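To make the answer above concrete, here is a minimal NumPy sketch of what a single inter-chain contact map between the two files looks like: one row per receptor residue and one column per ligand residue. It computes a ground-truth binary map from coordinates rather than running DeepInteract. The 6 Å heavy-atom cutoff, the fixed-column PDB parsing, and the helper names `read_heavy_atoms` and `inter_chain_contact_map` are assumptions for illustration.

```python
import numpy as np


def read_heavy_atoms(pdb_text: str):
    """Parse non-hydrogen ATOM records (assumes a single-model PDB file).

    Returns (atom_to_residue_index, coords of shape (n_atoms, 3), n_residues).
    """
    residue_ids, coords = [], []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):  # HETATM and other records are skipped
            continue
        # Element symbol lives in columns 77-78; fall back to the atom name.
        element = line[76:78].strip() or line[12:16].strip().lstrip("0123456789")[:1]
        if element.upper() == "H":
            continue
        # Residue key: (chain ID, residue number, insertion code).
        residue_ids.append((line[21], line[22:26].strip(), line[26]))
        coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
    # Assign each distinct residue a 0-based index in file order.
    order = {}
    for rid in residue_ids:
        order.setdefault(rid, len(order))
    atom_to_res = np.array([order[rid] for rid in residue_ids], dtype=int)
    return atom_to_res, np.asarray(coords, dtype=float).reshape(-1, 3), len(order)


def inter_chain_contact_map(receptor_text: str, ligand_text: str, cutoff: float = 6.0):
    """Binary (n_receptor_residues, n_ligand_residues) map: 1 if any heavy-atom pair is within cutoff."""
    r_atoms, r_xyz, n_r = read_heavy_atoms(receptor_text)
    l_atoms, l_xyz, n_l = read_heavy_atoms(ligand_text)
    # All receptor-ligand atom distances via broadcasting: (n_r_atoms, n_l_atoms).
    dists = np.linalg.norm(r_xyz[:, None, :] - l_xyz[None, :, :], axis=-1)
    close_i, close_j = np.nonzero(dists < cutoff)
    contacts = np.zeros((n_r, n_l), dtype=np.int8)
    contacts[r_atoms[close_i], l_atoms[close_j]] = 1
    return contacts
```

For example, `inter_chain_contact_map(Path("5w6l_r_u.pdb").read_text(), Path("5w6l_l_u.pdb").read_text())` (with `from pathlib import Path`) would return one map for the pair rather than one per file. If a predicted probability map is indexed by residues in the same file order, the two arrays can be compared element by element.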

onlyonewater commented 2 years ago

Oh, I get it, thanks!