BioinfoMachineLearning / CDPred

Deep transformer for predicting interchain residue-residue distances of protein complexes
MIT License

Questions about test datasets #4

Closed: LittletreeZou closed this issue 1 year ago

LittletreeZou commented 1 year ago

Hi, thank you for this interesting work. Regarding the four test datasets (HomoTest1, HomoTest2, HeteroTest1, HeteroTest2): in the true_pdb folder I could only find fasta and h_dist files, not the corresponding PDB files. My questions are:

  1. What does h_dist mean? Are these files the ground-truth distance maps of the dimers?
  2. Do you have the ground-truth PDB files for these test dimers? If so, could you share a link where I can download them?

Thank you!

ZhiYeG commented 1 year ago

Hi, thank you for your interest in our work. h_dist is the heavy-atom distance map; you can find its definition in the paper. If you want to download the ground-truth PDB files of our four test datasets, please check this link: https://zenodo.org/record/6647564. If a target's ground-truth PDB file is not in the Zenodo record, that target is owned by the CASP committee, and we have no right to publish it.
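For readers who download the Zenodo structures and want to compare them against the h_dist files, here is a minimal sketch of how an interchain heavy-atom distance map can be computed from a PDB file. It assumes the common definition (entry [i][j] is the minimum distance between any heavy, i.e. non-hydrogen, atom of residue i in chain A and any heavy atom of residue j in chain B); please check the CDPred paper for the exact definition used, since that is not stated in this thread. The layout of the h_dist files is also not described here, so this sketch does not try to read or write them. The function names (`parse_heavy_atoms`, `interchain_distance_map`, `atom_line`) are hypothetical, and the parser assumes a single-model file with standard fixed-column ATOM records (HETATM records are skipped).

```python
import math


def parse_heavy_atoms(pdb_text, chain_id):
    """Map (residue number, insertion code) -> list of heavy-atom xyz
    coordinates for one chain, read from fixed-column PDB ATOM records.
    Assumes a single-model file; HETATM records are ignored."""
    residues = {}  # dicts keep insertion order, i.e. residue order in the file
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[21] != chain_id:
            continue
        element = line[76:78].strip()
        if not element:
            # Older files may omit the element column; guess it from the atom name.
            element = line[12:16].strip().lstrip("0123456789")[:1]
        if element in ("H", "D"):
            continue  # hydrogens/deuteriums are not heavy atoms
        key = (int(line[22:26]), line[26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(key, []).append(xyz)
    return residues


def interchain_distance_map(res_a, res_b):
    """Entry [i][j]: minimum distance (in Angstroms) between any heavy atom of
    the i-th residue of chain A and any heavy atom of the j-th residue of chain B."""
    return [
        [min(math.dist(p, q) for p in atoms_i for q in atoms_j)
         for atoms_j in res_b.values()]
        for atoms_i in res_a.values()
    ]


def atom_line(serial, name, res_name, chain, res_seq, x, y, z, element):
    """Format one ATOM record in the fixed PDB column layout (demo data only)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}{1.0:>6.2f}{0.0:>6.2f}          {element:>2}")


# Toy dimer: two residues in chain A, one residue in chain B.
pdb_text = "\n".join([
    atom_line(1, "N", "GLY", "A", 1, 0.0, 0.0, 0.0, "N"),
    atom_line(2, "CA", "GLY", "A", 1, 1.0, 0.0, 0.0, "C"),
    atom_line(3, "H", "GLY", "A", 1, 4.0, 0.0, 0.0, "H"),  # ignored: hydrogen
    atom_line(4, "CA", "GLY", "A", 2, 10.0, 0.0, 0.0, "C"),
    atom_line(5, "CA", "GLY", "B", 1, 5.0, 0.0, 0.0, "C"),
])

res_a = parse_heavy_atoms(pdb_text, "A")
res_b = parse_heavy_atoms(pdb_text, "B")
dmap = interchain_distance_map(res_a, res_b)
print(dmap)  # [[4.0], [5.0]]
```

Note that the hydrogen at x = 4.0 is skipped; if it were counted, the first entry would drop to 1.0 instead of 4.0. The pairwise loop is O(atoms_A × atoms_B), which is fine for dimers of ordinary size; for large complexes a vectorized library such as NumPy would be the usual choice.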

LittletreeZou commented 1 year ago

I see. Thank you for clarifying it.

LittletreeZou commented 1 year ago

I'm manually collecting the PDB IDs for HomoTest1, HomoTest2, and HeteroTest1. I could find most of them on the CASP website, except for the samples listed below:

I would really appreciate it if you could share how you obtained the structures of these samples.

ZhiYeG commented 1 year ago

I'm sorry, but I can't share these structures with you because I don't have the right to do so. Even the CASP organizers have no right to release some of these targets, for various reasons; for example, some targets were determined experimentally and have yet to be published.