Closed peter5842 closed 1 year ago
Input:
print(processed_complex["complex"])
g1 = processed_complex["graph1"]
g2 = processed_complex["graph2"]
dist_map = torch.cdist(g1.ndata["x"], g2.ndata["x"], p=2, compute_mode="donot_use_mm_for_euclid_dist")
examples = processed_complex['examples']
print((dist_map<=6).sum())
print((examples[:, 2] == 1).sum())
Output:
1ykp.pdb1
tensor(51)
tensor(381)
Oh, sorry for this question! I have another one: is the positive label determined by the atom distance?
Hi, @peter5842. It depends on which dataset you are referring to. For the DIPS-Plus and CASP-CAPRI datasets, these labels correspond to the labels generated using the bound complex coordinates. For DB5-Plus, these labels correspond to the labels generated using the unbound complex coordinates.
In general, a positive label is determined according to whether two heavy atoms (non-hydrogen atoms belonging to residues in different protein chains) are within 6 Angstrom of each other in the bound version of the protein complex. We use logic from the following atom3-py3 library to generate these labels for our datasets (https://github.com/amorehead/atom3/blob/master/atom3/neighbors.py).
I believe one potential reason you are getting a different count from the original number of labels is that here, by using ndata["x"], you are only computing the distances between pairs of Ca atoms, not between the other backbone or side-chain atoms. I hope this helps.
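To make the difference concrete, here is a minimal sketch in plain Python. It is not the atom3 implementation: the two toy residues, their coordinates, and the helper names `ca_distance` and `min_heavy_atom_distance` are all made up for illustration. It only shows how a residue pair can fail a Ca-only distance check while still passing a heavy-atom check at the same 6 Angstrom cutoff.

```python
import math
from itertools import product

CUTOFF = 6.0  # Angstrom threshold mentioned in this thread

# Hypothetical toy residues from two different chains.
# Each maps an atom name to (x, y, z) coordinates in Angstrom.
res_a = {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0), "CG": (3.0, 0.0, 0.0)}
res_b = {"CA": (9.0, 0.0, 0.0), "CB": (7.5, 0.0, 0.0), "CG": (6.0, 0.0, 0.0)}


def ca_distance(r1, r2):
    """Distance between the Ca atoms only (what a Ca-based distance map sees)."""
    return math.dist(r1["CA"], r2["CA"])


def min_heavy_atom_distance(r1, r2):
    """Smallest distance over all heavy-atom pairs of the two residues."""
    return min(math.dist(a, b) for a, b in product(r1.values(), r2.values()))


ca_positive = ca_distance(res_a, res_b) <= CUTOFF
heavy_positive = min_heavy_atom_distance(res_a, res_b) <= CUTOFF
print(ca_positive, heavy_positive)
```

With these toy coordinates the Ca atoms are 9 Angstrom apart while the closest side-chain atoms are 3 Angstrom apart, so a Ca-only check would miss a pair that a heavy-atom check labels positive. That is the same direction as the mismatch reported above (51 versus 381).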
I'm so grateful for your help with this. I think I understand the dataset now.
Hi, @amorehead
While reproducing this work, I ran into some questions about the dataset. I used the ndata["x"] of complex["graph1"] and complex["graph2"] to check the positive labels against the distances, but I got a confusing result: the positive labels derived from the distance map (6 Angstrom cutoff) are fewer than those in complex["examples"]. So I would like to know: is ndata["x"] the bound complex coordinates?
Thanks!