gnina / models

Trained caffe models

Some potentially problematic CrossDocked examples #18

Closed by hnisonoff 2 years ago

hnisonoff commented 2 years ago

I've noticed that some of the CrossDocked folders contain possible data-processing errors.

In 1433Z_HUMAN_1_244_pep_0 there is a docked file, 5d3f_A_rec_5d3f_fsc_lig_tt_docked.sdf.gz. The name suggests that the ligand 5d3f_fsc is being docked into the receptor 5d3f_A_rec.pdb, and I think that part is correct. However, if you load the crystal ligand 5d3f_fsc_lig.pdb, it sits in a physically impossible part of the receptor. I think this is because there is also a 5d3f_B_rec.pdb file, which is presumably the chain this ligand was taken from. In summary, I think there need to be two copies of 5d3f_fsc_lig.pdb, one for receptor chain A and one for receptor chain B.
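A minimal sketch of how this kind of overlap could be checked without a molecular viewer. It counts ligand heavy atoms that lie closer than a cutoff to any receptor heavy atom. The function names and the 2.0 Å cutoff are assumptions made for illustration, not part of the CrossDocked tooling, and the parser relies only on the fixed-width PDB coordinate columns.

```python
import math

def parse_heavy_atoms(pdb_text):
    """Return (x, y, z) tuples for non-hydrogen ATOM/HETATM records."""
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Element symbol lives in columns 77-78; skip hydrogens when present.
        element = line[76:78].strip().upper()
        if element == "H":
            continue
        # Coordinates live in fixed-width columns 31-54 of the PDB format.
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords

def count_clashes(ligand_coords, receptor_coords, cutoff=2.0):
    """Count ligand atoms within `cutoff` angstroms of any receptor atom.

    The 2.0 angstrom default is an assumed threshold for illustration.
    """
    return sum(
        1
        for lig_xyz in ligand_coords
        if any(math.dist(lig_xyz, rec_xyz) < cutoff for rec_xyz in receptor_coords)
    )
```

Reading 5d3f_fsc_lig.pdb and 5d3f_A_rec.pdb as text and passing the parsed coordinates to `count_clashes` should report a large number of clashing atoms if the ligand really overlaps chain A, and few or none for a correctly placed ligand.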

francoep commented 2 years ago

You are correct about this issue, and mostly correct about the reasoning. What actually happened is that this protein is a homodimer of two chains, only one of which has the ligand on it. When the chain without the ligand was aligned to the reference pocket, the alignment re-oriented the other chain so that it ended up in the way, in an incorrect location. This issue was known to me, and addressing it is part of the next version of the dataset that I am working on.
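As a rough illustration of the point about only one chain carrying the ligand, the sketch below counts, per chain, how many receptor atoms lie within a contact distance of the ligand. The chain with the most contacts is the likely host. The function names and the 4.0 Å cutoff are hypothetical choices for this example and do not describe how the dataset was actually built.

```python
import math
from collections import Counter

def atoms_by_chain(pdb_text):
    """Yield (chain_id, (x, y, z)) for each ATOM/HETATM record."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Chain ID is column 22; coordinates are columns 31-54.
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield line[21], xyz

def contacts_per_chain(receptor_pdb_text, ligand_coords, cutoff=4.0):
    """Count receptor atoms per chain within `cutoff` angstroms of the ligand.

    The 4.0 angstrom default is an assumed contact distance.
    """
    counts = Counter()
    for chain, xyz in atoms_by_chain(receptor_pdb_text):
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand_coords):
            counts[chain] += 1
    return counts
```

Here `receptor_pdb_text` is assumed to hold both chains in their original crystal frame, for example the unaligned structure, and `ligand_coords` is a list of (x, y, z) tuples for the crystal ligand. Calling `contacts_per_chain(...).most_common(1)` then gives the chain the ligand sits on.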

hnisonoff commented 2 years ago

Great, thanks for the explanation and for continuing to work on this dataset!