boun-tabi / biochemical-lms-for-drug-design

Code for the paper "Exploiting Pretrained Biochemical Language Models for Targeted Drug Design", to appear in Bioinformatics, Proceedings of ECCB2022.
MIT License

Docking #4

Closed: toooooodo closed this issue 1 year ago

toooooodo commented 1 year ago

Regarding the src/run_docking code that runs molecular docking, there is one concept I find confusing: a Uniprot_ID appears to be associated with both a ligand and a receptor.

From what I understand, a Uniprot_ID typically corresponds to a protein, which then serves as a receptor in molecular docking. However, I'm curious about the inclusion of the term "ligand." Could you please clarify the relationship between a Uniprot_ID, a ligand, and a receptor in this context? Additionally, could you point me to the source where I can find these correspondences, as mentioned in the link?

Furthermore, I noticed in the code here that PyMOL is used to remove the ligand from the PDB file. This processed file, named {target}-receptor.pdb, is then used as a receptor for scoring against a molecule's SDF file. Could you elaborate on the necessity of removing the ligand from the obtained PDB file before conducting docking with the small molecule for scoring?

I appreciate your assistance in clarifying these points. Thank you for your time and expertise.

gokceuludogan commented 1 year ago

Thank you for reaching out with your inquiries.

To clarify: for each test target protein, we chose a 3D structure in complex with a ligand. You can access these structures at RCSB PDB. When a particular PDB ID is given, our code fetches it automatically via an API, as shown here. The ligands extracted from these structures define the binding site for the docking procedure. Note that these ligands belong to the PDB structures, not to the Uniprot IDs themselves. We listed them manually in the code so that they can later be removed from the corresponding structure.
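As a rough illustration of how a co-crystallized ligand can define the binding site, here is a minimal sketch in plain Python. It is not the repository's code (which uses PyMOL); the function names, the toy records, and the 5 Å padding are assumptions made for this example. The ligand atoms are separated from the rest of the structure, which typically leaves the pocket free for the molecules being scored, and their coordinates give a box center and size.

```python
from statistics import mean


def split_complex(pdb_text, ligand_resname):
    """Separate a PDB complex into receptor lines and ligand coordinates.

    Relies on the fixed-column PDB layout: residue name in columns 18-20,
    x/y/z in columns 31-54. Other HETATM records (waters, ions) stay in
    the receptor here; a real pipeline may want to drop those as well.
    """
    receptor_lines, ligand_coords = [], []
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and line[17:20].strip() == ligand_resname:
            ligand_coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
        else:
            receptor_lines.append(line)
    return receptor_lines, ligand_coords


def docking_box(coords, padding=5.0):
    """Return (center, size) of a box around the ligand atoms.

    The padding (in angstroms) is an assumed value, not taken from the paper.
    """
    if not coords:
        raise ValueError("no ligand atoms found")
    xs, ys, zs = zip(*coords)
    center = (mean(xs), mean(ys), mean(zs))
    size = tuple(max(axis) - min(axis) + 2 * padding for axis in (xs, ys, zs))
    return center, size


def _toy_record(record, serial, name, resname, xyz):
    """Build one fixed-column PDB atom line (hypothetical helper for the demo)."""
    x, y, z = xyz
    return (
        f"{record:<6}{serial:>5} {name:<4} {resname:>3} A"
        + f"{1:>4}" + " " * 4 + f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


# Toy complex: one protein atom plus a two-atom ligand named "LIG".
EXAMPLE_PDB = "\n".join([
    _toy_record("ATOM", 1, "CA", "ALA", (10.0, 10.0, 10.0)),
    _toy_record("HETATM", 2, "C1", "LIG", (0.0, 0.0, 0.0)),
    _toy_record("HETATM", 3, "C2", "LIG", (2.0, 4.0, 6.0)),
])

receptor, ligand = split_complex(EXAMPLE_PDB, "LIG")
center, size = docking_box(ligand)
```

The resulting center and size are the kind of values a docking tool expects as its search box; the receptor lines could be written out to a file such as {target}-receptor.pdb.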

toooooodo commented 1 year ago

Thank you for your prompt response. I appreciate the clarity you provided. I have a couple of additional questions:

  1. I observed that when using the UniprotID on the RCSB PDB to search for proteins, multiple complexes are returned as results. Could you kindly explain how you determine which specific complex serves as the basis for the protein 3D structure and pocket location in your work?

  2. Is there a method or resource available to access 3D structural information for each protein-molecule interaction pair within the training set?

Your assistance in addressing these inquiries would be greatly appreciated. Thank you in advance :)

gokceuludogan commented 1 year ago

For your first question, when we encounter multiple complexes for a given UniprotID on RCSB PDB, our main criterion is that the complex should contain a ligand. If multiple complexes meet this criterion, we select one arbitrarily.
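The selection rule above could be automated roughly as follows. This is only a sketch: the exclusion list of solvent and ion residue names is an assumption, as are the function names, and "arbitrary" is implemented simply as "the first structure that qualifies".

```python
# Residue names that are NOT counted as ligands. This list is an assumption
# for the sketch; extend it for the buffers and ions in your structures.
NON_LIGAND = {"HOH", "NA", "CL", "MG", "ZN", "CA", "K", "SO4", "PO4", "GOL", "EDO"}


def ligand_resnames(pdb_text):
    """Collect residue names of HETATM records that look like ligands."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            resname = line[17:20].strip()  # residue name, PDB columns 18-20
            if resname and resname not in NON_LIGAND:
                names.add(resname)
    return names


def pick_complex(structures):
    """Return the first PDB ID whose structure contains a ligand, else None.

    `structures` maps PDB IDs to the text of the corresponding PDB file.
    """
    for pdb_id, pdb_text in structures.items():
        if ligand_resnames(pdb_text):
            return pdb_id
    return None
```

A structure that contains only waters or ions is skipped, so an apo entry for the same protein would never be chosen.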

Regarding your second question, to my knowledge, there isn't a dedicated resource for 3D structural data for every protein-molecule interaction in the training set. However, BindingDB does link to some PDB IDs for specific protein-ligand interactions. You might consider scripting a download of these or exploring PDBbind to see if your desired structures are included.
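A scripted download might look like the sketch below. It assumes you have already collected a list of PDB IDs (for example from the BindingDB links) and that each entry is available in legacy PDB format from the RCSB file server; very large structures may only be offered as mmCIF. The function names and the output directory are placeholders.

```python
from pathlib import Path
from urllib.request import urlretrieve


def pdb_url(pdb_id):
    """URL of the legacy-format PDB file on the RCSB file server."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_structures(pdb_ids, out_dir="structures"):
    """Download each PDB ID into out_dir, skipping files already present."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for pdb_id in pdb_ids:
        target = out / f"{pdb_id.upper()}.pdb"
        if not target.exists():
            urlretrieve(pdb_url(pdb_id), str(target))
        paths.append(target)
    return paths
```

Calling download_structures with your own list of IDs returns the local file paths, which can then be passed to whatever docking preparation step you use.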

Hope this helps, and please let me know if you need further clarification!

toooooodo commented 1 year ago

Thank you for your reply, this answers my question :)