ccsb-scripps / AutoDock-GPU

AutoDock for GPUs and other accelerators
https://ccsb.scripps.edu/autodock
GNU General Public License v2.0

How to find the RMSD_of_probable_global_minimum from Ligand 1.dlg output #257

Closed srilekha1993 closed 3 months ago

srilekha1993 commented 4 months ago

Hi, the ligand_properties.csv file lists RMSD_of_probable_global_minimum as 1.55 for the 5wlo protein. The attachment below shows the docking output table for one of the ligands for 5wlo:

5wlo dlg_1

Can anyone help me understand how to evaluate the RMSD value from the output above?

Thanks

srilekha1993 commented 4 months ago

@rwxayheee, can you please help me with the above issue?

rwxayheee commented 4 months ago

Hi @srilekha1993 I don't fully understand the issue, can you explain what you want me to help with?

Were you trying to run the benchmark calculations in AD-GPU_set_of_42? Are you referring to ligand_properties.csv?

Can you share more details, like the files and your commands? If your docked pose is 30 Å off from the reference, maybe you were docking on a different binding site, or your reference ligand structure doesn't align with the receptor coordinates. Some visualization might help.

srilekha1993 commented 4 months ago

@rwxayheee Thanks for your response.

Here are the complete details of my run:

1) Installation of AutoDock-GPU: I followed the README of the GitHub repo https://github.com/ccsb-scripps/AutoDock-GPU and installed with the NUMWI=64 setting.

2) The datasets used for the experiment were downloaded from https://zenodo.org/records/4031961. These are the 140 protein-ligand complexes mentioned in the AutoDock-GPU paper (https://pubs.acs.org/doi/10.1021/acs.jctc.0c01006). We ran the experiments on these complexes without making any changes to the dataset.

3) I ran AutoDock-GPU on the 5wlo dataset using the following command (the protein.maps.fld and rand-0.pdbqt files are present in the 5wlo directory):

/home/ubuntu/AutoDock-GPU/bin/autodock_gpu_64wi --ffile ./protein.maps.fld --lfile ./rand-0.pdbqt --nrun 20

4) I get the following output on the screen where the command is run:

Screenshot 2024-03-01 124423

From the above screenshot, we can see that the energy value is -20.12 kcal/mol. This is close to the energy score for 5wlo in the ligand_properties.csv file that comes with the dataset, which is -20.45.

5) We get the following RMSD table in the output file (rand-0.dlg) from our run:

required_rmsd

The RMSD value for 5wlo in ligand_properties.csv is 1.55 Å. As we can see from the above table, the reference RMSD is ~30 Å. We are not sure why the reference RMSD is so high for our docked pose when we did not change any setting in the dataset. It would be great to have some clarification on this.

I am attaching the complete rand-0.dlg file for your reference. rand-0.txt

Thanks

rwxayheee commented 4 months ago

Hi @srilekha1993, thanks for the detailed description.

> I ran autodock-gpu for 5wlo dataset using the following command(protein.maps.fld and rand-0.pdbqt files are present in the 5wlo directory) /home/ubuntu/AutoDock-GPU/bin/autodock_gpu_64wi --ffile ./protein.maps.fld --lfile ./rand-0.pdbqt --nrun 20

The ligand input file rand-0.pdbqt doesn't align with the receptor coordinates. Please see a picture:

Screenshot 2024-03-01 at 7 38 54 AM

Therefore, if the input coordinates of the ligand were used to compute the Reference RMSD (since no alternate reference was provided?), you will get a large RMSD even if the ligand was docked as expected.
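To illustrate the point, here is a minimal sketch of an atom-by-atom RMSD without any superposition, which is assumed here to be how a reference RMSD of this kind behaves. The coordinates and the 30 Å shift are made up for illustration and are not taken from the 5wlo files.

```python
import math


def rmsd(coords_a, coords_b):
    """Atom-by-atom RMSD between two equally ordered coordinate lists.

    No superposition is performed, so a rigid translation of one
    structure shows up directly in the result.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of atoms")
    squared = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        squared += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical three-atom "pose" (coordinates in Å).
pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]

# The same conformation placed 30 Å away, standing in for a reference
# that does not share the receptor's coordinate frame.
displaced_reference = [(x + 30.0, y, z) for (x, y, z) in pose]

print(rmsd(pose, pose))                 # identical placement
print(rmsd(pose, displaced_reference))  # same shape, shifted reference
```

The shape of the ligand is identical in both cases, yet the second value equals the size of the shift. A reference placed away from the binding site therefore dominates the reported number regardless of how good the docked pose is.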

Have you tried to visualize your output pose and compare to an aligned crystal structure?

When you run docking, you can also specify a reference ligand input. According to the README, this is done with the --xraylfile option.

manasi-t24 commented 4 months ago

Hi @rwxayheee , thank you very much for your response. I am working with @srilekha1993 and I would like some basic clarifications since we are new to using Autodock:

  1. As we used the rand-0.pdbqt file in the downloaded dataset without making any changes, we weren't aware that the input ligand file has to be aligned with the receptor coordinates if the input coordinate of the ligand is being used as a reference. Is there a general way in which we can make that change to the rand-0.pdbqt file (not just for this dataset but for all the 140 complexes )?
  2. Yes, you are right in that the input coordinate of the ligand was used to compute the Reference RMSD since no alternate reference was provided.
  3. We have not tried to visualize the output and have not compared it to an aligned crystal structure. Would you be able to suggest a visualization tool that is generally used so that we can take a look at it?
  4. Thanks for pointing out the way to provide the reference ligand input using the --xraylfile option. Can you also tell us where we might find the x-ray reference ligand for the 5wlo dataset (and perhaps all the other receptors in the 140 complexes dataset)?

Thank you very much for your help on this.

Regards, Manasi

rwxayheee commented 4 months ago

Hi @manasi-t24,

The devs in this repository or authors of this work might be able to give you better answers. I will try my best from a user perspective:

> As we used the rand-0.pdbqt file in the downloaded dataset without making any changes, we weren't aware that the input ligand file has to be aligned with the receptor coordinates if the input coordinate of the ligand is being used as a reference. Is there a general way in which we can make that change to the rand-0.pdbqt file (not just for this dataset but for all the 140 complexes )?

The input doesn't need to be near the designated binding site (it can be in an arbitrary position). Also, rand-0.pdbqt seems to be a random conformer of the ligand. It's OK to use it as an input file, but to get a meaningful reference RMSD we need to provide a correct reference that aligns with the receptor coordinates.

> We have not tried to visualize the output and have not compared it to an aligned crystal structure. Would you be able to suggest a visualization tool that is generally used so that we can take a look at it?

I used PyMOL for the screenshot shown above. It's very easy to learn, programmable through its Python integration, supports many common chemical structure file formats, and can make nice-looking pictures :)

> Thanks for pointing out the way to provide the reference ligand input using the --xraylfile option. Can you also tell us where we might find the x-ray reference ligand for the 5wlo dataset (and perhaps all the other receptors in the 140 complexes dataset)?

I checked a few structures, and I think the ligand files named flex-xray.pdbqt are generated from the ligands in their original crystal-structure positions. Sometimes alternate locations are assigned in the crystal structure, and flex-xray.pdbqt corresponds to one of them. You can double-check with the authors.

If you wish to generate reference files on your own, a possible procedure could be:

  1. Download the PDB file from a PDB server and extract the ligand coordinates.

  2. Obtain the SMILES string of the ligand from a chemical component library.

  3. Use rdkit.Chem.AllChem.AssignBondOrdersFromTemplate to assign bond orders. As long as all heavy atoms are present in the crystal structure, this function allows you to repair the ligand and turn it into an RDKit molecule with valid bond information, which can then be used in Meeko for PDBQT file generation.
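Steps 1 and 3 above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names extract_ligand_pdb_block and assign_bond_orders are hypothetical, the residue name, chain and SMILES have to be looked up for each complex, and the Meeko step is left out. RDKit is imported inside the second function so that the extraction helper works without it.

```python
def extract_ligand_pdb_block(pdb_text, resname, chain=None, altloc="A"):
    """Collect the HETATM records of one ligand from PDB-format text.

    Uses the fixed PDB columns: alternate location (column 17),
    residue name (columns 18-20) and chain ID (column 22).
    Atoms with a blank altloc or the requested altloc are kept.
    """
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        if line[17:20].strip() != resname:
            continue
        if chain is not None and line[21] != chain:
            continue
        if line[16] not in (" ", altloc):
            continue
        kept.append(line)
    return "".join(line + "\n" for line in kept)


def assign_bond_orders(ligand_pdb_block, smiles):
    """Turn crystal-structure ligand coordinates into an RDKit molecule
    whose bond orders come from a SMILES template (hypothetical helper)."""
    from rdkit import Chem
    from rdkit.Chem import AllChem

    template = Chem.MolFromSmiles(smiles)
    ligand = Chem.MolFromPDBBlock(ligand_pdb_block)
    if template is None or ligand is None:
        raise ValueError("could not parse the SMILES or the PDB block")
    # Requires every heavy atom of the template to be present in the ligand.
    return AllChem.AssignBondOrdersFromTemplate(template, ligand)
```

The resulting molecule keeps the crystal-structure coordinates, so a PDBQT file generated from it should sit in the receptor's coordinate frame and be usable as a reference.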

diogomart commented 4 months ago

I confirm that files named rand- are random conformers in random locations meant to be used as input for docking, and the flex-xray files have the x-ray positions from the PDB. Thank you @rwxayheee for your response.
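For anyone landing here later, a rerun of the command from this thread with the x-ray ligand as the RMSD reference might look like the sketch below. The function name build_docking_command is hypothetical; the flags are the ones already mentioned above (--ffile, --lfile, --xraylfile, --nrun), and the file names assume each complex directory is laid out like 5wlo.

```python
def build_docking_command(binary, nrun=20):
    """Build the AutoDock-GPU argument list for one complex directory.

    File names are relative on purpose: the command is meant to be run
    from inside the complex directory, as in the original post.
    """
    return [
        binary,
        "--ffile", "protein.maps.fld",     # grid maps for the receptor
        "--lfile", "rand-0.pdbqt",         # random conformer used as input
        "--xraylfile", "flex-xray.pdbqt",  # x-ray pose used as RMSD reference
        "--nrun", str(nrun),
    ]


command = build_docking_command(
    "/home/ubuntu/AutoDock-GPU/bin/autodock_gpu_64wi"
)
print(" ".join(command))
```

The list can be passed to subprocess.run(command, cwd="5wlo", check=True), and looping over the complex directories with a different cwd each time would cover the whole dataset, provided every directory contains a flex-xray.pdbqt file.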

manasi-t24 commented 3 months ago

Thank you @rwxayheee and @diogomart . We were able to get the desired results.

Regards, Manasi