Hi, Vina expects multiple PDBQT files where each one has only one model.
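If the ligands happen to be in a single multi-model PDBQT, a minimal sketch like the one below could be used to split it into single-model files first (the split_pdbqt helper and the output naming are purely illustrative, not part of Vina):

```python
# Illustrative helper: split a multi-MODEL PDBQT into one file per model,
# since Vina's batch input expects one ligand per PDBQT file.
def split_pdbqt(multi_model_path, out_prefix="ligand"):
    out_paths = []
    model_lines, model_idx = [], 0
    with open(multi_model_path) as f:
        for line in f:
            if line.startswith("MODEL"):
                model_lines, model_idx = [], model_idx + 1
            elif line.startswith("ENDMDL"):
                out_path = f"{out_prefix}_{model_idx}.pdbqt"
                with open(out_path, "w") as out:
                    out.writelines(model_lines)
                out_paths.append(out_path)
            else:
                model_lines.append(line)
    return out_paths
```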
Thank you for answering.
Using Python bindings, what is the difference between normal docking and batch mode?
The only difference seems to be the ability to dock without initialising a new Vina instance each time.
Could you please tell me if there are any advantages that are only available in batch mode? (I couldn't tell)
That is the only advantage, which makes it faster.
When using set_ligand_from_file, the program would sometimes crash due to memory overflow because there were too many files in the virtual screening.
When using the Python bindings, if there are too many files, I will try performing normal docking instead of using batch mode.
That sounds like a bug. What was the approximate number of ligands that made it crash? More like thousands, or more like millions?
Hi @eunos-1128 Could you provide a bit more detail about your hardware, OS, and where you run it from (Jupyter notebook has a memory limit, while a python script shouldn't)?
@diogomart
That sounds like a bug. What was the approximate number of ligands that made it crash? More like thousands, or more like millions?
About a million ligands per docking run.
@rwxayheee
I run the program from a Jupyter notebook on a Linux workstation; I didn't know that could be a limitation.
Hi @eunos-1128 Jupyter notebook has a default memory limit (you could use a search engine to read more on it). It might be helpful to try running the same code outside of Jupyter notebook: copy everything into a Python script, then use your environment's python to run it. Otherwise, increasing Jupyter notebook's memory limit might help too, but I'm not entirely sure how much memory docking a million ligands needs.
I have never done very thorough memory monitoring at each step, so I'm not sure how memory usage is affected by the way we run it. For a very long time, I was not aware of the batch mode and just ran one Vina command for each ligand... Ideally, memory usage shouldn't scale with the number of ligands in batch mode (?). I can take a closer look when I get a chance.
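For a rough per-step check, a minimal sketch along these lines might help (it only assumes the third-party psutil package; where the print calls go is arbitrary):

```python
import os
import psutil  # third-party package for querying process memory

def rss_mb():
    """Resident memory of the current process, in MB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2

print(f"before maps: {rss_mb():.1f} MB")
# ... vina.compute_vina_maps(...) ...
print(f"after maps:  {rss_mb():.1f} MB")
# ... vina.set_ligand_from_file(...); vina.dock(...) ...
print(f"after dock:  {rss_mb():.1f} MB")
```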
@rwxayheee
Thanks for the advice.
I checked, and it seems that the Jupyter notebook memory limit can be relaxed, which I'm trying now.
If that doesn't work, I will move the code to a python file and run it.
@diogomart @rwxayheee
```python
v.set_ligand_from_file('1iep_ligand.pdbqt')
v.compute_vina_maps(center=[15.190, 53.903, 16.917], box_size=[20, 20, 20])
```
The next lines are used to first load a PDBQT file containing the ligand called 1iep_ligand.pdbqt and then compute the affinity maps for each ligand atom type according to the Vina forcefield. You might need to read first the tutorial Basic docking to learn how to create a PDBQT file of a ligand. There is a small subtlety here: the behavior of the compute_vina_maps() function changes if the ligand was loaded before or after computing the vina maps. If no ligand was initialized, compute_vina_maps() will compute the affinity map for each atom type defined in the Vina forcefield (22 in total). This is very useful when we want to dock ligands in batch (a.k.a. virtual screening) but we don't necessarily know beforehand what atom types will be necessary for those ligands. Alternatively to set_ligand_from_file(), you could also load a molecule using a molecule string in PDBQT format using the set_ligand_from_string() function.
I would like to confirm something about this part of the documentation.
Does this mean that I should run set_ligand_from_file after running compute_vina_maps if I want to dock multiple ligands at once, for example for virtual screening?
Does the order in which these functions are executed change the docking results (i.e. docking score and ligand RMSD) significantly?
Hi @eunos-1128
Running compute_vina_maps before setting the first ligand can save time. The documentation recommends this so that the maps are computed only once and are good for all ligands; there are no redundant steps to compute the corresponding maps for ligands of different compositions.
Doing set_ligand_from_file before or after compute_vina_maps shouldn't change the results of the docking. However, if it does, please do let us know.
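As a minimal sketch of that recommended order, using only the calls already discussed in this thread (receptor path, box center/size, and ligand file names are placeholders):

```python
from vina import Vina

v = Vina(sf_name='vina')
v.set_receptor('receptor.pdbqt')  # placeholder path
# No ligand has been set yet, so the maps cover all Vina atom types
# and can be reused for every ligand in the screen.
v.compute_vina_maps(center=[15.190, 53.903, 16.917], box_size=[20, 20, 20])

for path in ['lig1.pdbqt', 'lig2.pdbqt']:  # placeholder ligand files
    v.set_ligand_from_file(path)
    v.dock(exhaustiveness=8, n_poses=5)
    v.write_poses(path.replace('.pdbqt', '_out.pdbqt'), overwrite=True)
```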
@rwxayheee
Thank you!
I'll try both ways and compare the results if I have enough time.
I used the write_poses method to write out the docking results after virtual screening using the Python bindings, but there was no output.
When I set one ligand PDBQT at a time, not in batch mode, the output files were written out without any problem, but when I set multiple PDBQT files in batch mode, nothing was output.
I can't think of a cause for this, but do you have any ideas?
Hi @eunos-1128 Could you show an example of your code for batch docking? It doesn't need the specific ligand or receptor files, but I'd like to know the order of commands (initialize vina, set receptor, compute maps, set ligand, write poses) and how they are arranged in the loop. It would be tremendously helpful if you could provide that information. I will try to reproduce it with my files.
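In the meantime, a minimal sanity check before writing could be to print what the Vina object holds after docking; this assumes the energies() and poses() accessors of the Python bindings, which return a per-pose energy table and a PDBQT-formatted string, respectively (the output file name is a placeholder):

```python
# After vina.dock(...), inspect the results held in memory before writing them out
print(vina.energies(n_poses=5))      # per-pose energy table
print(vina.poses(n_poses=1)[:500])   # start of the top pose as a PDBQT string
vina.write_poses("check_out.pdbqt", n_poses=5, overwrite=True)
```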
One more thing, regarding your first post:
According to docs, set_ligand_from_file looks to take multiple ligand PDBQT files as arguments.
This might not be exactly the batch mode you're looking for in virtual screening. Doing something like the following:
```python
v.set_ligand_from_file(['Ligand_1.pdbqt', 'Ligand_2.pdbqt'])
```
This will generate poses each containing two ligands (which could be another cause of the unexpected memory surge). The multiple ligand PDBQT files are taken as arguments for docking multiple ligands simultaneously. If you want to perform docking one at a time, you need to read them in one by one and use write_poses after each dock within the loop.
@rwxayheee
Hi, I use the function below for virtual screening.
```python
def dock_ligands_to_protein(prot_name):
    ligand_files = glob.glob(f'{PARENT_DIR}/actives/{prot_name}/active_[1-2].pdbqt')
    ligand_files += glob.glob(f'{PARENT_DIR}/decoys/{prot_name}/decoy_[1-2].pdbqt')
    result_file = f'{PARENT_DIR}/results/vs/all/ligands_to_{prot_name}.pdbqt'
    if os.path.exists(result_file):
        return
    try:
        print(f"Starting docking {prot_name} and compounds(active/decoy)...")
        vina = Vina(sf_name="vina", seed=42, verbosity=1)
        vina.set_receptor(
            f"{PARENT_DIR}/receptors/{prot_name}_elem_charge_fixed-FHs.pdbqt"
        )
        pml.load(
            os.path.expanduser(f"~/eunos/dockings/datasets/DUD-E/all/{prot_name}/crystal_ligand.mol2")
        )
        centroid_coords = pml.centerofmass('crystal_ligand')
        pml.delete('all')
        vina.compute_vina_maps(center=centroid_coords, box_size=BOX_SIZE)
        vina.set_ligand_from_file(ligand_files)
        pp.pprint(vina.info())
        vina.randomize()
        vina.dock(exhaustiveness=32, n_poses=20)
        print(f"Finishing docking {prot_name} and compounds(active/decoy)!")
        vina.write_poses(
            result_file,
            energy_range=10000.0,
            overwrite=True,
            n_poses=20
        )
    except Exception as e:
        print(f"Exception occurred for {prot_name}: {e}")
    finally:
        pml.delete("all")
        gc.collect()
```
Hi @eunos-1128 Thanks for posting the commands. I think you might be performing multiple-ligand docking (trying to fit many ligands into one structure at once), not individual screening. If you want to screen individual ligands, you could do:
```python
def dock_ligands_to_protein(prot_name):
    ligand_files = glob.glob(f'{PARENT_DIR}/actives/{prot_name}/active_[1-2].pdbqt')
    ligand_files += glob.glob(f'{PARENT_DIR}/decoys/{prot_name}/decoy_[1-2].pdbqt')
    try:
        print(f"Starting docking {prot_name} and compounds(active/decoy)...")
        vina = Vina(sf_name="vina", seed=42, verbosity=1)
        vina.set_receptor(
            f"{PARENT_DIR}/receptors/{prot_name}_elem_charge_fixed-FHs.pdbqt"
        )
        pml.load(
            os.path.expanduser(f"~/eunos/dockings/datasets/DUD-E/all/{prot_name}/crystal_ligand.mol2")
        )
        centroid_coords = pml.centerofmass('crystal_ligand')
        pml.delete('all')
        vina.compute_vina_maps(center=centroid_coords, box_size=BOX_SIZE)
        for ligand_file in ligand_files:
            ligand_name = os.path.basename(ligand_file).replace('.pdbqt', '')
            result_file = f'{PARENT_DIR}/results/vs/all/{ligand_name}_to_{prot_name}.pdbqt'
            if os.path.exists(result_file):
                continue  # skip ligands that were already docked
            vina.set_ligand_from_file(ligand_file)
            vina.randomize()
            vina.dock(exhaustiveness=32, n_poses=20)
            print(f"Finishing docking {prot_name} and compound: {ligand_file}!")
            vina.write_poses(
                result_file,
                energy_range=10000.0,
                overwrite=True,
                n_poses=20
            )
    except Exception as e:
        print(f"Exception occurred for {prot_name}: {e}")
    finally:
        pml.delete("all")
        gc.collect()
```
I didn't test the code, but I hope this explains what I meant by individual virtual screening. Also, the input PDBQT files need to have exactly one ligand each. I don't think multiple-ligand PDBQT inputs are currently supported.
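For reference, a hypothetical call (the target name "aa2ar" is just illustrative, following the DUD-E style paths in the snippet):

```python
# Assumes PARENT_DIR, BOX_SIZE, pml (PyMOL), glob, os, gc and the Vina import
# are already set up as in the snippets above.
dock_ligands_to_protein("aa2ar")
```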
@rwxayheee
Thank you for reviewing the code.
I checked set_ligand_from_file and found that it does seem to handle batch ligands.
https://github.com/ccsb-scripps/AutoDock-Vina/blob/v1.2.5/build/python/vina/vina.py#L164-L186
You meant those ligands (pdbqt_filename whose type is a list or tuple of PDBQT file paths) are intended for multiple-ligand docking?
Also, the input PDBQT files need to have exactly one ligand each. I don't think multiple-ligand PDBQT inputs are currently supported
I understand this point.
Hi @eunos-1128
You meant those ligands are intended for multiple-ligand docking?
Yes, in my understanding it's for docking multiple ligands simultaneously. You can do a quick check by giving it two ligands and inspecting the output: each "pose" will contain two ligands.
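A rough way to check could be to count ATOM/HETATM records per output MODEL and compare them to the combined atom count of the two inputs (file names are placeholders; this only assumes standard PDBQT record names):

```python
def count_atoms(path):
    # Count ATOM/HETATM records in a PDBQT file
    with open(path) as f:
        return sum(line.startswith(("ATOM", "HETATM")) for line in f)

expected = count_atoms("Ligand_1.pdbqt") + count_atoms("Ligand_2.pdbqt")

# Count atoms in each MODEL of the docked output
atoms_per_model, current = [], 0
with open("out.pdbqt") as f:
    for line in f:
        if line.startswith("MODEL"):
            current = 0
        elif line.startswith("ENDMDL"):
            atoms_per_model.append(current)
        elif line.startswith(("ATOM", "HETATM")):
            current += 1

print("expected atoms per pose:", expected)
print("atoms found per pose:   ", atoms_per_model)
```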
I think initially you were docking multiple ligands simultaneously (trying to generate a co-complex of the receptor and many, many ligands), which led to the unexpected memory surge.
@rwxayheee
I think initially you were docking multiple ligands simultaneously (trying to generate a co-complex of the receptor and many, many ligands), which led to the unexpected memory surge.
I see...
I'm trying the way you proposed. Thank you.
@rwxayheee
It seems to be working for the time being.
Thanks a lot!
Hi,
I'm trying to do virtual screening with Vina's batch mode using the Python API.
According to the docs, set_ligand_from_file looks to take multiple ligand PDBQT files as arguments. I also think a PDBQT file can have multiple models in it.
Which is appropriate for Vina's batch mode: one PDBQT file with multiple models, or multiple PDBQT files where each one has only one model? (I've never tried or compared the two approaches.)