bioexcel / biobb_chemistry

Biobb_chemistry is the Biobb module collection to perform chemistry over molecular dynamics simulations.
https://mmb.irbbarcelona.org/biobb/
Apache License 2.0

Add timeout and parameter to control conformer generation effort #10

Closed. PabloNA97 closed this issue 1 month ago.

PabloNA97 commented 1 month ago

I have a pipeline that uses biobb to run a virtual screening. The pipeline accepts SDF or SMILES files for the ligand library. For SMILES input, the workflow generates a protonated conformer from each SMILES. This is done with obabel through babel_convert, which executes the following command:

obabel ligand.smi -O ligand.pdbqt --gen3d -p 7.4 -r -xh
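For reference, the same command can be written as the argument list a subprocess-based wrapper would pass. This is a hypothetical helper (build_obabel_cmd), not babel_convert's actual implementation, and the flag comments follow our reading of the Open Babel documentation.

```python
from typing import List


def build_obabel_cmd(input_path: str, output_path: str, ph: float = 7.4) -> List[str]:
    """Hypothetical helper reproducing the obabel call quoted above."""
    return [
        "obabel", input_path,
        "-O", output_path,  # output file; format taken from the extension (.pdbqt)
        "--gen3d",          # generate 3D coordinates from the SMILES
        "-p", str(ph),      # add hydrogens appropriate for this pH
        "-r",               # keep only the largest contiguous fragment
        "-xh",              # PDBQT writer option: preserve hydrogens
    ]


print(" ".join(build_obabel_cmd("ligand.smi", "ligand.pdbqt")))
# obabel ligand.smi -O ligand.pdbqt --gen3d -p 7.4 -r -xh
```

Passing a list instead of a single shell string avoids quoting problems with file names.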

This works fine most of the time. However, in some special cases conformer generation takes far too long. For example, with the following SMILES in ligand.smi: C[C@@H]1C[C@H]2C3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(F)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)O, the conversion takes more than 1 h after printing the following warning (I'm not sure whether it eventually converges):

*** Open Babel Warning  in CorrectStereoAtoms
  Could not correct 3 stereocenter(s) in this molecule ()
  with Atom Ids as follows: 3 13 15
Warning: Stereochemistry is wrong, using the distance geometry method instead

As a result, the workflow gets stuck on a single SMILES instead of moving on and docking the rest of the ligands.

I see two features that would be useful here. The first is an additional property that sets a timeout on the call to obabel from babel_convert. The second is an additional property that lets users choose how much computational effort to dedicate to conformer generation (see below). With both, the workflow developer could re-launch conformer generation with a cheaper method whenever the timeout expires.
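The fallback idea above (re-launch with a cheaper method when the timeout expires) could look like the minimal sketch below. gen3d_with_fallback and build_cmd are hypothetical names, not part of biobb. The sketch assumes effort levels are passed as Open Babel --gen3d speed keywords, ordered from most to least expensive, and relies on subprocess.run killing the child when the timeout elapses.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence


def gen3d_with_fallback(
    build_cmd: Callable[[str], List[str]],
    efforts: Sequence[str],
    timeout_s: float,
) -> Optional[str]:
    """Try each effort level in order (most expensive first).

    Returns the first level whose command exits with code 0 within
    timeout_s, or None if every level fails or times out.
    """
    for effort in efforts:
        try:
            result = subprocess.run(build_cmd(effort), timeout=timeout_s)
        except subprocess.TimeoutExpired:
            continue  # subprocess.run already killed the child; try a cheaper level
        if result.returncode == 0:
            return effort
    return None


if __name__ == "__main__":
    # Stand-in for obabel: "med" hangs for 5 s, "fast" returns immediately.
    def fake_cmd(effort: str) -> List[str]:
        code = "import time; time.sleep(5)" if effort == "med" else "pass"
        return [sys.executable, "-c", code]

    print(gen3d_with_fallback(fake_cmd, ["med", "fast"], timeout_s=2))
```

With obabel installed, build_cmd would return something like the quoted command with the chosen keyword placed after --gen3d (an assumption about the CLI syntax).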

Controlling the computational effort of conformer generation only requires an additional parameter, described here (see the Gen3D section).
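A sketch of how such an effort property might be validated and turned into obabel arguments. It assumes the five speed keywords listed in the Gen3D documentation (fastest, fast, med, slow, slowest) are given as the token after --gen3d; the property name effort and the helper gen3d_args are hypothetical, so check both against your obabel version and the released biobb_chemistry.

```python
from typing import List

# Assumed Open Babel --gen3d speed keywords, cheapest first.
GEN3D_EFFORTS = ("fastest", "fast", "med", "slow", "slowest")


def gen3d_args(effort: str = "med") -> List[str]:
    """Return the obabel arguments for 3D generation at a given effort level.

    'effort' is a hypothetical property name; 'med' is assumed as the default.
    """
    if effort not in GEN3D_EFFORTS:
        raise ValueError(f"effort must be one of {GEN3D_EFFORTS}, got {effort!r}")
    return ["--gen3d", effort]


print(gen3d_args("fast"))  # ['--gen3d', 'fast']
```

Validating against a fixed tuple means a typo in a workflow config fails immediately instead of being passed through to obabel.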

Versions:

gbayarri commented 1 month ago

Hi @PabloNA97

I can add a new effort parameter with the 5 possible values described in the link you provided, but I don't understand the timeout parameter. What do you expect to happen when the timeout expires?

PabloNA97 commented 1 month ago

I would expect the processes spawned by subprocess to be killed, so that the workflow can continue instead of getting stuck on one ligand. Maybe the subprocess library doesn't support a timeout directly, though; see here.
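For reference, Python's standard subprocess module does accept a timeout: subprocess.run(cmd, timeout=...) kills the child and raises subprocess.TimeoutExpired. The sketch below is a slightly more defensive variant, not biobb_common's code: it starts the command in its own session and kills the whole process group on timeout, which also catches grandchildren (for example, when the command is launched through a shell). run_with_timeout is a hypothetical name, and os.killpg makes this POSIX-only.

```python
import os
import signal
import subprocess
import sys
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it was killed on timeout.

    The child starts in a new session, so it leads its own process group;
    on timeout the whole group (including any grandchildren) gets SIGKILL.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # pgid == pid for a session leader
        except ProcessLookupError:
            pass  # the group already exited
        proc.wait()  # reap the child so it does not linger as a zombie
        return None


if __name__ == "__main__":
    # Stand-in for a conformer generation that never finishes.
    stuck = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(stuck, timeout_s=1))  # None
```

Returning None rather than raising lets a caller treat "timed out" as just another per-ligand outcome.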

Whenever a workflow is repeated over many cases (here, many ligands) and one of the steps can get stuck (i.e., take too long to finish), it would be good to have this option. I have found that generating conformers with openbabel during a virtual screening can cause exactly this problem. Thanks @gbayarri
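A minimal sketch of the behaviour requested here: each ligand is converted in its own call, and one that exceeds the timeout is recorded and skipped instead of blocking the batch. convert_library and build_cmd are hypothetical; in a real pipeline build_cmd would return the obabel command for that ligand.

```python
import subprocess
import sys
from typing import Callable, Dict, Iterable, List


def convert_library(
    ligand_ids: Iterable[str],
    build_cmd: Callable[[str], List[str]],
    timeout_s: float,
) -> Dict[str, str]:
    """Run one conversion per ligand and record 'ok', 'failed' or 'timeout'.

    A ligand that exceeds timeout_s is killed by subprocess.run and the loop
    moves on, so one problematic SMILES cannot block the rest of the library.
    """
    status: Dict[str, str] = {}
    for lig in ligand_ids:
        try:
            proc = subprocess.run(build_cmd(lig), timeout=timeout_s)
        except subprocess.TimeoutExpired:
            status[lig] = "timeout"
            continue
        status[lig] = "ok" if proc.returncode == 0 else "failed"
    return status


if __name__ == "__main__":
    # Stand-ins for obabel: lig2 hangs, lig3 exits with an error.
    scripts = {
        "lig1": "pass",
        "lig2": "import time; time.sleep(60)",
        "lig3": "raise SystemExit(1)",
    }
    print(convert_library(scripts, lambda lig: [sys.executable, "-c", scripts[lig]], timeout_s=2))
```

The returned status map also tells the workflow which ligands to retry, for example with a cheaper effort level.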

gbayarri commented 1 month ago

Hi @PabloNA97

The new effort parameter is included in biobb_chemistry v4.2.1. As for the timeout, it requires changes in biobb_common, so we will try to include it in the next release, 2024.2.