Closed Feriolet closed 3 months ago
And what was the output in those cases? Did you use protonated smiles as input?
I noticed that both of these structures have positively charged nitrogens, which formally become chiral after protonation.
Just a wild guess. In multiprocessing some internal properties are discarded and this leads to different sets of isomers. This may be fixed by adding the following line to the main call: Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps).
I did it, but only in particular functions, not at the highest level of the program. So, currently this call is not visible to all parts of the program.
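A minimal sketch of the effect discussed above, assuming only that RDKit is installed. It sets the pickle option once at the top level and then checks that properties survive a pickle round trip, which is what multiprocessing does when it passes molecules between processes. The property name stereo_id and the SMILES are illustrative assumptions and not necessarily what easydock stores.

```python
import pickle

from rdkit import Chem

# Set once at the top level, before any worker pool is created,
# so that every later pickling step keeps molecule properties.
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)


def roundtrip(mol):
    """Simulate sending a Mol to or from a worker process."""
    return pickle.loads(pickle.dumps(mol))


mol = Chem.MolFromSmiles("C[NH+](CC)CCC")
mol.SetProp("_Name", "mol_1")
mol.SetIntProp("stereo_id", 0)  # hypothetical property name

copy_mol = roundtrip(mol)
print(copy_mol.GetProp("_Name"), copy_mol.GetIntProp("stereo_id"))
```

Without the SetDefaultPickleProperties call, whether these properties survive depends on the default of the installed RDKit version, so the sketch makes no claim about that case.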
Actually it could be a good idea to dock both isomers based on such nitrogens. However, it may require changing the order of protonation and stereoisomer generation, or adding an extra step to continue the enumeration of stereoisomers after protonation for such cases. I would rate this as a minor issue, but it may be worth solving.
Yes, the input SMILES were protonated. The intended output should have two stereoisomers for the first SMILES and three for the second SMILES.
Anyway, you are right. Adding Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps) to the init_db() function does solve the problem. The result now matches that of the current version of easydock.
It can be a good idea to generate stereoisomers after protonation, but would that mean that we may have to add more rows to the existing SQL table? Just a thought.
Yes, it seems this is the easiest way to solve it. After protonation (if it was called), check whether the protonated smiles in the DB contain undefined chiral centers, generate additional stereoisomers for these compounds and store them in the DB as additional rows. However, this should be made optional so as not to break compatibility with our other tools. This could be a separate function which is called after add_protonation. In this way it will be maximally explicit and flexible.
Is there a quick way to identify compounds with undefined chiral centers? I tried to find existing functions that serve this purpose, and they seem to check the atom chirality one by one. I have not tested the function below, but it would be nice if it does not run too long for large compounds.
I assume that the pipeline would be:
Get the protonated smiles: SELECT protonated_smi from table
Check for undefined chiral centers: Chem.ChiralType.CHI_UNSPECIFIED
Enumerate the isomers: get_isomer(mol)
Add the new isomers as additional rows (if stereo_id=5 at the first enumeration, the additional rows for the second enumeration should have stereo_id = 6, 7, ...)
I initially worried that the order of the smiles would be affected, but I guess it would not matter much since the order was already there in the existing .db file, and using MIN(docking_score) in save_sdf() would pretty much make the order the same.
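The row-numbering idea can be sketched with the standard sqlite3 module. The table layout below (a mols table with id, stereo_id and protonated_smi columns) is a simplified assumption for illustration and not the actual easydock schema. add_isomer_rows is a hypothetical helper, and the SMILES strings are placeholders.

```python
import sqlite3


def add_isomer_rows(conn, mol_id, new_smiles):
    """Append new isomers of mol_id, continuing after its highest stereo_id."""
    cur = conn.execute("SELECT MAX(stereo_id) FROM mols WHERE id = ?", (mol_id,))
    max_id = cur.fetchone()[0]
    next_id = 0 if max_id is None else max_id + 1
    rows = [(mol_id, next_id + i, smi) for i, smi in enumerate(new_smiles)]
    conn.executemany(
        "INSERT INTO mols (id, stereo_id, protonated_smi) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return [row[1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mols (id TEXT, stereo_id INTEGER, protonated_smi TEXT, "
    "PRIMARY KEY (id, stereo_id))"
)
# Rows from the first enumeration: stereo_id 0..5 (placeholder strings).
conn.executemany(
    "INSERT INTO mols VALUES (?, ?, ?)",
    [("mol_1", i, f"placeholder_smiles_{i}") for i in range(6)],
)

# Second enumeration after protonation: the new rows get stereo_id 6 and 7.
new_ids = add_isomer_rows(conn, "mol_1", ["placeholder_a", "placeholder_b"])
print(new_ids)
```

Because existing rows are never renumbered, results already stored for the first enumeration stay valid, and a query that aggregates per molecule (such as MIN(docking_score) grouped by id) simply sees the extra rows.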
import copy


def _find_undefined_stereo_atoms(rdmol, assign_stereo=False):
    """Find the chiral atoms with undefined stereochemistry in the RDMol.

    Parameters
    ----------
    rdmol : rdkit.RDMol
        The RDKit molecule.
    assign_stereo : bool, optional, default=False
        As a side effect, this function calls ``Chem.AssignStereochemistry()``
        so by default we work on a molecule copy. Set this to ``True`` to avoid
        making a copy and assigning the stereochemistry to the Mol object.

    Returns
    -------
    undefined_atom_indices : list[int]
        A list of atom indices that are chiral centers with undefined
        stereochemistry.

    See Also
    --------
    rdkit.Chem.FindMolChiralCenters

    """
    from rdkit import Chem

    if not assign_stereo:
        # Avoid modifying the original molecule.
        rdmol = copy.deepcopy(rdmol)

    # Flag possible chiral centers with the "_ChiralityPossible" property.
    Chem.AssignStereochemistry(rdmol, force=True, flagPossibleStereoCenters=True)

    # Find all atoms with undefined stereo.
    undefined_atom_indices = []
    for atom_idx, atom in enumerate(rdmol.GetAtoms()):
        if atom.GetChiralTag() == Chem.ChiralType.CHI_UNSPECIFIED and atom.HasProp(
            "_ChiralityPossible"
        ):
            undefined_atom_indices.append(atom_idx)
    return undefined_atom_indices
A little bit more details:
SELECT id, smi, protonated_smi FROM mols WHERE source_mol_block IS NULL (if source_mol_block is not null, then this is a 3D molecule and all stereoconfigurations are defined)
Check whether '?' is in the output:
Chem.FindMolChiralCenters(Chem.MolFromSmiles('C1C[C@H](C)C(C)[C@H](C)C1'),includeUnassigned=True)
[(2, 'S'), (4, '?'), (6, 'R')]
True
Rerun mol with get_isomer(mol), with a large number of max_isomers, to enumerate all isomers (I do not expect that there will be many such centers in a molecule, so it will be safe).
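The check-and-enumerate steps above could look roughly like the sketch below. It assumes RDKit is installed and uses RDKit's own EnumerateStereoisomers as a stand-in for easydock's get_isomer, whose exact signature is not shown in this thread. has_undefined_centers and enumerate_missing_isomers are hypothetical helper names.

```python
from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import (
    EnumerateStereoisomers,
    StereoEnumerationOptions,
)


def has_undefined_centers(mol):
    """Return True if any potential chiral center is unassigned ('?')."""
    centers = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
    return any(label == "?" for _, label in centers)


def enumerate_missing_isomers(smi, max_isomers=1024):
    """Enumerate isomers only for SMILES that still have undefined centers."""
    mol = Chem.MolFromSmiles(smi)
    if mol is None or not has_undefined_centers(mol):
        return []
    # onlyUnassigned=True keeps the centers that are already defined.
    opts = StereoEnumerationOptions(
        onlyUnassigned=True, maxIsomers=max_isomers, unique=True
    )
    isomers = EnumerateStereoisomers(mol, options=opts)
    return sorted(Chem.MolToSmiles(m) for m in isomers)


# The molecule from the example above, with one unassigned center.
for isomer_smi in enumerate_missing_isomers("C1C[C@H](C)C(C)[C@H](C)C1"):
    print(isomer_smi)
```

Molecules with fully defined stereochemistry return an empty list, so the step adds no rows for them and can safely be run over the whole table.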
I am trying to use multiprocessing to increase the computational speed of running the init_db() function, but I found that the results do not match when I use multiprocessing. Using multiprocessing seems to produce fewer isomers overall than not using it. Below is the function that I use for multiprocessing. While using multiprocessing reduces the run time significantly (30 CPUs: 90 s for 250k molecules, compared to 5 h without it), I was wondering if there is any explanation for why it produces fewer isomers. I also noticed that some isomers produced without multiprocessing tend to have the same vina score.
How to reproduce (2 'problematic' SMILES):
run_dock -i smile/train_smiles_final_updated.smi -o docked/multi_train_smiles_final_updated.db --program vina --config config.yml -c 30 --sdf -s 5 --protonation dimorphite