Multiple replacements in grow_mol

AlanTanKX commented 1 year ago

Hi, thanks for creating this useful repo which allows us to generate new molecules from a framework!

I have been playing around with the grow_mol function, and I want to specify 3 positions in my starting molecule to grow the molecule. I can do so with the parameter replace_ids and specify the 3 positions in a list. However, I realised that all of the generated molecules only had 1 replacement per molecule. What I am looking for is a way to possibly generate new molecules with 2 or even all 3 positions replaced at once.

I have read the documentation for crem, and as far as I'm aware, there is no way to do this currently. Do you have any advice regarding this issue?

Thank you! Alan

DrrDom commented 1 year ago

Hi Alan,

Sorry about that. The documentation has not been updated automatically for a long time, so some functions can be found only in the source code. However, all API functions have docstrings (which should be rendered into the documentation, but currently are not).

There is a function enumerate_compounds in utils (introduced in version 0.2.6) that does almost exactly what you need. Below is an example. It takes 3-chlorotoluene and grows it (mode='scaffold') at positions 2 and 6 of the ring and at the methyl group (atom ids start at 0 in RDKit). The number of iterations should be set to the number of growing points. protect_added_frag=True is important to avoid growing on newly added fragments.

The difference from what you requested is that the output will contain compounds with one and two attached fragments as well as compounds with three. Another possible issue: some atoms can be used more than once. For example, the methyl can be expanded with F and, on the next iteration, with another F to give difluoromethyl, so one of the three chosen points will not be used at all. One more side effect: replacements are made sequentially. This means that the context may change as fragments are added, so the order in which atoms are expanded may affect the final output. This cannot be controlled in the current implementation.
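If only the doubly or triply grown compounds are wanted, one option is to post-filter the output. The sketch below is not part of crem: count_substituted_sites is a hypothetical helper that maps the parent onto each product with RDKit and counts how many chosen atoms gained a heavy-atom neighbour. It takes the maximum over all substructure mappings, so it is only an upper-bound heuristic when the parent fits the product in more than one way. The parent is built from SMILES here, so the atom ids (0, 2, 7) differ from those of the molblock in the example.

```python
from rdkit import Chem

def count_substituted_sites(parent, product, site_ids):
    """Heuristic: how many of the chosen parent atoms carry a new
    heavy-atom neighbour in the product (maximum over all mappings)."""
    best = 0
    # match[i] is the product atom index that parent atom i maps onto
    for match in product.GetSubstructMatches(parent, uniquify=False):
        grown = sum(
            product.GetAtomWithIdx(match[i]).GetDegree()
            > parent.GetAtomWithIdx(i).GetDegree()
            for i in site_ids
        )
        best = max(best, grown)
    return best

# 3-chlorotoluene written as SMILES: atom 0 is the methyl carbon,
# atoms 2 and 7 are the ring carbons ortho to it (assumed numbering).
parent = Chem.MolFromSmiles('Cc1cccc(Cl)c1')
sites = [0, 2, 7]

# A few SMILES taken from the output shown in this thread
smiles = ['COc1c(C)cccc1Cl', 'COc1ccc(Cl)c(OC)c1C', 'COc1ccc(Cl)c(OC)c1CC#N']
counts = {s: count_substituted_sites(parent, Chem.MolFromSmiles(s), sites)
          for s in smiles}

# Keep only compounds in which at least two chosen sites were grown
multi_grown = [s for s, n in counts.items() if n >= 2]
print(multi_grown)
```

Because growing only adds atoms, the parent should always be found as a substructure of each product; a count of 0 would indicate that the assumed site ids do not match the parent that was actually grown.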

You may use this function as a template and write your own to get exactly what you want, or modify the existing one. If you find it useful, you may submit it to the repo. In that case I would prefer to extend the existing function rather than introduce new interface functions with very similar behavior.

And last but not least: beware of combinatorial explosion! :)
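To get a feel for how quickly the numbers grow, here is a rough back-of-the-envelope estimate. It assumes an idealized model that is not how crem actually works: each chosen site independently either stays unchanged or takes one of a fixed number of replacements. It ignores context dependence, atoms being grown more than once, and duplicate products. The function name estimate_products is made up for this illustration.

```python
from math import comb

def estimate_products(n_sites, n_repl):
    """Idealized count of products with exactly j substituted sites,
    assuming every site independently offers n_repl replacements."""
    return {j: comb(n_sites, j) * n_repl ** j for j in range(1, n_sites + 1)}

# Three growing points with two replacements each (as in max_replacements=2)
small = estimate_products(3, 2)
print(small, sum(small.values()))

# Three growing points with 100 replacements each
large = estimate_products(3, 100)
print(large, sum(large.values()))
```

Under this model the total is (n_repl + 1) ** n_sites - 1, so even three sites with a hundred replacements each already give over a million candidate structures.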

Thank you for the question. It seems like a good example to add to the README.

from crem.utils import enumerate_compounds
from rdkit import Chem

mol = Chem.MolFromMolBlock("""
  Mrv1922 05242309182D          

  8  8  0  0  0  0            999 V2000
   -3.2813    1.3161    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9957    0.9036    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.9957    0.0786    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2813   -0.3339    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5668    0.0786    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.5668    0.9036    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2813   -1.1589    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8523    1.3161    0.0000 Cl  0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  2  0  0  0  0
  3  4  1  0  0  0  0
  4  5  2  0  0  0  0
  5  6  1  0  0  0  0
  1  6  2  0  0  0  0
  4  7  1  0  0  0  0
  6  8  1  0  0  0  0
M  END
""")

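# Grow at atoms 2, 4 and 6 (0-based): two ring positions and the methyl carbon.
# n_iterations matches the number of growing points; protect_added_frag=True
# prevents growth on fragments added in earlier iterations.
mols = enumerate_compounds(mol, 'replacements_sa2.db', mode='scaffold', n_iterations=3,
                           radius=3, max_replacements=2, replace_ids=[2, 4, 6],
                           protect_added_frag=True, return_smi=True)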

print(mols)

Output:

['COc1c(C)cccc1Cl', 
'Cc1cc(Cl)ccc1Cl', 
'COc1ccc(Cl)c(OC)c1C', 
'COc1c(Cl)cccc1CF', 
'Cc1c(Cl)ccc(Cl)c1C', 
'CSCc1cc(Cl)ccc1Cl', 
'COc1ccc(Cl)c(OC)c1CC#N', 
'COCc1c(OC)ccc(Cl)c1OC', 
'COc1c(Cl)cccc1C(F)F', 
'COc1c(Cl)ccc(CO)c1CF', 
'Cc1c(Cl)ccc(Cl)c1CC#N', 
'Cc1c(Cl)ccc(Cl)c1CCN', 
'CSCc1c(Cl)ccc(Cl)c1C', 
'CSCc1c(Cl)ccc(Cl)c1Cl']

AlanTanKX commented 1 year ago

Thank you so much for the quick response, and for suggesting this useful function!

I do not mind the side effects that you mentioned - I just want a way to be able to randomly generate a combination of singly, doubly and triply grown molecules. In this case, I would just set n_iterations = 3.

DrrDom / crem

Multiple replacements in grow_mol #22