OpenFreeEnergy / gufe

grand unified free energy by OpenFE
https://gufe.readthedocs.io
MIT License
28 stars 8 forks source link

rdkit MolFromMol2File function assigns "wrong" tautomer, tough this seems to get corrected in the SmallMoleculeComponent #302

Open hannahbaumann opened 1 year ago

hannahbaumann commented 1 year ago

I'm looking into the BACE system and noticed that when I load the ligands into rdkit it assigns the less favorable tautomer (protonated N in 6 membered ring primary vs. secondary amine).

ligand_files = ['lig_02.mol2', 'lig_03.mol2']
rdmols = []
for l in ligand_files:
    rdmol = Chem.rdmolfiles.MolFromMol2File(l, removeHs=False)
    with Chem.SDWriter('%s.sdf'%l[:-5]) as w:
        w.write(rdmol)

the input mol2 file and output sdf file here are different tautomers.

When doing a dry run of the solvent leg of an edge between these two ligands, the output hybrid_system.pdb is in the 'correct'/more favorable tautomer, so somewhere along the way this seems to be corrected.

There is an additional cleanupSubstructure flag in the rdkit mol2 function which when set to False seems to lead to loading in the correct tautomer as well, though I'm not sure how dangerous it is to put that to False... Chem.rdmolfiles.MolFromMol2File(l, removeHs=False, cleanupSubstructures=False)

Ligand files are here: https://drive.google.com/drive/folders/12d1-RJ6iWgngZdhWP4KmIqXIw2y9JZVZ?usp=sharing

richardjgowers commented 5 months ago

This looks like resonance issues rather than tautomerism - hydrogens aren't being moved around. Looking at the atom types defined in the file, there's a guanidine-like C.cat there, so we're entering this piece of logic here:

https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/FileParsers/Mol2FileParser.cpp#L352

tl;dr This just seems like mol2 files being awful, we should probably deprecate/warn about their usage as you get headaches like this. I've not had time to follow through to see how this affects how a SMIRNOFF type force field behaves here.