duartegroup / autodE

automated reaction profile generation
https://duartegroup.github.io/autodE/
MIT License
161 stars 49 forks source link

Performing NEB given exact atom-mapping #295

Closed IannLiu closed 10 months ago

IannLiu commented 10 months ago

Describe the feature

Exact atom-mapping can be provided by users or other atom-mapping tools. That is, the TS searching can start at known bond arrangement. Is it possible to add bond arrangement parameters to method calculate_reaction_profile or locate_transition_state ?

t-young31 commented 10 months ago

Currently it's not possible I'm afraid. If you want to (try!) and find a TS from a defined bond rearrangement I'd suggest using get_ts with the appropriate arguments. Happy to provide an example if that's not clear

IannLiu commented 10 months ago

Currently it's not possible I'm afraid. If you want to (try!) and find a TS from a defined bond rearrangement I'd suggest using get_ts with the appropriate arguments. Happy to provide an example if that's not clear

Hi, Tom. I've tried to do that. The key step is mapping user-defined SMILES string and autode-defined SMILES string. I did this using following code:

from rdkit import Chem

RDKIT_SMILES_PARSER_PARAMS = Chem.SmilesParserParams()

def str_to_mol(string: str, explicit_hydrogens: bool = True) -> Chem.Mol:
    if string.startswith('InChI'):
        mol = Chem.MolFromInchi(string, removeHs=not explicit_hydrogens)
    else:
        RDKIT_SMILES_PARSER_PARAMS.removeHs = not explicit_hydrogens
        mol = Chem.MolFromSmiles(string, RDKIT_SMILES_PARSER_PARAMS)

    if explicit_hydrogens:
        return Chem.AddHs(mol)
    else:
        return Chem.RemoveHs(mol)

def get_changed_bonds(mapped_rsmis: List[str], mapped_psmis: List[str]):
    mapped_rmols, mapped_pmols = str_to_mol('.'.join(mapped_rsmis)), str_to_mol('.'.join(mapped_psmis))
    r_connect_dict = {atom.GetAtomMapNum(): [a.GetAtomMapNum() for a in atom.GetNeighbors()] \
                      for atom in mapped_rmols.GetAtoms()}
    p_connect_dict = {atom.GetAtomMapNum(): [a.GetAtomMapNum() for a in atom.GetNeighbors()] \
                      for atom in mapped_pmols.GetAtoms()}
    assert sorted(list(r_connect_dict.keys())) == \
           sorted(list(p_connect_dict.keys())), "Unknown map number in reactants or products"
    atom_nums = sorted(list(r_connect_dict.keys()))
    broken = []
    formation = []
    for atom_num in atom_nums:
        for a in r_connect_dict[atom_num]:
            if a not in p_connect_dict[atom_num]:
                bbond = tuple(sorted([atom_num, a]))
                if bbond not in broken:
                    broken.append(bbond)
        for b in p_connect_dict[atom_num]:
            if b not in r_connect_dict[atom_num]:
                fbond = tuple(sorted([atom_num, b]))
                if fbond not in formation:
                    formation.append(fbond)
    return broken, formation

def match_atom_nums(smi1: str, smi2: str):
    mol1, mol2 = str_to_mol(smi1), str_to_mol(smi2)
    matched_atom_idx = mol1.GetSubstructMatch(mol2)
    print(matched_atom_idx)
    matched_atoms = [mol2.GetAtomWithIdx(idx) for idx in range(len(matched_atom_idx))]
    source_atoms = [mol1.GetAtomWithIdx(idx) for idx in matched_atom_idx]
    old_new_idx = {s.GetAtomMapNum(): m.GetAtomMapNum() for s, m in zip(source_atoms, matched_atoms)}
    return old_new_idx
IannLiu commented 10 months ago

Currently it's not possible I'm afraid. If you want to (try!) and find a TS from a defined bond rearrangement I'd suggest using get_ts with the appropriate arguments. Happy to provide an example if that's not clear

With exact atom mapping, get_mapping in get_ts is no longer needed, and thus this step is skipped. Following steps were done for TS searching:

  1. Optimizing reactant and product molecules respectively.
  2. Generating initial geometry using translate_rotate_reactant
  3. Finding TS using get_ts However, an error occurred . Here is my code:
    
    import autode as ade
    from autode.bond_rearrangement import BondRearrangement, add_bond_rearrangment
    from autode.transition_states.locate_tss import get_ts

example for ts search

def drop_map_num(smi: str): """ Drop the atom map number to get the canonical smiles Args: smi: the molecule smiles

Returns:

"""
mol = Chem.MolFromSmiles(smi)
for atom in mol.GetAtoms():
    atom.SetAtomMapNum(0)
return Chem.MolToSmiles(mol)

mapped_rxn = "C:11[O:15][H:22])([H:16])([H:17])[H:18].C:1([H:6])[H:7]>>C:11=[O:15])([H:16])([H:17])[H:18].C:1([H:6])([H:7])[H:22]" rsmis_list, psmis_list = mapped_rxn.split('>>')[0].split('.'), mapped_rxn.split('>>')[-1].split('.')

Reorder atom-mapping of user-defined smiles

Atom-mapping should start at 0, and reindexing in order

atom_nums = [atom.GetAtomMapNum() for rsmi in rsmis_list for atom in str_to_mol(rsmi).GetAtoms()] reordered_num_mapping = {num: idx for idx, num in enumerate(atom_nums)} rmols, pmols = [str_to_mol(smi) for smi in rsmis_list], [str_to_mol(smi) for smi in psmis_list] for rmol, pmol in zip(rmols, pmols): [atom.SetAtomMapNum(reordered_num_mapping[atom.GetAtomMapNum()]) for atom in rmol.GetAtoms()] [atom.SetAtomMapNum(reordered_num_mapping[atom.GetAtomMapNum()]) for atom in pmol.GetAtoms()] reordered_rsmis_list = [Chem.MolToSmiles(rmol) for rmol in rmols] reordered_psmis_list = [Chem.MolToSmiles(pmol) for pmol in pmols]

Get broken and formation bonds

broken, formation = get_changed_bonds(reordered_rsmis_list, reordered_psmis_list) print(f'broken bonds: {broken}, formation bonds: {formation} ')

Initialize reactant and product complex

unmapped_r_list = [drop_map_num(smi) for smi in reordered_rsmis_list] unmapped_p_list = [drop_map_num(smi) for smi in reordered_psmis_list] r_input_list = [ade.Molecule(name=f'reactant{idx}', smiles=smi) for idx, smi in enumerate(unmapped_r_list)] for mol in r_input_list: mol.find_lowest_energy_conformer() rcomplex = ade.species.ReactantComplex(*r_input_list, name='reactants')

p_input_list = [ade.Molecule(name=f'product{idx}', smiles=smi) for idx, smi in enumerate(unmapped_p_list)] for mol in p_input_list: mol.find_lowest_energy_conformer() pcomplex = ade.species.ProductComplex(*p_input_list, name='products')

Setting autode atom-map-num (starting from 0) of complex

relable_r = [] r_atom_numbers = [] for ith, smi in enumerate(unmapped_r_list): mol = Chem.AddHs(Chem.MolFromSmiles(smi)) r_atom_numbers.append(mol.GetNumAtoms()) start_num = r_atom_numbers[ith - 1] if ith > 0 else 0 [atom.SetAtomMapNum(atom.GetIdx() + start_num) for atom in mol.GetAtoms()] print(f"{ith} autode reactant: ") display(mol) print(f'{ith} initial reactant: ') display(str_to_mol(reordered_rsmis_list[ith])) relable_r.append(Chem.MolToSmiles(mol))

rold_new_mapping = {} for old_smi, new_smi in zip(relable_r, reordered_rsmis_list): rold_new_mapping.update(match_atom_nums(old_smi, new_smi)) print("old new mapping: ", rold_new_mapping)

relable_p = [] p_atom_numbers = [] for ith, smi in enumerate(unmapped_p_list): mol = Chem.AddHs(Chem.MolFromSmiles(smi)) p_atom_numbers.append(mol.GetNumAtoms()) start_num = p_atom_numbers[ith - 1] if ith > 0 else 0 [atom.SetAtomMapNum(atom.GetIdx() + start_num) for atom in mol.GetAtoms()] print(f"{ith} autode product: ") display(mol) print(f'{ith} initial product: ') display(str_to_mol(reordered_psmis_list[ith])) relable_p.append(Chem.MolToSmiles(mol)) pold_new_mapping = {} for old_smi, new_smi in zip(relable_p, reordered_psmis_list): pold_new_mapping.update(match_atom_nums(old_smi, new_smi)) print("old new mapping: ", pold_new_mapping)

Reorder atoms of autode complex

rcomplex.reorder_atoms(rold_new_mapping) pcomplex.reorder_atoms(pold_new_mapping) bond_rearr = ade.bond_rearrangement.BondRearrangement(forming_bonds=formation, breaking_bonds=broken)

Initialize reactant and product geometry

from autode.substitution import get_substc_and_add_dummy_atoms from autode.substitution import get_cost_rotate_translate from scipy.optimize import minimize import numpy as np def translate_rotate_reactant( reactant, bond_rearrangement, shift_factor, n_iters=10 ): """ Shift a molecule in the reactant complex so that the attacking atoms (a_atoms) are pointing towards the attacked atoms (l_atoms). Applied in place

---------------------------------------------------------------------------
Arguments:
    reactant (autode.complex.Complex):

    bond_rearrangement (autode.bond_rearrangement.BondRearrangement):

    shift_factor (float):

    n_iters (int): Number of iterations of translation/rotation to perform
                   to (hopefully) find the global minima
"""
if reactant.n_molecules < 2:
    return

# This function can add dummy atoms for e.g. SN2' reactions where there
# is not a A -- C -- Xattern for the substitution centre
subst_centres = get_substc_and_add_dummy_atoms(
    reactant, bond_rearrangement, shift_factor=shift_factor
)

if all(
    sc.a_atom in reactant.atom_indexes(mol_index=0) for sc in subst_centres
):
    attacking_mol = 0
else:
    attacking_mol = 1

# Find the global minimum for inplace rotation, translation and rotation
min_cost, opt_x = None, None

for _ in range(n_iters):
    res = minimize(
        get_cost_rotate_translate,
        x0=np.random.random(11),
        method="BFGS",
        tol=0.1,
        args=(reactant, subst_centres, attacking_mol),
    )

    if min_cost is None or res.fun < min_cost:
        min_cost = res.fun
        opt_x = res.x

# Re-enable the logger

# Translate/rotation the attacking molecule optimally
reactant.rotate_mol(
    axis=opt_x[:3], theta=opt_x[3], mol_index=attacking_mol
)
reactant.translate_mol(vec=opt_x[4:7], mol_index=attacking_mol)
reactant.rotate_mol(
    axis=opt_x[7:10], theta=opt_x[10], mol_index=attacking_mol
)

reactant.atoms.remove_dummy()
reactant.print_xyz_file()

return reactant

rcomplex = translate_rotate_reactant(reactant=rcomplex, bond_rearrangement=bond_rearr, shift_factor=1.5) pcomplex = translate_rotate_reactant(reactant=pcomplex, bond_rearrangement=bond_rearr, shift_factor=1.5)

TS searching

ts = get_ts(name="test", reactant=rcomplex, product=pcomplex, bond_rearr=bond_rearr, is_mapped=True)

IannLiu commented 10 months ago

An error occurred:


    208     """
    209     Calculate the RMSD between two sets of coordinates using the Kabsch
    210     algorithm
   (...)
    219         (float): Root mean squared distance
    220     """
--> 221     assert coords1.shape == coords2.shape
    223     p_mat = np.array(coords2, copy=True)
    224     p_mat -= np.average(p_mat, axis=0)

AssertionError: ```
IannLiu commented 10 months ago

Atom-mapped reaction profile can be generated using 1D or 2D PES scan, and thus I will close this issue.