connorcoley / retrosim

MIT License

Does the input chemical reaction need to satisfy atom conservation? #3

Open hcji opened 5 years ago

hcji commented 5 years ago

Hi, good work! I'm trying to extract reaction templates for other chemical reactions with your code. The reactions are atom-mapped but do not satisfy atom conservation. Will the code still work correctly?

hcji commented 5 years ago

I ask because I get a wrong result:

from rdkit import Chem
from rdkit.Chem import AllChem

reaction_smiles = '[O:1]=[S:2]([O-:3])[O-:4]>>[O:1]=[S:2]([OH:4])(=[O:5])[OH:3]'
template = extract_one_template(reaction_smiles)    ## function from your code
c1 = reaction_smiles.split('>>')[0]                 ## reactant
c2 = reaction_smiles.split('>>')[1]                 ## product
rxn = AllChem.ReactionFromSmarts(template)
prod = rxn.RunReactants([Chem.MolFromSmiles(c2)])   ## calculated reactant
AllChem.CalcExactMolWt(Chem.MolFromSmiles(c1))      ## mass of true reactant, gives 79.9579
AllChem.CalcExactMolWt(prod[0][0])                  ## mass of calculated reactant, gives 95.95

How can I fix this?

connorcoley commented 5 years ago

Reactions don't necessarily need to satisfy atom conservation in that leaving groups can be absent from the reaction products, but they aren't meant to create mass. If you use the following reaction SMILES, where all atom mapped product atoms that do not appear in the reactants are included as separate fragments, you can get the behavior you want:

reaction_smiles = '[O:1]=[S:2]([O-:3])[O-:4].[*:5]>>[O:1]=[S:2]([OH:4])(=[O:5])[OH:3]'
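
To find which reactions in a dataset need such placeholder fragments, a quick check on the map numbers can help. The following is only a sketch: it assumes every atom-mapped atom is written as a bracket atom ending in `:N]`, and `unmapped_product_atoms` is a hypothetical helper name, not part of retrosim.

```python
import re

# Matches the atom-map number inside a bracket atom, e.g. the 5 in "[O:5]".
_MAP_RE = re.compile(r":(\d+)\]")

def unmapped_product_atoms(reaction_smiles):
    """Return the sorted map numbers found in the products but not in the reactants."""
    # A reaction SMILES has the form reactants>agents>products.
    reactants, _agents, products = reaction_smiles.split(">")
    reactant_maps = {int(m) for m in _MAP_RE.findall(reactants)}
    product_maps = {int(m) for m in _MAP_RE.findall(products)}
    return sorted(product_maps - reactant_maps)

rxn = "[O:1]=[S:2]([O-:3])[O-:4]>>[O:1]=[S:2]([OH:4])(=[O:5])[OH:3]"
print(unmapped_product_atoms(rxn))  # [5]
```

An empty list means every mapped product atom already has a counterpart in the reactants, so no extra fragment is needed.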

You can write a short script to add fragments to your reactants SMILES. It might be worth keeping track of these fragments so you can remove them later. Some code that might be useful:

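from rdkit import Chem

def fix_reaction_smiles(smiles):
    """Prepend a wildcard fragment for each mapped product atom missing from the reactants."""
    rcts = Chem.MolFromSmiles(smiles.split('>')[0])
    prds = Chem.MolFromSmiles(smiles.split('>')[2])
    # Map numbers already present on the reactant side
    rct_maps = set(a.GetAtomMapNum() for a in rcts.GetAtoms() if a.GetAtomMapNum())
    frags = []  # placeholder fragments such as '[*:5]'
    symbs = []  # element symbols of the added atoms, kept so they can be removed later
    for a in prds.GetAtoms():
        if a.GetAtomMapNum() and a.GetAtomMapNum() not in rct_maps:
            frags.append('[*:{}]'.format(a.GetAtomMapNum()))
            symbs.append(a.GetSymbol())
    if not frags:
        return smiles, []
    return '{}.{}'.format('.'.join(frags), smiles), symbs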

reaction_smiles = '[O:1]=[S:2]([O-:3])[O-:4]>>[O:1]=[S:2]([OH:4])(=[O:5])[OH:3]'
reaction_smiles_fixed, symbs = fix_reaction_smiles(reaction_smiles)

template = extract_one_template(reaction_smiles_fixed)        ## function from your code
c1 = reaction_smiles.split('>>')[0]       ## reactant
c2 = reaction_smiles.split('>>')[1]       ## product
rxn = AllChem.ReactionFromSmarts(template)
outcomes = rxn.RunReactants([Chem.MolFromSmiles(c2)])   ## calculated reactant
for outcome in outcomes:
    reactants = [Chem.MolToSmiles(mol) for mol in outcome]
    for symb in symbs:
        reactants.remove(symb)
    print(reactants)
    print(AllChem.CalcExactMolWt(Chem.MolFromSmiles('.'.join(reactants))))

AllChem.CalcExactMolWt(Chem.MolFromSmiles(c1))     ## mass of true reactant, get 79.9579

chengyunzhang commented 5 years ago

Hi, good work! I want to reproduce your work. Unfortunately, in get_date.py, the last line, which calls write_to_files(data), raises "IndentationError: unexpected indent" in PyCharm. Does your code have to be run in a Jupyter Notebook? Also, in your paper "Computer-Assisted Retrosynthesis Based on Molecular Similarity", I read about the top-n accuracy and would like to analyze your predictions (SMILES). Could you share the prediction files from your experiments?

connorcoley commented 5 years ago

Hi @chengyunzhang -- no, these don't have to run inside a Jupyter notebook. I don't know why you're getting an indentation error there.

All of the code/data for that paper is inside this repository. The code in the test script is designed to save only the ranks; if you look at this line, you can look at the actual SMILES recommended for each by examining sorted(probs.iteritems(), key=lambda x:x[1], reverse=True)
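
For anyone on Python 3, where `dict.iteritems()` no longer exists, the same ranking can be sketched as below. This assumes `probs` is a dict mapping each candidate SMILES to its score; `rank_candidates` is a hypothetical helper name and the example values are made up.

```python
def rank_candidates(probs, top_n=None):
    """Return (smiles, score) pairs sorted from highest to lowest score."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]

# Made-up scores, for illustration only
probs = {'CCO.CC(=O)O': 0.12, 'CC(=O)Cl.CCO': 0.55, 'CCOC(C)=O': 0.05}
for smiles, score in rank_candidates(probs, top_n=2):
    print(smiles, score)
```

Passing `top_n` keeps only the best few suggestions, which is convenient when inspecting top-n predictions by hand.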

chengyunzhang commented 5 years ago

Thank you for your reply. I successfully got the SMILES I wanted by following your guidance. Best wishes!

chengyunzhang commented 4 years ago

Hi, from your data I can only get the atom-mapped SMILES (rxn_smiles). Could you provide the original SMILES without atom mapping? Thanks.

connorcoley commented 4 years ago

You can remove the atom mapping using your cheminformatics toolkit of choice (e.g., RDKit).
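
Two possible ways to do this are sketched below; both helper names are hypothetical. The regex version needs no toolkit and assumes map numbers only occur as `:N` just before a closing bracket. It leaves the brackets in place, so the output is valid but not canonical. The RDKit version clears the map number on each atom and writes canonical SMILES; it handles one side of a reaction at a time, so split the reaction SMILES on `>` first.

```python
import re

def strip_atom_maps_regex(smiles):
    """Drop ':N' atom-map suffixes from bracket atoms; no toolkit needed."""
    return re.sub(r":\d+(?=\])", "", smiles)

def strip_atom_maps_rdkit(smiles):
    """Clear map numbers with RDKit and return canonical SMILES (one side of a reaction)."""
    from rdkit import Chem  # imported lazily so the regex helper works without RDKit
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: {!r}".format(smiles))
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)
    return Chem.MolToSmiles(mol)

mapped = "[O:1]=[S:2]([O-:3])[O-:4]>>[O:1]=[S:2]([OH:4])(=[O:5])[OH:3]"
print(strip_atom_maps_regex(mapped))
# [O]=[S]([O-])[O-]>>[O]=[S]([OH])(=[O])[OH]
```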

chengyunzhang commented 4 years ago

Thanks for your guidance.