MolecularAI / reaction_utils

Utilities for working with datasets of chemical reactions, reaction templates and template extraction.
https://molecularai.github.io/reaction_utils/
Apache License 2.0

Neutralize molecules #12

Closed: iuhgnor closed this issue 1 year ago.

iuhgnor commented 1 year ago

I have a question about neutralizing molecules when cleaning reaction SMILES.

The following reaction SMILES contains NaBrO3, written as [Br:16]([O-])(=O)=O.[Na+]. After cleaning, the [Na+] is removed from the reactants (it ends up among the reagents), and [Br:16]([O-])(=O)=O is neutralized to HBrO3, O[Br+2:16]([O-])[O-].

In this case, neutralization changed the reactant. How can this kind of problem be solved? Thanks.

By the way, will reaction_utils support reaction SMILES with fragment grouping?

from rxnutils.chem import reaction

rsmi = '[Cl:1][CH2:2][CH2:3][CH2:4][C:5]([C:7]1[CH:12]=[CH:11][C:10]([CH:13]([CH3:15])[CH3:14])=[CH:9][CH:8]=1)=[O:6].[Br:16]([O-])(=O)=O.[Na+].[Br-].[Na+].S(S([O-])(=O)=O)([O-])(=O)=O.[Na+].[Na+]>C(Cl)Cl.O>[Br:16][C:13]([C:10]1[CH:9]=[CH:8][C:7]([C:5](=[O:6])[CH2:4][CH2:3][CH2:2][Cl:1])=[CH:12][CH:11]=1)([CH3:15])[CH3:14]'
rxn = reaction.ChemicalReaction(rsmi)
rxn.generate_reaction_template()
rxn.clean_rsmi

Output:

'O[Br+2:16]([O-])[O-].[Cl:1][CH2:2][CH2:3][CH2:4][C:5](=[O:6])[c:7]1[cH:8][cH:9][c:10]([CH:13]([CH3:14])[CH3:15])[cH:11][cH:12]1>ClCCl.O.O=S(=O)([O-])S(=O)(=O)[O-].[Br-].[Na+].[Na+].[Na+].[Na+]>[Cl:1][CH2:2][CH2:3][CH2:4][C:5](=[O:6])[c:7]1[cH:8][cH:9][c:10]([C:13]([CH3:14])([CH3:15])[Br:16])[cH:11][cH:12]1'

SGenheden commented 1 year ago

Hello

I believe that if you group the ions together with parentheses, the [Na+] will not be removed from the reactants.

from rxnutils.chem import reaction

rsmi = '[Cl:1][CH2:2][CH2:3][CH2:4][C:5]([C:7]1[CH:12]=[CH:11][C:10]([CH:13]([CH3:15])[CH3:14])=[CH:9][CH:8]=1)=[O:6].([Br:16]([O-])(=O)=O.[Na+]).[Br-].[Na+].S(S([O-])(=O)=O)([O-])(=O)=O.[Na+].[Na+]>C(Cl)Cl.O>[Br:16][C:13]([C:10]1[CH:9]=[CH:8][C:7]([C:5](=[O:6])[CH2:4][CH2:3][CH2:2][Cl:1])=[CH:12][CH:11]=1)([CH3:15])[CH3:14]'
rxn = reaction.ChemicalReaction(rsmi)
rxn.generate_reaction_template()
rxn.clean_rsmi

Output:

'([Na+].[O-][Br+2:16]([O-])[O-]).[Cl:1][CH2:2][CH2:3][CH2:4][C:5](=[O:6])[c:7]1[cH:8][cH:9][c:10]([CH:13]([CH3:14])[CH3:15])[cH:11][cH:12]1>ClCCl.O.O=S(=O)([O-])S(=O)(=O)[O-].[Br-].[Na+].[Na+].[Na+]>[Cl:1][CH2:2][CH2:3][CH2:4][C:5](=[O:6])[c:7]1[cH:8][cH:9][c:10]([C:13]([CH3:14])([CH3:15])[Br:16])[cH:11][cH:12]1'
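
The effect of the grouping can be pictured with a small splitter. This is a minimal sketch, not the reaction_utils implementation: it assumes that only dots outside every pair of parentheses separate components, so the dot inside ([Br:16]([O-])(=O)=O.[Na+]) no longer splits the salt into two unrelated molecules. The name split_grouped_side is hypothetical.

```python
def split_grouped_side(side: str) -> list[list[str]]:
    """Split one side of a reaction SMILES into components (hypothetical helper).

    Only dots at parenthesis depth 0 separate components. A component wrapped
    in parentheses, e.g. "([Br:16]([O-])(=O)=O.[Na+])", is returned as one
    group holding all of its fragments. Assumes no dots occur inside ordinary
    SMILES branches.
    """
    if not side:
        return []
    components: list[str] = []
    depth, start = 0, 0
    for i, char in enumerate(side):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "." and depth == 0:
            # A top-level dot: the previous component is complete.
            components.append(side[start:i])
            start = i + 1
    components.append(side[start:])

    groups: list[list[str]] = []
    for comp in components:
        if comp.startswith("("):
            # A valid SMILES molecule never starts with "(", so this is a group.
            groups.append(comp[1:-1].split("."))
        else:
            groups.append([comp])
    return groups
```

For the reactant side of the grouped SMILES above, this gives seven components, with the bromate and its sodium counter-ion kept together as ['[Br:16]([O-])(=O)=O', '[Na+]'].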

iuhgnor commented 1 year ago

Thanks for your help, that works. Is there an automatic way to handle a large number of reaction SMILES that have salts as reactants?

SGenheden commented 1 year ago

I think the best scenario is when the input reaction SMILES uses proper component grouping, with parentheses and everything. If that is not the case, which it often isn't, then I don't have a solid solution. We have experimented with algorithms that identify pairs of ions, or more generally components that should be considered one reagent, but it is a difficult task and we don't have a 100% foolproof algorithm.
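
To illustrate why this is hard, here is a minimal sketch of one possible heuristic, not the algorithm mentioned above: assume counter-ions are written next to each other, read the net formal charge of each dot-separated component from its bracket atoms, and wrap each run of adjacent ions whose charges sum to zero in parentheses. The function names are hypothetical, the input is assumed to have no existing grouping, and the heuristic fails whenever a salt's ions are not adjacent.

```python
import re

# Bracket atoms such as [Na+], [O-], [Br+2:16] or [NH4+]; charges only appear inside brackets.
_BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
# Charge spec inside a bracket atom: "+", "++", "+2", "-", "--", "-2", ...
_CHARGE = re.compile(r"([+-]+)(\d*)")


def formal_charge(component: str) -> int:
    """Net formal charge of one dot-free SMILES component, read from its bracket atoms."""
    total = 0
    for atom in _BRACKET_ATOM.findall(component):
        match = _CHARGE.search(atom)
        if match is None:
            continue
        signs, digits = match.groups()
        magnitude = int(digits) if digits else len(signs)
        total += magnitude if signs[0] == "+" else -magnitude
    return total


def group_adjacent_ions(side: str) -> str:
    """Wrap runs of adjacent ions whose charges cancel in parentheses (hypothetical heuristic)."""
    output: list[str] = []
    pending: list[str] = []  # current run of ions that is not yet balanced
    charge = 0
    for comp in (side.split(".") if side else []):
        q = formal_charge(comp)
        if q == 0:
            # A neutral component ends the run; an unbalanced run is left ungrouped.
            output.extend(pending)
            pending, charge = [], 0
            output.append(comp)
            continue
        pending.append(comp)
        charge += q
        if charge == 0:
            # The run is electrically neutral: treat it as one salt.
            output.append("(" + ".".join(pending) + ")")
            pending = []
    output.extend(pending)  # leftover ions that never balanced stay ungrouped
    return ".".join(output)


def group_reaction_ions(rsmi: str) -> str:
    """Apply the heuristic to the reactant, reagent and product parts of a reaction SMILES."""
    return ">".join(group_adjacent_ions(part) for part in rsmi.split(">"))
```

For the SMILES in the original question, the ketone is left untouched and the three salts become ([Br:16]([O-])(=O)=O.[Na+]), ([Br-].[Na+]) and (S(S([O-])(=O)=O)([O-])(=O)=O.[Na+].[Na+]). An input such as [Na+].CCO.[Cl-], where a solvent sits between the ions, is returned unchanged, which is exactly the kind of case that keeps such heuristics from being foolproof.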

iuhgnor commented 1 year ago

Thanks for your detailed reply.