WesleyyC closed this issue.
The "extra" atom map numbers are just unique numbers that make sure every atom has a map number; the numbering itself has no meaning. A code snippet like the following would suffice:
import rdkit.Chem as Chem
from itertools import chain

def complete_mapping(rxn_smi):
    # Reaction SMILES has the form 'reactants>agents>products'
    r_smi, agent, p_smi = rxn_smi.split('>')
    r = Chem.MolFromSmiles(r_smi)
    p = Chem.MolFromSmiles(p_smi)
    # New numbers start above the largest map number already in use
    max_map = max(a.GetAtomMapNum() for a in chain(r.GetAtoms(), p.GetAtoms()))
    for a in chain(r.GetAtoms(), p.GetAtoms()):
        if not a.GetAtomMapNum():  # map number 0 means the atom is unmapped
            a.SetAtomMapNum(max_map + 1)
            max_map += 1
    return '>'.join((Chem.MolToSmiles(r), agent, Chem.MolToSmiles(p)))

complete_mapping('[CH3:1][OH:2].[CH3:3][CH2:4]Cl>>[CH3:1][O:2][CH2:4][CH3:3]')
# returns [CH3:1][OH:2].[CH3:3][CH2:4][Cl:5]>>[CH3:1][O:2][CH2:4][CH3:3]
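As a quick sanity check on the output of complete_mapping, a small helper can confirm that every reactant and product atom now carries a non-zero map number and that no number repeats within one side. This is a minimal sketch: is_fully_mapped is a hypothetical name, not part of rexgen_direct, and it assumes the usual 'reactants>agents>products' layout.

```python
from rdkit import Chem

def is_fully_mapped(rxn_smi):
    """Hypothetical helper: True if every reactant and product atom has a
    non-zero map number and no map number repeats within a side."""
    r_smi, _agent, p_smi = rxn_smi.split('>')
    for smi in (r_smi, p_smi):
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            raise ValueError('could not parse: ' + smi)
        nums = [a.GetAtomMapNum() for a in mol.GetAtoms()]
        # 0 means unmapped; a repeated number would make the mapping ambiguous
        if 0 in nums or len(set(nums)) != len(nums):
            return False
    return True
```

For example, it should return False for the input reaction above (the Cl is unmapped) and True once complete_mapping has assigned it a number.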
Got it, thanks!
@connorcoley: Is there any way we could know the source of the data (patent number and year) for the reactions you used? We have used the USPTO data in one of our projects and are trying to map these reactions to the ones we used. This is not a simple reaction SMILES match, since the atom mappings have changed.
Unfortunately, that information wasn’t carried through the pipeline. Would it be possible to do the comparison with the atom mapping stripped?
On Tue, Jan 21, 2020 at 03:04 ahseena96 wrote:
> @connorcoley: Any way we could know the source of data, w.r.t. the patent info, for the reactions you used? [...]
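Comparing reactions with the atom mapping stripped, as suggested above, might look like the following sketch. It resets every map number to 0 and rewrites each part of the reaction as RDKit canonical SMILES, so two reaction strings that differ only in their mapping should compare equal. strip_mapping is a hypothetical helper; exact string equality may still miss matches if the two datasets split reactants and reagents differently.

```python
from rdkit import Chem

def strip_mapping(rxn_smi):
    """Hypothetical helper: remove all atom map numbers from a
    'reactants>agents>products' string and canonicalize each part."""
    stripped = []
    for part in rxn_smi.split('>'):
        if not part:
            stripped.append('')  # e.g. the empty agent field in 'A>>B'
            continue
        mol = Chem.MolFromSmiles(part)
        if mol is None:
            raise ValueError('could not parse: ' + part)
        for atom in mol.GetAtoms():
            atom.SetAtomMapNum(0)  # 0 clears the map number
        # Canonical SMILES also fixes the order of '.'-separated fragments
        stripped.append(Chem.MolToSmiles(mol))
    return '>'.join(stripped)
```

For example, the mapped example reaction from the snippet earlier in the thread and the unmapped 'CO.CCCl>>CCOC' should both reduce to the same string, which could then serve as a lookup key when matching against another USPTO-derived set.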
Maybe, will try that out. Thank you.
Hi, I used the method you mentioned to complete the atom mapping, so that all the atoms of the product are mapped. However, when I run it, the code hits raise Exception(smiles) and fails with:
Exception: CH2:1[OH:40])[OH:39])[OH:38])[OH:12].NH:32[CH3:91])C:90CH:66[O:69]CH:70[CH:72]1[NH:73]C:74=[O:76])[CH3:77])=[O:78])=[O:99])C:82=[O:84])=[O:98])[CH2:93][CH2:94][CH2:95][CH2:96][NH2:97])=[O:92]
After debugging, I traced the error to the code block in rexgen_direct/core_wln_global/mol_graph shown in the attached image. Can you tell me what I should do about it? Thank you.
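One thing that may be worth checking is whether the completed mapping leaves gaps in the numbering (the SMILES in the error above uses map numbers as high as 99). This rests on an assumption not confirmed in this thread: that the featurizer in mol_graph uses each atom's map number minus one as an array index and rejects indices at or beyond the atom count. Since the extra numbers carry no meaning, the maps can be relabelled consecutively without losing the reactant-to-product correspondence. renumber_mapping below is a hypothetical helper, and whether it resolves this particular exception is untested.

```python
from rdkit import Chem

def renumber_mapping(rxn_smi):
    """Hypothetical helper: relabel atom map numbers as 1, 2, 3, ... in order
    of first appearance (reactants first), applying the same relabelling to
    the products so mapped atoms stay paired."""
    r_smi, agent, p_smi = rxn_smi.split('>')
    mols = [Chem.MolFromSmiles(s) for s in (r_smi, p_smi)]
    if any(m is None for m in mols):
        raise ValueError('could not parse: ' + rxn_smi)
    new_num = {}  # old map number -> new consecutive map number
    for mol in mols:  # reactants first, so mapped reactant atoms get 1..N
        for atom in mol.GetAtoms():
            old = atom.GetAtomMapNum()
            if old == 0:
                continue  # leave unmapped atoms unmapped
            if old not in new_num:
                new_num[old] = len(new_num) + 1
            atom.SetAtomMapNum(new_num[old])
    r, p = mols
    return '>'.join((Chem.MolToSmiles(r), agent, Chem.MolToSmiles(p)))
```

If every reactant atom is mapped (for instance after running complete_mapping), the reactant maps should end up as exactly 1 to N, where N is the number of reactant atoms.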
Hi, I was looking at the original data from Daniel Lowe here, but realized that the data you processed has a full atom mapping rather than a mapping for only the atoms in the product. Could you share the code for this preprocessing? Thanks!