Jamson-Zhong / Graph2Edits

MIT License

Questions about generate edits #6

Open QJ-Chen opened 9 months ago

QJ-Chen commented 9 months ago

In utils/generate_edits.py (line 115), a1 and a2 are sorted when generating the leaving groups, so the attaching ligand is detected only when a1 is not in atoms_only_in_react and a2 is in atoms_only_in_react. Is it possible for a2 to be outside atoms_only_in_react while a1 is in it? An example:

>>> smi = '[C:2][C:1][C:3][C:4][C:5]>>[C:1][C:3][C:4][C:5]'
>>> generate_reaction_edits(smi)
ReactionData(rxn_smi='[C:1]([C:2])[C:3][C:4][C:5]>>[C:1][C:3][C:4][C:5]', edits=[('Attaching LG', '*[C]'), 'Terminate'], edits_atom=[1], rxn_class=None, rxn_id=None)

>>> smi = '[C:1][C:2][C:3][C:4][C:5]>>[C:2][C:3][C:4][C:5]'
>>> generate_reaction_edits(smi)
ReactionData(rxn_smi='[C:1][C:2][C:3][C:4][C:5]>>[C:2][C:3][C:4][C:5]', edits=['Terminate'], edits_atom=[], rxn_class=None, rxn_id=None)

I think these two reactions are the same, so the edits should not depend on the map numbers.
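The asymmetry can be reproduced without the repository code. The sketch below is a minimal stand-in, assuming the check behaves as described above (the pair is sorted, then only a2 is tested for membership in atoms_only_in_react). The function names and the bond-list representation are hypothetical and are not taken from utils/generate_edits.py.

```python
def one_sided_attachment_atoms(bonds, atoms_only_in_react):
    """Mimic the described check: after sorting, only a2 may be reactant-only."""
    found = []
    for bond in bonds:
        a1, a2 = sorted(bond)
        if a1 not in atoms_only_in_react and a2 in atoms_only_in_react:
            found.append(a1)
    return found


def symmetric_attachment_atoms(bonds, atoms_only_in_react):
    """Accept either end of the bond as the reactant-only atom."""
    found = []
    for a1, a2 in bonds:
        a1_only = a1 in atoms_only_in_react
        a2_only = a2 in atoms_only_in_react
        if a1_only != a2_only:
            # Record the atom that survives into the product.
            found.append(a2 if a1_only else a1)
    return found


# Bonds of the two reactant chains above, written as map-number pairs.
first_bonds = [(2, 1), (1, 3), (3, 4), (4, 5)]   # atom 2 is lost
second_bonds = [(1, 2), (2, 3), (3, 4), (4, 5)]  # atom 1 is lost

print(one_sided_attachment_atoms(first_bonds, {2}))   # [1]
print(one_sided_attachment_atoms(second_bonds, {1}))  # []
print(symmetric_attachment_atoms(first_bonds, {2}))   # [1]
print(symmetric_attachment_atoms(second_bonds, {1}))  # [2]
```

With the one-sided check, the second chain yields no attachment atom, which matches the empty edits_atom in the second output above.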

Another problem concerns atoms that appear only in the product. An error is raised at line 90 or line 97. (Detaching ligand?)

>>> smi = '[C:1][C:2][C:3][C:4][C:5]>>[C:6][C:2][C:3][C:4][C:5]'
>>> generate_reaction_edits(smi)
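
Inputs like this can be spotted before calling generate_reaction_edits by comparing the map numbers on both sides of the reaction. The following is a minimal sketch, assuming the reaction SMILES has the plain 'reactants>>products' form and that every atom carries a map number; map_number_diff is a hypothetical helper and not part of the repository.

```python
import re

# Matches the map number in atoms such as [C:6] or [NH2:11].
MAP_NUM = re.compile(r":(\d+)\]")


def map_number_diff(rxn_smi):
    """Return (maps only in reactants, maps only in products)."""
    reactants, products = rxn_smi.split(">>")
    react_maps = {int(n) for n in MAP_NUM.findall(reactants)}
    prod_maps = {int(n) for n in MAP_NUM.findall(products)}
    return react_maps - prod_maps, prod_maps - react_maps


only_react, only_prod = map_number_diff(
    "[C:1][C:2][C:3][C:4][C:5]>>[C:6][C:2][C:3][C:4][C:5]"
)
print(only_react, only_prod)  # {1} {6}
```

A non-empty second set means the product contains an atom with no counterpart in the reactants, which is the situation in the case above.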

Jamson-Zhong commented 9 months ago

When preprocessing the data to generate the edits, we first canonicalize the product SMILES by rearranging the atom order to match the canonical atom order, so the map numbers are not random. Hope this solves your problem.
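
One possible reading of this step: if map numbers are reassigned from the canonical atom order of the product, product atoms receive the low numbers and reactant-only atoms receive the remaining higher ones. The sketch below illustrates only that renumbering, under this assumption. The canonical order is taken as an input here (in practice it would come from a cheminformatics toolkit), and renumber_maps is a hypothetical helper, not the repository's preprocessing script.

```python
import re

MAP_NUM = re.compile(r":(\d+)\]")


def renumber_maps(rxn_smi, canonical_order):
    """Renumber atom maps so product atoms follow canonical_order.

    canonical_order lists the old map numbers of all product atoms in
    canonical atom order (assumed to be computed elsewhere).
    """
    reactants, products = rxn_smi.split(">>")
    new_of = {old: i + 1 for i, old in enumerate(canonical_order)}

    # Reactant-only atoms get the next free numbers, in order of appearance.
    next_free = len(new_of) + 1
    for match in MAP_NUM.finditer(reactants):
        old = int(match.group(1))
        if old not in new_of:
            new_of[old] = next_free
            next_free += 1

    def swap(match):
        return f":{new_of[int(match.group(1))]}]"

    return MAP_NUM.sub(swap, reactants) + ">>" + MAP_NUM.sub(swap, products)


rxn = "[C:1][C:2][C:3][C:4][C:5]>>[C:2][C:3][C:4][C:5]"
print(renumber_maps(rxn, [2, 3, 4, 5]))
# [C:5][C:1][C:2][C:3][C:4]>>[C:1][C:2][C:3][C:4]
```

After renumbering, the lost atom carries the largest map number, so in a sorted pair it always ends up as a2.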

QJ-Chen commented 9 months ago

Thanks. Canonicalizing solved these cases.

Here is another case from USPTO-50k whose actions seem to be incomplete. Is this result expected?

ReactionData(rxn_smi='[NH2:11][c:12]1[cH:13][cH:14][c:15]([F:16])[cH:17][c:18]1[F:19].[O:1]=[C:2]1[C:3]2=[C:4]([CH2:5][CH2:6][CH2:7][CH2:8]2)[C:9](=[O:10])[O:20]1>>[O:1]=[C:2]1[C:3]2=[C:4]([CH2:5][CH2:6][CH2:7][CH2:8]2)[C:9](=[O:10])[N:11]1[c:12]1[cH:13][cH:14][c:15]([F:16])[cH:17][c:18]1[F:19]', edits=[('Delete Bond', (None, None)), ('Delete Bond', (None, None)), 'Terminate'], edits_atom=[[9, 11], [2, 11]], rxn_class=3, rxn_id='US04001272')