aspuru-guzik-group / group-selfies

Apache License 2.0

how to get SMILES back from group-selfies? #3

Open jessielyons opened 1 year ago

jessielyons commented 1 year ago

Any help/guidance is appreciated.

Pixelatory commented 11 months ago

Using the tutorial, this worked for me:

import selfies as sf
from group_selfies import (
    Group,
    GroupGrammar
)
from rdkit import Chem

g = Group('trifluoromethane', '*1C(F)(F)F')
g2 = Group('pyrazole', 'N1C=C=N1', all_attachment=True)
g3 = Group('toluene', 'C(*3)C1=CC=CC=C1', all_attachment=True)
g4 = Group('chiral', '*x1[C@](*x1)(*x1)(*1)', priority=10)
g5 = Group('chiral2', '*x1[C@H](*x1)(*x1)', priority=10)

grammar = GroupGrammar([g, g2, g3, g4, g5])  # create a grammar from a list of groups

m = Chem.MolFromSmiles('CC1=CC=C(C=C1)C2=CC(=NN2C3=CC=C(C=C3)S(=O)(=O)N)C(F)(F)F')  # celecoxib
print('canonical', Chem.MolToSmiles(m))
encoded = grammar.full_encoder(m)
print('group selfies', encoded)

decoded = grammar.decoder(encoded)
smi = Chem.MolToSmiles(decoded)
print('decoded to SMILES', smi)
print('into SELFIES', sf.encoder(smi))

My output then becomes:

canonical Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1
group selfies [:0toluene][Ring2][C][=C][C][=Branch][=N][N][Branch][C][=C][C][=C][Branch][C][=C][Ring1][=Branch][pop][S][=Branch][=O][pop][=Branch][=O][pop][N][pop][Ring1][Branch][pop][:0trifluoromethane][pop]
decoded to SMILES Cc1ccc(-c2cc(C(F)(F)F)nn2-c2ccc(S(N)(=O)=O)cc2)cc1
into SELFIES [C][C][=C][C][=C][Branch2][Ring2][Ring2][C][=C][C][Branch1][=Branch2][C][Branch1][C][F][Branch1][C][F][F][=N][N][Ring1][=Branch2][C][=C][C][=C][Branch1][=Branch2][S][Branch1][C][N][=Branch1][C][=O][=O][C][=C][Ring1][#Branch2][C][=C][Ring2][Ring1][=Branch2]
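
If you want to check which of your groups actually ended up in the encoding, a rough way is to split the group selfies string into its bracketed tokens and pick out the group ones. This is only a sketch using the standard library, not part of the group_selfies API. It assumes group tokens always look like `[:<index><name>]`, which is what the output above shows but which I have not checked against the library itself. The `encoded` string is copied from that output.

```python
import re

# Group SELFIES string copied verbatim from the output above.
encoded = (
    "[:0toluene][Ring2][C][=C][C][=Branch][=N][N][Branch][C][=C][C][=C]"
    "[Branch][C][=C][Ring1][=Branch][pop][S][=Branch][=O][pop][=Branch]"
    "[=O][pop][N][pop][Ring1][Branch][pop][:0trifluoromethane][pop]"
)

# Every token is one bracketed chunk, e.g. "[C]" or "[:0toluene]".
tokens = re.findall(r"\[[^\]]*\]", encoded)

# Assumption: a group token is "[:" + digits (an index) + the group name + "]",
# inferred from the printed output rather than from the library source.
group_token = re.compile(r"\[:(\d+)(.+)\]")

groups_used = []
for tok in tokens:
    match = group_token.fullmatch(tok)
    if match:
        groups_used.append(match.group(2))

print("groups used:", groups_used)
```

For the string above, only toluene and trifluoromethane show up as group tokens. The rest of the molecule is written with ordinary atom, branch and ring tokens.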