aspuru-guzik-group / selfies

Robust representation of semantically constrained graphs, in particular for molecules in chemistry
Apache License 2.0

SELFIES decoding for some tokens #118

Closed: Olabisi-Aishat-Bello closed this issue 1 month ago

Olabisi-Aishat-Bello commented 2 months ago

Hi!

I have a question about SMILES decoding for certain combinations of tokens. Calling sf.decoder("[Ring1][Branch2]") or sf.decoder("[Branch2][#Branch3]") simply returns an empty SMILES string. Does that mean every SELFIES string needs at least one atom token in order to be decoded into a valid molecule?

robpollice commented 2 months ago

Hi! Thank you for your question. The short answer is yes. Fundamentally, molecules consist of atoms, so the presence of at least one atom is a necessary condition for a valid molecule. Hence any molecular string representation, be it SMILES, SELFIES, or similar, that does not contain a single atom does not correspond to a valid molecule. The hypothetical SMILES string "11", for example, would likewise be empty and therefore not a valid molecule. Whether "11" is even a valid SMILES string is a different question; in my opinion it is not, although SMILES explorer accepts it and gives an output. I hope this answers your question.
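For anyone who wants to catch such strings before decoding, here is a minimal sketch of a pre-check. It is not part of the selfies API: the helper name has_atom_token and the regular expressions are hypothetical, and the rule that "structural" tokens are exactly the Branch/Ring tokens (with an optional bond prefix) plus [nop] is a simplifying assumption. It only tests the necessary condition discussed above and does not guarantee that decoding gives a non-empty result.

```python
import re

# Assumption: every SELFIES token is a bracketed group such as [C] or [Ring1].
_TOKEN = re.compile(r"\[[^\[\]]*\]")

# Assumption (heuristic): structural tokens are Branch/Ring tokens with an
# optional bond or stereo prefix, e.g. [Branch2], [#Branch3], [=Ring1].
_STRUCTURAL = re.compile(r"\[[=#\\/\-]*(?:Branch|Ring)[1-3]\]")


def has_atom_token(selfies_string: str) -> bool:
    """Return True if at least one token is neither a Branch/Ring token nor [nop]."""
    tokens = _TOKEN.findall(selfies_string)
    return any(
        token != "[nop]" and not _STRUCTURAL.fullmatch(token)
        for token in tokens
    )


if __name__ == "__main__":
    for s in ["[Ring1][Branch2]", "[Branch2][#Branch3]", "[C][Ring1][Branch2]"]:
        # With selfies installed, one could call sf.decoder(s) only when this is True.
        print(s, has_atom_token(s))
```

Both strings from the question contain only Branch/Ring tokens, so the helper returns False for them, which matches the empty decoder output reported above.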

Olabisi-Aishat-Bello commented 1 month ago

Thank you! It does!