adalke opened 5 years ago
Currently, the closure value of 6 indicates a six-membered ring. I think that a stronger argument would need to be made to give up this nice feature.
I see your point. You describe the project as "A variant of SMILES for use in machine-learning". I don't think machine-learning systems perform any better when '6', rather than '4', indicates a six-membered ring.
However, I believe your point is that "SMILES", at least as Weininger envisions it, is fundamentally meant for humans, so a variant should keep some of that human-centered worldview to honestly be called a SMILES variant.
I can respect that decision.
I think there's also room for a variant that is less SMILES-ish in syntax but easier for naive systems to use (algorithms that don't specifically know the (Deep)SMILES grammar).
The closure values "0" and "1" never appear in current DeepSMILES: C0 is meaningless, and C1 would be a bond from an atom to itself.
Proposal 1: Shift the closure numbers down by two, so that "CC0" corresponds to what is currently "CC2".
The closure value "2" can only appear with dot disconnections, for example C.C2. Otherwise, a 2 always links back to the previous atom, which is already bonded to the current one, as in CC2 or CN)C2. If #6 is implemented, so that closures cannot cross a dot disconnection, then the closure value "2" will never appear in a valid DeepSMILES string.
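To make the observations about closure values 0, 1, and 2 concrete, here is a minimal sketch that models only a linear chain (no branches, no dot disconnections). It assumes the current DeepSMILES convention that digit n closes an n-membered ring, i.e. bonds the current atom to the atom n - 1 positions earlier. `closure_target` and `classify` are hypothetical helpers for illustration, not part of any DeepSMILES package.

```python
def closure_target(atom_index: int, digit: int) -> int:
    """Return the index of the atom that `digit` on atom `atom_index` bonds to.

    Assumes the current DeepSMILES convention that digit n closes an
    n-membered ring, i.e. it bonds back to the atom n - 1 positions earlier.
    Only a linear chain is modelled: no branches, no dot disconnections.
    """
    return atom_index - (digit - 1)


def classify(atom_index: int, digit: int) -> str:
    """Describe what a ring-closure digit would mean on a linear chain."""
    target = closure_target(atom_index, digit)
    if target > atom_index:
        return "meaningless"  # e.g. C0: points past the current atom
    if target == atom_index:
        return "self-loop"  # e.g. C1: bond from the atom to itself
    if target < 0:
        return "out of range"  # points before the first atom
    if target == atom_index - 1:
        return "duplicate bond"  # e.g. CC2: previous atom is already bonded
    return f"{digit}-membered ring with atom {target}"


# The last atom of "CCCCCC" has index 5; try each closure digit on it.
for digit in range(7):
    print(digit, classify(5, digit))
```

On this model, digits 0, 1, and 2 never produce a new ring; only 3 and above do.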
Proposal 2: Shift the closure numbers down by three, so that "CCC0" corresponds to what is currently "CCC3".
This would make the closure values 0, 1, and 2 useful.
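Both proposals amount to choosing a fixed offset between the closure digit and the ring size. A minimal sketch, assuming single-digit closures only (multi-digit forms are ignored) and using hypothetical helper names `encode_closure` and `decode_closure` and a hypothetical `SCHEMES` table:

```python
# Hypothetical offsets: a closure digit d encodes a ring of size d + offset.
SCHEMES = {
    "current": 0,     # "6" means a six-membered ring
    "proposal 1": 2,  # "CC0" means what "CC2" means today
    "proposal 2": 3,  # "CCC0" means what "CCC3" means today
}


def encode_closure(ring_size: int, offset: int) -> str:
    """Return the single closure digit for `ring_size` under `offset`."""
    digit = ring_size - offset
    if not 0 <= digit <= 9:
        raise ValueError(
            f"ring size {ring_size} does not fit in one digit with offset {offset}"
        )
    return str(digit)


def decode_closure(symbol: str, offset: int) -> int:
    """Return the ring size encoded by a single closure digit."""
    return int(symbol) + offset


for name, offset in SCHEMES.items():
    benzene = "cccccc" + encode_closure(6, offset)
    print(f"{name}: benzene is {benzene}; digits 0-9 map to ring sizes "
          f"{offset} to {offset + 9}")
```

Under proposal 2, every single digit names a real ring size (3 through 12), whereas the current scheme spends 0, 1, and 2 on values that are meaningless or only usable across a dot disconnection.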