IUPAC / IUPAC_SMILES_plus

IUPAC SMILES+ Specification

Adjust Formal Grammar to simplify no leading zeros #23

Open vfscalfani opened 2 years ago

vfscalfani commented 2 years ago

In the current IUPAC SMILES+ draft document, leading zeros are not allowed for atom properties including isotope, H count, charge, atom class, and ring bonds.

In an effort to clarify this in the formal grammar, digit notation was added:

digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
digit_nonzero ::= '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

We then added notation like the following to rule out leading zeros (the isotope specification supports up to 3 digits, nnn):

isotope ::= digit | digit_nonzero digit | digit_nonzero digit digit
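For anyone who wants to test strings against this rule, here is a minimal sketch of an equivalent regular expression in Python. The pattern and the function name `is_valid_isotope` are illustrative assumptions, not part of the draft; the three alternatives of the rule collapse to "a lone 0, or a nonzero digit followed by at most two more digits".

```python
import re

# Assumed regex equivalent of:
#   isotope ::= digit | digit_nonzero digit | digit_nonzero digit digit
# [0-9] is used instead of \d so that only ASCII digits match.
ISOTOPE_PATTERN = re.compile(r"0|[1-9][0-9]{0,2}")


def is_valid_isotope(text: str) -> bool:
    """Return True if text is 1 to 3 digits with no leading zero ('0' alone is allowed)."""
    return ISOTOPE_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["0", "7", "13", "235", "01", "007", "1000"]:
        print(candidate, is_valid_isotope(candidate))
```

A single `0` is accepted because `digit` includes it, while `01` and `007` are rejected because every multi-digit alternative starts with `digit_nonzero`.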

In hindsight, there is probably a cleaner way to do this by defining a number that cannot have leading zeros. If you have ideas on how best to do this with the formal grammar, please comment!

merkys commented 2 years ago

In an effort to clarify this in the formal grammar, digit notation was added:

digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
digit_nonzero ::= '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'

We then added notation like the following to rule out leading zeros (the isotope specification supports up to 3 digits, nnn):

isotope ::= digit | digit_nonzero digit | digit_nonzero digit digit

This indeed looks nice and clear. I am not sure whether allowing isotope 0 makes chemical sense, though.

In hindsight, there is probably a cleaner way to do this by defining a number that cannot have leading zeros. If you have ideas on how best to do this with the formal grammar, please comment!

This could be done as:

natural_number ::= digit_nonzero | natural_number digit

However, such a rule offers no way to limit the number of digits, and it does not match a lone 0. Thus I prefer your grammar notation.
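To make the trade-off concrete, here is a small Python sketch of what the recursive rule accepts. The function names are hypothetical, and the left-recursive rule is implemented iteratively, which should accept the same strings. The second helper shows that a digit limit would have to be enforced outside the grammar.

```python
NONZERO_DIGITS = "123456789"
DIGITS = "0123456789"


def is_natural_number(text: str) -> bool:
    """Iterative equivalent of: natural_number ::= digit_nonzero | natural_number digit."""
    if not text or text[0] not in NONZERO_DIGITS:
        return False
    # Any number of further digits may follow, so the length is unbounded.
    return all(ch in DIGITS for ch in text[1:])


def is_bounded_natural_number(text: str, max_digits: int = 3) -> bool:
    """Same rule plus a length check that the grammar itself cannot express."""
    return is_natural_number(text) and len(text) <= max_digits


if __name__ == "__main__":
    for candidate in ["1", "120", "1234567", "0", "012"]:
        print(candidate, is_natural_number(candidate), is_bounded_natural_number(candidate))
```

Note that `1234567` passes the recursive rule but fails the bounded check, and `0` fails both, whereas the explicit three-alternative isotope rule accepts `0` and caps the length by construction.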

vfscalfani commented 2 years ago

Great, thanks for the feedback. I'm glad the new notation makes sense. Yes, I agree with you that isotope 0 does not make chemical sense, but we do need to define it for the purpose of parsing SMILES. This is actually one of the changes we made compared to OpenSMILES. In OpenSMILES, an isotope value of 0 is a zero isotope, whereas the IUPAC SMILES+ draft states:

A 0 isotope specification is equivalent to undefined, and the atom is assumed to have the naturally-occurring isotopic ratios. For example, [0S] is equivalent to [S].
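As a rough illustration of how a parser might apply that sentence, here is a minimal Python sketch. It handles only a deliberately tiny, assumed subset of bracket atoms (an optional isotope followed by an element symbol), and `parse_bracket_atom` is a hypothetical name, not something from the draft or any toolkit.

```python
import re
from typing import Optional, Tuple

# Isotope field as in the grammar quoted in this thread: no leading zeros, at most 3 digits.
_ISOTOPE = r"(?:0|[1-9][0-9]{0,2})"
# Assumed toy subset of a bracket atom: optional isotope, then an element symbol.
_BRACKET_ATOM = re.compile(rf"\[(?P<isotope>{_ISOTOPE})?(?P<symbol>[A-Z][a-z]?)\]")


def parse_bracket_atom(text: str) -> Tuple[Optional[int], str]:
    """Return (isotope, symbol); isotope is None when it is absent or written as 0."""
    match = _BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")
    raw = match.group("isotope")
    isotope = int(raw) if raw is not None else None
    if isotope == 0:
        # Per the draft text quoted above: a 0 isotope is equivalent to undefined.
        isotope = None
    return isotope, match.group("symbol")


if __name__ == "__main__":
    print(parse_bracket_atom("[0S]") == parse_bracket_atom("[S]"))
    print(parse_bracket_atom("[34S]"))
```

Under these assumptions `[0S]` and `[S]` produce the same parsed atom, while a leading-zero form such as `[034S]` is rejected with a `ValueError`.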

merkys commented 2 years ago

Thanks for pointing out the description of isotope 0 in the IUPAC SMILES+ draft. Nevertheless, I think isotope 0 should not be allowed, as I cannot see the benefit of writing [0S] instead of just [S].