Closed athas closed 4 years ago
There is arguably a small ambiguity with scientific notation (2e3 is numeric), but I don't think it matters. Any consumer that needs to treat 2e3 as symbolic can just accept numerical symbols instead. My issue is that tokens like 1i2 are not accepted as symbols at all.
Note that this is technically a change in behaviour, since currently (0i2) is parsed as a list containing the two symbols 0 and i2. I would however argue that this is a tokenization bug.
You are right, the lexer treats 0i2 as two separate tokens. However, it should have either accepted it as a symbol or raised an error (depending on which behaviour we like better).
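To make the two behaviours concrete, here is a minimal sketch of a lexer that reads a whole atom up to the next delimiter before deciding what it is, so 0i2 stays in one piece. The Token type and the tokenize and classify functions are hypothetical names for illustration, not the library's actual lexer, and only non-negative integers are recognised as numbers here.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical token type for illustration; not the library's own.
data Token = TLParen | TRParen | TNumber Integer | TSymbol String
  deriving (Eq, Show)

-- Characters that end an atom.
isDelimiter :: Char -> Bool
isDelimiter c = isSpace c || c `elem` "()"

-- Read the whole run of characters up to the next delimiter,
-- then decide what kind of atom it is.
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c:cs)
  | isSpace c = tokenize cs
  | c == '('  = TLParen : tokenize cs
  | c == ')'  = TRParen : tokenize cs
  | otherwise =
      let (atom, rest) = break isDelimiter (c:cs)
      in classify atom : tokenize rest

-- An atom made only of digits is a number; anything else,
-- including digit-led atoms such as 0i2, is a symbol.
classify :: String -> Token
classify atom
  | all isDigit atom = TNumber (read atom)
  | otherwise        = TSymbol atom
```

With this sketch, tokenize "(0i2)" yields a single symbol between the parentheses, while tokenize "(0 i2)" yields a number followed by a symbol.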
I also agree that 0i2 should really be treated as a symbol. However, the clash between 2e3-the-number and 2e3-the-symbol is quite unpleasant. It would be a shame if you could enter any symbol starting with a digit you like, but not something like [0-9]+e[0-9]+. Reading it as a number and then somehow recovering its symbolic representation does not work: 01e0 reads as 1.0, which is textually quite different.
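The round-trip problem can be illustrated with plain Haskell reading and showing of Double values. This is only a sketch using readMaybe and show from base, not the library's number parser or printer; roundTrip and survives are hypothetical helper names.

```haskell
import Text.Read (readMaybe)

-- Read a token as a Double and print it back, as a consumer would have to
-- do if the lexer kept only the numeric value of the token.
roundTrip :: String -> Maybe String
roundTrip tok = show <$> (readMaybe tok :: Maybe Double)

-- True when the original spelling survives the round trip.
survives :: String -> Bool
survives tok = roundTrip tok == Just tok
```

Here roundTrip "01e0" gives Just "1.0" and roundTrip "2e3" gives Just "2000.0", so neither original spelling can be recovered from the number alone.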
The only solution I can think of is to ban scientific notation for numbers in the lexer. If anyone wants to read a number in scientific format, it's always possible to parse it from the corresponding symbol. E.g. the real and double grammars could accept both AtomNumber and AtomSymbol of a certain shape. (See the implementation of real.)
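A rough sketch of that idea, under stated assumptions: the Atom type below is a simplified stand-in whose constructors borrow the names AtomNumber and AtomSymbol but whose payload types are guesses, and real is written as an ordinary function rather than an invertible grammar. The helper fromScientific is hypothetical and only handles the shape [0-9]+e[0-9]+.

```haskell
import Data.Char (isDigit)

-- Simplified stand-in for the library's atoms; the payload types are
-- assumptions made for this sketch.
data Atom = AtomNumber Double | AtomSymbol String
  deriving (Eq, Show)

-- Parse a symbol of the shape [0-9]+e[0-9]+ into a number.
fromScientific :: String -> Maybe Double
fromScientific s =
  case break (== 'e') s of
    (mant, 'e' : ex)
      | nonEmptyDigits mant && nonEmptyDigits ex ->
          Just (fromInteger (read mant) * 10 ^^ (read ex :: Integer))
    _ -> Nothing
  where
    nonEmptyDigits ds = not (null ds) && all isDigit ds

-- Accept a plain number atom, or a symbol atom that looks like
-- scientific notation; reject every other symbol.
real :: Atom -> Maybe Double
real (AtomNumber n) = Just n
real (AtomSymbol s) = fromScientific s
```

Under these assumptions, real (AtomSymbol "2e3") gives Just 2000.0, while real (AtomSymbol "1i2") gives Nothing, so the lexer can hand over every digit-led token as a symbol and leave the numeric interpretation to the consumer.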
However, there is one more problem with that: 1e2 would be perfectly parseable by both the symbol and real grammars, but e.g. 100 won't be parsed as a symbol anymore. This also looks like an inconsistency. It also adds the burden of dispatching between AtomNumber and AtomSymbol, plus parsing of scientific format, to clients who use Language.Sexp directly, without the invertible grammar stuff. OTOH, this is probably minor compared to not being able to have symbols starting with a digit at all.
Looks good to me!
In most (all?) Lisps, symbols are allowed to start with a digit. Tokens like 1i2, 1_2, etc. should be considered symbolic atoms. The current lexer explicitly behaves otherwise, but I'm wondering if there's a good reason for this. I don't think there would be any ambiguity in being more flexible.