jpellegrini closed this 10 months ago.
Some more information:

`1_2_a` => symbol `|12_|`
I can't work on this right now, but here's an idea (I'll try to implement it later):

`read_integer_or_real` makes a copy of the string and works on it; on failure it returns `#f` without updating `**end` or anything else.
> `read_integer_or_real` makes a copy of the string and works on it;

It has no idea where the end of the token is, so that is not viable.
Another idea is for `remove_underscores` to make a copy, allocating a buffer and growing it as necessary. This limits the overhead to tokens that actually contain underscores.
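A minimal C sketch of that idea, assuming a hypothetical helper `remove_underscores_copy` (not the real `remove_underscores` signature): a token without `_` is returned untouched, so only tokens that actually contain underscores pay for an allocation, and the caller's string is never modified.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not STklos's actual API: return a version of
 * `token` without underscores, leaving `token` itself untouched.
 * Tokens with no '_' are returned as-is (no allocation); otherwise a
 * fresh buffer is malloc'ed. `*allocated` tells the caller whether it
 * must free() the result. Returns NULL only if malloc fails. */
const char *remove_underscores_copy(const char *token, int *allocated)
{
    *allocated = 0;
    if (strchr(token, '_') == NULL)
        return token;                 /* fast path: nothing to strip */

    /* Stripping only shortens the string, so strlen(token) + 1 bytes
     * is always enough. */
    size_t len = strlen(token);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    char *out = copy;
    for (const char *in = token; *in != '\0'; in++)
        if (*in != '_')
            *out++ = *in;
    *out = '\0';

    *allocated = 1;
    return copy;
}
```

Because the result can never be longer than the input, a single allocation suffices in this sketch; a buffer that grows as necessary would only be needed if the token were being collected from a port one character at a time.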
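Generally the best idea for reading Lisp is:

* Treat `|...|` vertical bar symbols like strings.
* For other symbols, as well as numbers, read a "token" of symbol-or-number characters into a string and then parse that string.

In Scheme pseudo-code, something like this: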
```scheme
(case first-char
  ...
  ((#\")
   (read-delimited-string #\"))
  ((#\|)
   (string->symbol (read-delimited-string #\|)))
  (else
   (let ((token (read-token first-char)))
     (if (string=? token "")
         (error "Invalid syntax")
         (or (parse-number token)
             (string->symbol token))))))
```
NB: RnRS uses a BNF-style grammar, but that's IMHO a really confusing mental model for Lisp. Common Lisp does not use BNF; it presents a reader algorithm tailor-made for Lisp.
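The `else` branch of the pseudo-code can be sketched in C roughly as follows. `read_token`, `classify_token`, the delimiter set, and the use of `strtod` as a stand-in number parser are all assumptions for illustration, not STklos's reader. The point is that the whole token is collected before any parsing, so a failed number parse can only turn the complete token into a symbol; nothing is left behind in the input.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOK_ERROR, TOK_NUMBER, TOK_SYMBOL } token_kind;

/* Assumed delimiter set for this sketch; a real Scheme reader has more. */
static int is_delimiter(int c)
{
    return c == '\0' || isspace(c) || c == '(' || c == ')' || c == '"' || c == ';';
}

/* Copy characters from src[*pos] up to the next delimiter into buf,
 * advance *pos past the token, and return the stored length.
 * cap must be at least 1; longer tokens are truncated in this sketch. */
size_t read_token(const char *src, size_t *pos, char *buf, size_t cap)
{
    size_t n = 0;
    while (!is_delimiter((unsigned char)src[*pos])) {
        if (n + 1 < cap)
            buf[n++] = src[*pos];
        (*pos)++;
    }
    buf[n] = '\0';
    return n;
}

/* A complete token is a number only if the *whole* token parses;
 * otherwise it is a symbol. strtod is only a stand-in for a real
 * Scheme number parser (it also accepts e.g. "inf"). */
token_kind classify_token(const char *token)
{
    if (token[0] == '\0')
        return TOK_ERROR;             /* "Invalid syntax" */
    char *end;
    (void)strtod(token, &end);
    return (*end == '\0') ? TOK_NUMBER : TOK_SYMBOL;
}
```

For example, reading `"1_2_a abc"` yields the token `1_2_a`, which `classify_token` reports as a symbol, and the next call yields `abc`.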
I think I have a fix for that issue. I'll clean it and test it more later.
> Generally the best idea for reading Lisp is:
>
> * Treat `|...|` vertical bar symbols like strings.
> * For other symbols, as well as numbers, read a "token" of symbol-or-number characters into a string and then parse that string.
>
> In Scheme pseudo-code, something like this: [...]
This is what we already do in STklos. The problem was that `parse-number` may modify `token` to make it readable by standard C functions (for instance, if the number uses `d` exponent notation, we replace the `d` with an `e` before passing the string to the standard `strtod`). One solution is, of course, to make a copy of `token` in `parse-token`, but doing that for every token costs a lot.

The current implementation avoids allocation as much as possible and should fix the issue.
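One way to avoid both in-place modification and per-token heap allocation is to do the rewrite on a scratch copy that lives on the stack, falling back to `malloc` only for unusually long tokens. The hypothetical `parse_real` helper below sketches that trade-off; it is not the actual STklos implementation, and the 64-byte threshold is an arbitrary assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not STklos's code. Parse `token` as a real,
 * accepting Lisp-style 'd' exponent markers (e.g. "1d3") by rewriting
 * them to 'e' for strtod. The rewrite is done on a scratch copy, so the
 * caller's token is never modified. Short tokens use a stack buffer;
 * only long ones pay for malloc. Returns 1 on success, 0 otherwise.
 * Caveat: this sketch rewrites every 'd'/'D'; a real reader would only
 * touch the exponent marker (and would keep hex like "0x1d" away from
 * strtod). */
int parse_real(const char *token, double *result)
{
    char local[64];
    size_t len = strlen(token);
    char *scratch = (len < sizeof local) ? local : malloc(len + 1);
    if (scratch == NULL)
        return 0;

    for (size_t i = 0; i <= len; i++) {   /* also copies the '\0' */
        char c = token[i];
        scratch[i] = (c == 'd' || c == 'D') ? 'e' : c;
    }

    char *end;
    *result = strtod(scratch, &end);
    int ok = (len > 0 && *end == '\0');   /* whole token must parse */

    if (scratch != local)
        free(scratch);
    return ok;
}
```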
Hi @egallesio! A minor issue, I think:

If a token starts with a digit and does not parse as a number, then STklos reads it as a symbol:

However, if it has underscores, then, depending on the character that follows the underscore, the part after the underscore is discarded: it is neither part of the symbol nor read as a separate token later:

and the `abcdef` is lost... (I don't know why it kept `d` and `def` but not the others.)

Anyway... I guess this was introduced with the SRFI that allows underscores in numbers (so the mistake is mine, since I implemented those changes to the reader; sorry!). I'll take a look later.
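For comparison, here is a sketch of the behaviour one would expect: a token is accepted as a number only if every underscore is valid and the whole remaining string parses; otherwise the caller interns the complete, unmodified token, so `1_2_a` would read as the symbol `|1_2_a|`. The rule that each `_` must sit between two decimal digits is an assumption of this sketch (radix prefixes and exactness markers are ignored), `token_to_number` is a hypothetical name, and `strtod` stands in for a real Scheme number parser.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed rule for this sketch: every '_' must sit between two decimal digits. */
static int underscores_ok(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '_' &&
            (i == 0 || !isdigit((unsigned char)s[i - 1]) ||
             !isdigit((unsigned char)s[i + 1])))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 and stores the value only if the *whole*
 * token is a number. Returns 0 otherwise; the caller should then intern
 * the complete token as a symbol. `token` itself is never modified. */
int token_to_number(const char *token, double *value)
{
    size_t len = strlen(token);
    if (len == 0 || !underscores_ok(token))
        return 0;

    /* Strip underscores into a scratch buffer (on the stack for short tokens). */
    char local[64];
    char *digits = (len < sizeof local) ? local : malloc(len + 1);
    if (digits == NULL)
        return 0;
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (token[i] != '_')
            digits[n++] = token[i];
    digits[n] = '\0';

    char *end;
    double v = strtod(digits, &end);
    int ok = (end != digits && *end == '\0');
    if (ok)
        *value = v;

    if (digits != local)
        free(digits);
    return ok;
}
```

The key design choice is that the decision is made on the complete token: either the whole thing becomes a number, or the whole thing becomes a symbol, so no characters can be silently dropped.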