Numbers are very restricted, inconsistent

dead-claudia commented 8 years ago

Numbers are crudely checked against the regex /-?\d+(\.\d+)/ (accounting for two regexes with identical meaning). This is very restrictive compared to most languages. Examples from the full ES6 offering:

Binary: 0b1010 === 10
Octal: 0o17 === 15
Hex: 0xf0 === 240
Trailing dot: 42. === 42

This probably should be improved (and is on my TODO list if I can get time).
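
One way to accept all of the forms listed above without a hand-written grammar is to lean on JavaScript's own string-to-number conversion. The sketch below is only an illustration: `parseNumberAtom` is a hypothetical helper, not something that exists in eslisp, and it assumes the atom text has already been split off by the parser. `Number()` follows the string-to-number rules, which are close to but not identical to the source-literal grammar (for example, a leading-zero string like "017" is read as decimal 17).

```javascript
// Hypothetical helper (not part of eslisp): turn atom text into a number,
// or return null if the text is not a number at all.
function parseNumberAtom(text) {
  const negative = text.startsWith("-");
  const body = negative ? text.slice(1) : text;

  // Require a digit or dot first and no whitespace, so that "", "Infinity",
  // "+5" and "--5" are rejected before Number() gets a chance to accept them.
  if (!/^[0-9.]\S*$/.test(body)) return null;

  // Number() understands 0b, 0o, 0x, exponents and a trailing dot, but it
  // rejects a sign in front of a prefixed form, hence the manual sign above.
  const value = Number(body);
  if (Number.isNaN(value)) return null;

  return negative ? -value : value;
}

console.log(parseNumberAtom("0b1010")); // binary
console.log(parseNumberAtom("0o17"));   // octal
console.log(parseNumberAtom("0xf0"));   // hex
console.log(parseNumberAtom("42."));    // trailing dot
console.log(parseNumberAtom("1b7"));    // not a number: null
```

This avoids pulling in a full JS parser, at the cost of accepting whatever `Number()` accepts rather than an explicitly chosen syntax.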

anko commented 8 years ago

Indeed. I didn't add other ways of specifying numbers because I didn't want to deal with the potential bugs yet, and pulling in all of esprima as a dependency just to get exact JS number-parsing felt silly.

dead-claudia commented 8 years ago

@anko A patch I'm working on for sexpr-plus is actually going to trash the PEG.js dependency, by doing its own dedicated parsing. I'm writing it in pure LiveScript, and once I'm ready, it'll be sent. I found that writing the parser with locations and everything turned out to be only about 50% longer than the PEG.js version. Part of the reason I'm writing a new parser is because PEG.js was failing to recognize my new escape sequences correctly, and I had to use a lot of boilerplate and repetition just to get the sequences parsed initially.

anko commented 8 years ago

Have you considered #24 (reader macros)? Something like @lhorie's read-table-based parser might get 2 birds with 1 stone.

dead-claudia commented 8 years ago

Good point. It's already mostly written, though, and I used a very similar method for the parsing. Here's what I have so far, although it's not yet tested. I wouldn't have to modify it much to allow for readtable parsing.

lhorie commented 8 years ago

My two cents: when I added support for this to my toy compiler last week, I actually didn't need to change the parser all that much, because, as you can see from @anko's link, it's extremely lenient with tokens (e.g. it'll happily tokenize insane things like "a, 1b7 and }o{).

I have a theory that overkill leniency is a good property to have in a parser when you have a language that is designed to have extensible syntax.
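
To make the idea of a lenient tokenizer concrete, here is a minimal sketch. It is an assumption about what such a tokenizer could look like, not the actual code from the linked parser: `tokenize` is a hypothetical name, only parentheses and whitespace are treated as special, and strings and comments are deliberately ignored.

```javascript
// Hypothetical lenient tokenizer: a token is either a single parenthesis or
// any run of characters that are neither whitespace nor parentheses.
// Nothing is validated at this stage.
function tokenize(source) {
  return source.match(/[()]|[^\s()]+/g) || [];
}

console.log(tokenize('(+ "a 1b7 }o{)'));
// [ '(', '+', '"a', '1b7', '}o{', ')' ]
```

Because nothing is rejected here, deciding whether an atom like 1b7 means anything is left entirely to a later stage.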

What I did to add support for obscure number syntax was add validation in the estree building step. This way, I can keep the parser untouched and I can eventually write different backends to handle language variants (e.g. ES5 vs ES6 vs ES7), or even different languages.
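
A rough sketch of what validating in the ESTree-building step might look like follows. Everything here is hypothetical: `NUMBER_SYNTAX`, `atomToNode`, the target names and the regexes are illustrative assumptions, not code from any of the projects mentioned. The point is only that the number grammar lives in a per-target table rather than in the parser.

```javascript
// Hypothetical per-target number grammars (illustrative, not exhaustive).
const DECIMAL = "(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][+-]?\\d+)?";
const NUMBER_SYNTAX = {
  es5: new RegExp("^-?(?:0[xX][0-9a-fA-F]+|" + DECIMAL + ")$"),
  es6: new RegExp("^-?(?:0[xX][0-9a-fA-F]+|0[bB][01]+|0[oO][0-7]+|" + DECIMAL + ")$"),
};

// Convert one atom (already tokenized leniently) into an ESTree node.
function atomToNode(text, target) {
  if (NUMBER_SYNTAX[target].test(text)) {
    const negative = text[0] === "-";
    const literal = { type: "Literal", value: Number(negative ? text.slice(1) : text) };
    // ESTree represents negative numbers as a unary minus around a literal.
    return negative
      ? { type: "UnaryExpression", operator: "-", prefix: true, argument: literal }
      : literal;
  }
  // Looks like a number but is not valid for this target: report it here,
  // in the backend, instead of in the parser.
  if (/^-?\d/.test(text)) {
    throw new SyntaxError("Invalid number for " + target + ": " + text);
  }
  return { type: "Identifier", name: text };
}

console.log(atomToNode("0b1010", "es6")); // { type: 'Literal', value: 10 }
```

With this split, the same atom can be accepted by one backend and rejected by another: `atomToNode("0b1010", "es5")` throws, while the ES6 table accepts it.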

dead-claudia commented 8 years ago

That's pretty close to what I was planning on doing for Eslisp when I get the time. Parse it where numbers are currently parsed.


anko / eslisp

Numbers are very restricted, inconsistent #29