dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
534 stars 66 forks source link

Case insensitive parsing? #157

Closed veelo closed 9 years ago

veelo commented 9 years ago

Hi,

I just found Pegged and I think it is great (and very well documented). I am playing with the idea of trying myself at an Extended Pascal [1] to D translator. Extended Pascal is case insensitive. I have noticed that some other implementations (in other languages) offer ways to parse literals case insensitively (e.g., by differentiating singe/double quotes, or appending an i. Is there any support in Pegged?

If not, I will probably have to write ~([Pp] [Rr] [Oo] [Gg] [Rr] [Aa] [Mm]) instead of "program" and so forth.

I know unifying inconsistently cased identifiers will be a bigger challenge (p2c [2] does that) but first I need to see if all constructs can be translated at all.

Thanks! Bastiaan.

[1] http://en.wikipedia.org/wiki/Pascal_(programming_language)#ISO.2FIEC_10206:1990_Extended_Pascal [2] http://schneider.ncifcrf.gov/p2c/historic/daves.index-2012Jul25-20-44-55.html

PhilippeSigaud commented 9 years ago

Hi,

thanks for trying Pegged!

There is no support for case-insensitive parsing in Pegged. One possibility would be to use semantic actions to call a toLower/toUpper function before parsing. I guess D's Phobos has some case-managing functions, maybe in std.string.

veelo commented 9 years ago

OK thanks.

veelo commented 9 years ago

I don't think this bird is going to fly, as the grammar has a number of indirect left-recursions. Out of interest, did you have to work around left recursion when writing dgrammar.d?

PhilippeSigaud commented 9 years ago

I don't think this bird is going to fly, as the grammar has a number of indirect left-recursions.

Ah, OK.

Out of interest, did you have to work around left recursion when writing dgrammar.d?

Yes. Walter used left-recursion a lot in some constructs (arithmetic and logical expressions, for example).

I have a GLL (Generalized LL) parser somewhere, which deals gracefully with all sorts of recursion (left, right, hidden, direct or indirect), avec even ambiguity. But it's a bit slow. If I had time to work on Pegged again, I'd have it produce two other parsers : LALR and GLL. The idea would be: try LALR and if it works, all good (O(n) parsing). If LALR does not work, swith to GLL.