alex / rply

An attempt to port David Beazley's PLY to RPython, and give it a cooler API.
BSD 3-Clause "New" or "Revised" License
381 stars 60 forks source link

StackedLexer #18

Open DasIch opened 10 years ago

DasIch commented 10 years ago

I've found that the basic LexerGenerator provided by rply while convenient is too simple for some uses cases. So I've implemented this somewhat more powerful StackedLexer, inspired by pygments.

The basic idea is that instead of having rules directly associated with a lexer, the lexer instead has a set of states and each state has a set of rules. Further rules are extended with information about transitions.

During lexing the lexer maintains a stack of states and uses the rules of the state at the top of the stack. Once a rule matches, the lexer performs a transition, if so defined by that rule.

Possible transititions are pushing a new state to the stack or popping the current state of the stack. While I haven't implemented that, a switch to another state without modifying the stack would also be trivial to implement.

This can be very useful for lexing nested comments (see the test for an example) or strings as a sequence of character and escape sequence tokens.

I've made this pull request not as a request to pull the code as-is but rather to get feedback on the idea, API and implementation. Is there interest in this idea? Is the API as I've designed it acceptable or should it be changed? The StackedLexer represents a superset of Lexer and could also replace it entirely without breaking backwards compatibility, what are your thoughts on that?

At the moment quite a bit of code is duplicated between the two lexers, obviously this is not acceptable and needs improvement but I wanted to get feedback before putting too much time into this.

DasIch commented 10 years ago

I've added a test to ztranslation to ensure StackedLexer is RPython. I used a property which apparently isn't RPython and I fixed that, however now I'm getting a problem due to the .match method of compiled regular expressions not being RPython. How would I go about fixing this?

DasIch commented 10 years ago

I've fixed the issue by simply merging Lexer and StackedLexer into one Lexer class. Lexer now has all the functionality of StackedLexer but is completely backwards compatible.

Another advantage of this is, that it resolves the code duplication issue which means that this could be merged as-is. The only thing that's missing is documentation but I would have to merge in master to do that, which on some projects is discouraged so I don't want to do that without feedback first.

alex commented 10 years ago

I don't have a problem with merging master in.

On Mon, Jan 27, 2014 at 10:10 AM, Daniel Neuhäuser <notifications@github.com

wrote:

I've fixed the issue by simply merging Lexer and StackedLexer into one Lexer class. Lexer now has all the functionality of StackedLexer but is completely backwards compatible.

Another advantage of this is, that it resolves the code duplication issue which means that this could be merged as-is. The only thing that's missing is documentation but I would have to merge in master to do that, which on some projects is discouraged so I don't want to do that without feedback first.

— Reply to this email directly or view it on GitHubhttps://github.com/alex/rply/pull/18#issuecomment-33381036 .

"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero GPG Key fingerprint: 125F 5C67 DFE9 4084

DasIch commented 10 years ago

Ok, I've merged master in and added documentation.

kkiningh commented 6 years ago

Would it be possible to pull this into master? I was looking for this exact feature and am happy to see it's already been implemented!

jonhue commented 5 years ago

+1