diku-dk / alpacc

MIT License

Add a very simple lexer generator. #3

Closed: athas closed this 1 year ago.

athas commented 1 year ago

The lexemes must be completely disjoint, such that by observing a single character we can immediately decide which terminal (if any) it belongs to.

Whitespace is hardcoded.
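Here is a minimal sketch of the lexing scheme described above. The terminal names and character sets (`lparen`, `rparen`, and `atom` as `[a-z0-9]`) are assumptions for illustration and are not taken from alpacc or sexp.cg. The rule that only `atom` may span several characters is also assumed. Because the character sets are disjoint, a lookup table from character to terminal decides each token from its first character. An unmatched character raises an error.

```python
import string

# Hypothetical terminals (assumed; not alpacc's sexp.cg). Each maps to
# (character set, whether a lexeme may span several characters).
TERMINALS = {
    "lparen": (set("("), False),
    "rparen": (set(")"), False),
    "atom": (set(string.ascii_lowercase + string.digits), True),
}
WHITESPACE = set(" \t\r\n")  # hardcoded, mirroring the PR


def build_table(terminals):
    """Map each character to the unique terminal it belongs to."""
    table = {}
    for name, (chars, _) in terminals.items():
        for c in chars:
            if c in table:
                raise ValueError(f"{c!r} is in both {table[c]} and {name}")
            table[c] = name
    return table


def lex(text, terminals=TERMINALS):
    """Return (terminal, lexeme) pairs, deciding each token by its first character."""
    table = build_table(terminals)
    tokens = []
    i = 0
    while i < len(text):
        c = text[i]
        if c in WHITESPACE:
            i += 1
            continue
        if c not in table:
            raise ValueError(f"no terminal matches {c!r} at position {i}")
        name = table[c]
        j = i + 1
        if terminals[name][1]:
            # Repeatable terminal: extend while characters stay in its set.
            while j < len(text) and table.get(text[j]) == name:
                j += 1
        tokens.append((name, text[i:j]))
        i = j
    return tokens
```

For instance, `lex("(dec 1)")` should yield `[("lparen", "("), ("atom", "dec"), ("atom", "1"), ("rparen", ")")]`.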

This obviously needs significantly more work, but it's actually enough to parse Lisp expressions such as

(define (fib n) (if (= n 0) 1 (mul n (fib (dec 1)))))

which with the sexp.cg grammar gives

[1, 3, 0, 3, 1, 3, 0, 3, 0, 2, 3, 1, 3, 0, 3, 1, 3, 0, 2, 3, 1, 3, 0, 3, 0, 3, 1, 3, 0, 3, 1, 3, 0, 2, 2, 2, 2, 2]

WilliamDue commented 1 year ago

Nice! I was looking into parallel lexical analysis [1] today, and if I understand it correctly, it does not seem that difficult. Would it not be better to use this method?

[1] Hill, J.M.D., Parallel lexical analysis and parsing on the AMT distributed array processor, Parallel Computing 18 (1992) 699-714.
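The usual idea behind parallel lexing works as follows. The lexer is expressed as a DFA, and every input character is mapped to its transition function, which is a table from state to state. These functions are then combined with a prefix scan. This sketch follows that idea as we understand it and is not necessarily Hill's exact algorithm. Function composition is associative, so the scan can run in parallel. Here, `itertools.accumulate` runs it sequentially for clarity. The DFA states (`WS`, `LP`, `RP`, `ATOM`, and an absorbing `ERR`) are hypothetical, chosen to match a disjoint-character lexer like the one in this PR.

```python
import string
from itertools import accumulate

# Hypothetical DFA states (an assumption for illustration, not alpacc's automaton).
WS, LP, RP, ATOM, ERR = range(5)
N = 5
NAMES = {LP: "lparen", RP: "rparen", ATOM: "atom"}
ATOM_CHARS = set(string.ascii_lowercase + string.digits)


def step(q, c):
    """Sequential DFA transition; ERR is absorbing."""
    if q == ERR:
        return ERR
    if c.isspace():
        return WS
    if c == "(":
        return LP
    if c == ")":
        return RP
    if c in ATOM_CHARS:
        return ATOM
    return ERR


def char_function(c):
    """Tabulate the transition on c as a tuple indexed by state."""
    return tuple(step(q, c) for q in range(N))


def compose(f, g):
    """Apply f, then g. Associative, so a parallel scan could be used instead."""
    return tuple(g[f[q]] for q in range(N))


def scan_lex(text):
    funcs = [char_function(c) for c in text]         # map: data-parallel
    prefixes = list(accumulate(funcs, compose))       # inclusive scan
    states = [p[WS] for p in prefixes]                # state after each char, starting in WS
    if ERR in states:
        raise ValueError(f"invalid character at position {states.index(ERR)}")
    tokens = []
    for i, s in enumerate(states):
        prev = states[i - 1] if i > 0 else WS         # neighbour lookup: also data-parallel
        if s in (LP, RP) or (s == ATOM and prev != ATOM):
            tokens.append([s, i, i + 1])              # a new token starts here
        elif s == ATOM:
            tokens[-1][2] = i + 1                     # extend the current atom
    return [(NAMES[s], text[a:b]) for s, a, b in tokens]
```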

athas commented 1 year ago

Yes, but I wanted to put together a quick demo first.

WilliamDue commented 1 year ago

I cannot get the example you sent to work because of the "=", and it does not seem like I can extend the regex, e.g. "atom = [=a-z0-9]+;". Is this intended?

athas commented 1 year ago

Right, the = should be eq or something. Error detection is not really happening.
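Under the disjointness requirement from the PR description, adding `=` to the atom class should cause no conflict as long as no other terminal also contains `=`. The sketch below shows that check. `expand_class` and `overlaps` are hypothetical helpers. They only understand literal characters and `a-z`-style ranges inside brackets, not alpacc's actual regex syntax.

```python
def expand_class(spec):
    """Expand a bracket class like '[=a-z0-9]' into a set of characters.

    Hypothetical helper: handles literal characters and a-b ranges only
    (no negation, no escapes).
    """
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError(f"not a bracket class: {spec!r}")
    body = spec[1:-1]
    chars = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            lo, hi = body[i], body[i + 2]
            chars.update(chr(x) for x in range(ord(lo), ord(hi) + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return chars


def overlaps(classes):
    """Return (char, first_terminal, second_terminal) for every shared character."""
    owner = {}
    clashes = []
    for name, spec in classes.items():
        for c in sorted(expand_class(spec)):
            if c in owner:
                clashes.append((c, owner[c], name))
            else:
                owner[c] = name
    return clashes
```

For example, `overlaps({"lparen": "[(]", "rparen": "[)]", "atom": "[=a-z0-9]"})` returns an empty list. Adding a hypothetical `eq` terminal `[=]` would report `("=", "atom", "eq")`.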