diku-dk / alpacc

MIT License
5 stars 0 forks source link

Add more human-friendly CFG notation. #1

Closed athas closed 1 year ago

athas commented 1 year ago

Probably some tests need to be adjusted, but the main problem is that in this notation, the terminals are implicit. That's generally fine, except that currently you have to know the terminal ordering in order to produce valid parser input.

athas commented 1 year ago

This CFG grammar distinguishes terminals (leading uppercase) and nonterminals (leading lowercase). That suggests a fairly clean way of introducing lexical rules. Just write them as productions:

int -> digit digits
digit -> 0
digit -> 1
...
digit -> 9
digits ->
digits -> digit digits

Productions for terminals would then be picked out and processed by a lexer generator. We would of course verify that they correspond to a regular language (and in particular do not reference any nonterminals). We will end up with a notation quite similar to EBNF.

WilliamDue commented 1 year ago

Sounds like a good idea and it is probably better to use grammar definitions like EBNF.

I think the tests will never pass because the Python test script still generates grammars using the old style.

I can fix this but I am also currently trying to remove the warnings.

athas commented 1 year ago

Don't bother fixing the tests until I've stabilised this notation.

athas commented 1 year ago

I have updated the test generator, but for the tests fail. Probably I generate the nonterminals in the wrong order or something.

athas commented 1 year ago

Yes, it is quite clear that the length of the derivations is correct, but the nonterminal indexes are wrong.

athas commented 1 year ago

I think this is because I group all the productions belonging to a single nonterminal, which might change the production numbering. The LL parser in main.py does not do this, and so uses a different numbering. I'm not sure how to fix that.

athas commented 1 year ago

I think I managed to figure something out. A bit janky, but so are other things in that testing script.

WilliamDue commented 1 year ago

I think I managed to figure something out. A bit janky, but so are other things in that testing script.

Great, the whole test script is actually quite janky so it a bit janky will not matter.