diku-dk / alpacc

Better CFG notation #17

Open athas opened 5 months ago

athas commented 5 months ago

I think the CFG

string = "[a-zA-Z0-9_\-\s\(\):/@.]*";
num = \-?[0-9]+(.[0-9]+)?;
ignore = \s|\n|\t|\r;

J = O | A | string | num;

O = "{" FS0 "}";
FS0 = | F FS1;
FS1 = | "," F FS1;
F = string ":" J;

A = "[" EL0 "]";
EL0 = | J EL1;
EL1 = | "," J EL1;

would look better if we copied ideas from Pareas and wrote it as

string = "[a-zA-Z0-9_\-\s\(\):/@.]*";
num = \-?[0-9]+(.[0-9]+)?;
ignore = \s|\n|\t|\r;

J [object] -> O;
J [array] -> A;
J [string] -> string;
J [number] -> num;

O -> "{" FS0 "}";
FS0 -> ;
FS0 -> F FS1;
FS1 -> ;
FS1 -> "," F FS1;
F -> string ":" J;

A -> "[" EL0 "]";
EL0 -> ;
EL0 -> J EL1;
EL1 -> ;
EL1 -> "," J EL1;

The names in square brackets are optional labels that can be assigned to productions, and which would be recognisable in the resulting CST. (Similarly, named terminals should also show up, but we don't need to change the CFG notation to enable that.)
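To make the shape of the proposed rules concrete, here is a minimal Python sketch that splits one rule line into its left-hand side, optional bracketed name, and right-hand-side symbols. It is not how alpacc parses grammars: `Rule`, `RULE_RE`, and `parse_rule` are hypothetical names, and the sketch assumes that symbols are separated by whitespace and that no quoted terminal contains whitespace or `;`.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    lhs: str              # nonterminal being defined
    name: Optional[str]   # optional production name from the square brackets
    rhs: tuple            # right-hand-side symbols; empty for an epsilon rule


# Hypothetical pattern: LHS, optional "[name]", "->", symbols, terminating ";".
RULE_RE = re.compile(r'^\s*(\w+)\s*(?:\[(\w+)\])?\s*->\s*(.*?)\s*;\s*$')


def parse_rule(line: str) -> Rule:
    """Parse a single rule such as 'J [object] -> O;' (illustrative only)."""
    m = RULE_RE.match(line)
    if m is None:
        raise ValueError(f"not a rule: {line!r}")
    lhs, name, rhs = m.groups()
    # Splitting on whitespace is an assumption; it breaks on terminals
    # that themselves contain spaces.
    return Rule(lhs, name, tuple(rhs.split()))
```

With this sketch, `parse_rule('J [object] -> O;')` carries the name `object`, while an unnamed rule such as `FS0 -> ;` has `name` set to `None` and an empty right-hand side.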

WilliamDue commented 5 months ago

This would make a lot of sense.

athas commented 5 months ago

Specifically, alpacc should generate a sum type for named terminals, and one for named productions.
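As a rough illustration of what those two sum types might look like for the JSON grammar in this issue, here is a Python sketch that uses `Enum` as a stand-in for a sum type with nullary constructors. The class names, the constructor names, the choice to leave `ignore` out of the terminals, and the `CstNode` shape are all assumptions made for illustration, not alpacc's actual output.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class Terminal(Enum):
    """One constructor per named terminal (assumed: string and num)."""
    STRING = auto()
    NUM = auto()


class Production(Enum):
    """One constructor per bracketed production name in the proposal."""
    OBJECT = auto()
    ARRAY = auto()
    STRING = auto()
    NUMBER = auto()


@dataclass
class CstNode:
    """Hypothetical CST node: unnamed productions carry no label."""
    label: Optional[Production]
    children: List["CstNode"] = field(default_factory=list)


def is_container(node: CstNode) -> bool:
    """Consumers can dispatch on the label instead of on rule indices."""
    return node.label in (Production.OBJECT, Production.ARRAY)
```

The point of the sketch is only that a consumer of the CST can match on a small closed set of labels, for example `is_container(CstNode(Production.ARRAY))`, rather than on positions in the grammar.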