dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

add python example #309

Closed mw66 closed 2 years ago

mw66 commented 2 years ago

Hi,

I tried to add a Python example. The grammar is taken from:

https://docs.python.org/3/reference/grammar.html

I made minimal modifications to fit Pegged syntax.

However, make (which runs dub test) still fails, and as a Pegged newbie I cannot figure out how to fix these compilation errors:

~/project/contrib/Pegged/examples/python$ make
dub test
Generating test runner configuration 'python-test-library' for 'library' (library).
Performing "unittest" build using /ldc2-1.28.0-linux-x86_64/bin/ldc2 for x86_64.
pegged 0.4.5+commit.5.g913d8dd: building configuration "default"...
python ~master: building configuration "python-test-library"...
../../pegged/peg.d(1842,9): Error: variable `pegged.examples.python.GenericPython!(ParseTree).GenericPython.Python.longest_match!(tuple, group, genexp).ctfeGetNameOr.rule` type `void` is inferred from initializer `tuple`, and variables cannot be of type `void`
../../pegged/peg.d(1842,25): Error: expression `tuple` is `void` and has no value
src/pegged/examples/python.d-mixin-681(6250,196): Error: template instance `pegged.examples.python.GenericPython!(ParseTree).GenericPython.Python.longest_match!(tuple, group, genexp)` error instantiating
src/pegged/examples/python.d-mixin-681(8204,7):        instantiated from here: `GenericPython!(ParseTree)`
src/pegged/examples/python.d-mixin-681(6250,245): Error: template instance `pegged.peg.longest_match!(list, listcomp)` does not match template declaration `longest_match(rules...)(ParseTree p)`
src/pegged/examples/python.d-mixin-681(6250,288): Error: template instance `pegged.peg.longest_match!(dict, set, dictcomp, setcomp)` does not match template declaration `longest_match(rules...)(ParseTree p)`
../../pegged/peg.d(535,20): Error: none of the overloads of `target_with_star_atom` are callable using argument types `(GetName)`
src/pegged/examples/python.d-mixin-681(7218,23):        Candidates are: `pegged.examples.python.GenericPython!(ParseTree).GenericPython.Python.target_with_star_atom(ParseTree p)`
src/pegged/examples/python.d-mixin-681(7237,23):                        `pegged.examples.python.GenericPython!(ParseTree).GenericPython.Python.target_with_star_atom(string s)`
../../pegged/peg.d(1843,21): Error: template instance `pegged.peg.getName!(target_with_star_atom)` error instantiating
src/pegged/examples/python.d-mixin-681(7186,59):        instantiated from here: `longest_match!(and, target_with_star_atom)`
src/pegged/examples/python.d-mixin-681(8204,7):        instantiated from here: `GenericPython!(ParseTree)`
../../pegged/peg.d(535,20): Error: none of the overloads of `disjunction` are callable using argument types `(GetName)`
src/pegged/examples/python.d-mixin-681(5155,23):        Candidates are: `pegged.examples.python.GenericPython!(ParseTree).GenericPython.Python.disjunction(ParseTree p)`
src/pegged/examples/python.d-mixin-681(5174,23):                        `pegged.examples.python.GenericPython!(ParseTree).GenericPython.Python.disjunction(string s)`
../../pegged/peg.d(3134,32): Error: template instance `pegged.peg.getName!(disjunction)` error instantiating
src/pegged/examples/python.d-mixin-681(6790,177):        instantiated from here: `fuse!(disjunction)`
src/pegged/examples/python.d-mixin-681(8204,7):        instantiated from here: `GenericPython!(ParseTree)`
../../pegged/peg.d(535,20): Error: none of the overloads of `for_if_clause` are callable using argument types `(GetName)`
src/pegged/examples/python.d-mixin-681(6786,23):        Candidates are: `pegged.examples.python.GenericPython!(ParseTree).GenericPython.Python.for_if_clause(ParseTree p)`
src/pegged/examples/python.d-mixin-681(6805,23):                        `pegged.examples.python.GenericPython!(ParseTree).GenericPython.Python.for_if_clause(string s)`
src/pegged/examples/python.d-mixin-681(6754,59): Error: template instance `pegged.peg.oneOrMore!(for_if_clause)` error instantiating
src/pegged/examples/python.d-mixin-681(8204,7):        instantiated from here: `GenericPython!(ParseTree)`
../../pegged/peg.d(535,20): Error: none of the overloads of `primary` are callable using argument types `(GetName)`
src/pegged/examples/python.d-mixin-681(6121,23):        Candidates are: `pegged.examples.python.GenericPython!(ParseTree).GenericPython.Python.primary(ParseTree p)`
src/pegged/examples/python.d-mixin-681(6157,23):                        `pegged.examples.python.GenericPython!(ParseTree).GenericPython.Python.primary(string s)`
../../pegged/peg.d(1642,21): Error: template instance `pegged.peg.getName!(primary)` error instantiating
src/pegged/examples/python.d-mixin-681(6097,76):        instantiated from here: `or!(and, primary)`
src/pegged/examples/python.d-mixin-681(8204,7):        instantiated from here: `GenericPython!(ParseTree)`
../../pegged/peg.d(535,20): Error: none of the overloads of `await_primary` are callable using argument types `(GetName)`

@veelo if you have time, can you help fix these errors? Then we can have Python here!

Thanks!

veelo commented 2 years ago

Not sure if I have time to debug this, but a few quick remarks first:

From https://docs.python.org/3/reference/grammar.html:

We use the | separator to mean PEG’s “ordered choice” (written as / in traditional PEG grammars).

Pegged has the | operator, but it is a non-PEG extension meaning "longest match", not "ordered choice". In particular, longest match breaks the linear time guarantee of ordered choice and so makes the parsing vastly more expensive. You should change | to /.

I don't think you can just relicense the grammar from https://docs.python.org/3/license.html#psf-license to Boost.

I don't see a definition for NEWLINE and some other allcaps tokens (DEDENT). Maybe use the endOfLine predefined parser? Have a look at https://github.com/PhilippeSigaud/Pegged/wiki/Predefined-Parsers.

You cannot use D keywords or Phobos identifiers in the grammar. That's probably the cause of your problems (tuple). https://github.com/PhilippeSigaud/Pegged/wiki/Declaring-a-Grammar#rule-names
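To make the first remark concrete, here is a minimal sketch of the difference between ordered choice (`/`) and longest match (`|`). It is written in plain Python so that it runs on its own. It is not Pegged's implementation, and the names `literal`, `ordered_choice` and `longest_match` are hypothetical helpers introduced only for this illustration.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the end position of its match,
# or None on failure. This representation is an assumption of the sketch.
Parser = Callable[[str, int], Optional[int]]


def literal(s: str) -> Parser:
    """Match the exact string s at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def ordered_choice(*alts: Parser) -> Parser:
    """PEG '/': commit to the first alternative that succeeds."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alt in alts:
            end = alt(text, pos)
            if end is not None:
                return end  # later alternatives are never tried
        return None
    return parse


def longest_match(*alts: Parser) -> Parser:
    """Longest-match choice: run every alternative, keep the longest."""
    def parse(text: str, pos: int) -> Optional[int]:
        ends = [e for e in (alt(text, pos) for alt in alts) if e is not None]
        return max(ends) if ends else None
    return parse


ordered = ordered_choice(literal("a"), literal("ab"))
longest = longest_match(literal("a"), literal("ab"))

print(ordered("ab", 0))  # 1: "a" wins because it is listed first
print(longest("ab", 0))  # 2: "ab" wins because it consumes more input
```

Two things follow from this. A grammar written for ordered choice, like the CPython one, can parse differently if `|` is read as longest match. Longest match also always evaluates every alternative, which is where the extra cost comes from.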

Hopefully these tips get you a bit closer.

Interesting example by the way, thanks for contributing.

veelo commented 2 years ago

Off-topic: It would be very cool to use this for a Python to D transpiler, so you could do

mixin(`m = "Hello";`.Python.toD); // Mixin Python code.
writeln(m);                       // Use it in D.

as I demonstrate here: https://youtu.be/HvunD0ZJqiA?t=1225

mw66 commented 2 years ago

@veelo thanks for the tips. After fixing the things you mentioned and adding TYPE_COMMENT, it worked:

dub test
Generating test runner configuration 'python-test-library' for 'library' (library).
Performing "unittest" build using /project/ldc2-1.28.0-linux-x86_64/bin/ldc2 for x86_64.
pegged 0.4.5+commit.10.ge177c5e: building configuration "default"...
python ~master: building configuration "python-test-library"...
Linking...
Running python-test-library 
Python[0, 1]["m"]
 +-Python.file[0, 1]["m"]

1 modules passed unittests

Please take another look at this PR, and merge it if you see no more problems.

mw66 commented 2 years ago

I also added Spacing and Comment from ../dgrammar/src/pegged/examples/dgrammar.d.

However, I don't see Spacing used in the RHS of any rule:

$ grep Spacing ../dgrammar/src/pegged/examples/dgrammar.d
Spacing <- (space / Comment)*

So I guess Spacing is also a predefined parser? @veelo can you confirm?

Thanks.

veelo commented 2 years ago

So I guess Spacing is also a predefined parser? @veelo can you confirm?

From https://github.com/PhilippeSigaud/Pegged/wiki/Extended-PEG-Syntax#space-arrow-and-user-defined-spacing:

You can select what pattern is considered as a blank: define a rule called Spacing in your grammar. This is the rule that will be called by Pegged to consume spaces in this grammar, when the < arrow is used. If no user-defined Spacing is provided, Pegged uses the predefined spacing rule that parses blank chars (whitespace, tabulation, etc).

See the link above for more info.

You don't seem to use the < arrow, so any Spacing rule will not be used. Using < would possibly mess with INDENT, which of course would break the semantics of Python. So you'll probably want to remove the Spacing rule.
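As a rough illustration of why implicit spacing and indentation-sensitive parsing do not mix, the sketch below contrasts a plain sequence with one that skips blanks around every element, which is approximately what the `<` arrow does. It is plain Python, not Pegged code. `seq`, `spaced_seq` and `skip_spacing` are hypothetical helpers, and the exact points at which Pegged consumes spacing are an assumption here.

```python
import re
from typing import Callable, Optional

# A parser takes (text, pos) and returns the end position, or None on failure.
Parser = Callable[[str, int], Optional[int]]

_BLANKS = re.compile(r"\s*")


def literal(s: str) -> Parser:
    """Match the exact string s at the current position."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def skip_spacing(text: str, pos: int) -> int:
    """Stand-in for a Spacing rule: consume blanks, newlines included."""
    return _BLANKS.match(text, pos).end()


def seq(*parts: Parser) -> Parser:
    """Plain sequence (like '<-'): elements must be directly adjacent."""
    def parse(text: str, pos: int) -> Optional[int]:
        for part in parts:
            end = part(text, pos)
            if end is None:
                return None
            pos = end
        return pos
    return parse


def spaced_seq(*parts: Parser) -> Parser:
    """Spaced sequence (roughly '<'): blanks are skipped around each element."""
    def parse(text: str, pos: int) -> Optional[int]:
        for part in parts:
            end = part(text, skip_spacing(text, pos))
            if end is None:
                return None
            pos = end
        return skip_spacing(text, pos)  # assumed: trailing blanks are eaten too
    return parse


src = "if x:\n    y"
plain = seq(literal("if"), literal(" "), literal("x"), literal(":"))
spaced = spaced_seq(literal("if"), literal("x"), literal(":"))

print(plain(src, 0))   # 5: stops right after ':'
print(spaced(src, 0))  # 10: the newline and the indentation are gone
```

In the spaced variant the newline and the four spaces of indentation are consumed before any NEWLINE or INDENT rule gets a chance to see them.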

Do you plan to use this grammar yourself? If so, I can hold off on merging until you have played with it some more, in case you discover issues. Let me know when you want it merged.

mw66 commented 2 years ago

Thanks for the review.

I have removed the Spacing rule.

I also changed NEWLINE to allow line comments.

I actually have other plans for this Python grammar: experimenting with some new language constructs. So it's better to merge it as soon as you see no obvious issues; the more users play with it (standard Python), the more likely it is that new issues will be discovered.

veelo commented 2 years ago

Alright, thanks for your contribution!