dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.

Grammar tester #79

Closed chadjoan closed 12 years ago

chadjoan commented 12 years ago

I wrote a way to unittest grammars by drawing the tree structure that parses should have.

I realize that I'm probably failing to update the documentation properly. In this pull request and the previous one I've made changes to the doc directory, but it doesn't seem to reflect on the wiki very easily. I'm not sure how to make documentation changes actualize themselves.

Also noteworthy: when I merged the most recent changes into my branch, I got this unittest failure: core.exception.AssertError@pegged.peg(805): unittest failure. I believe the unittests for the grammar tester still pass: I isolated it by temporarily reverting commit 31d7565491798027244f10abcd0bb46576a4fa85 and was able to make all tests pass.

Let me know what you think!

PhilippeSigaud commented 12 years ago

I wrote a way to unittest grammars by drawing the tree structure that parses should have.

That's a good idea.

I realize that I'm probably failing to update the documentation properly. In this pull request and the previous one I've made changes to the doc directory, but it doesn't seem to reflect on the wiki very easily. I'm not sure how to make documentation changes actualize themselves.

It's not automatic. IIRC, there is a way to make github understand the wiki is part of the project, but I didn't find the time to set this up. So, up to now, I try to update the wiki manually every time I change the docs.

This means of course that direct online changes to the wiki are not reflected in the /docs directory, nor is any change made by someone else's pull request.

Also noteworthy: when I merged the most recent changes into my branch, I got this unittest failure: core.exception.AssertError@pegged.peg(805): unittest failure

This should be gone now.

Let me know what you think!

I'll merge it as soon as github works again for me. I don't know if that was the Frankenstorm or my vacation place, but the site was almost unreachable for the past 2-3 days.

This idea is very interesting, so I'll be sure to review it and merge it ASAP! Just ping me again in a few days if you do not see anything new.

I'm also trying to have github admit you as a member of the Pegged project, so that you can push directly to the master branch. I want to do the same for callumenator, but as I said, github is down for me right now.

PhilippeSigaud commented 12 years ago

OK, I read it and I get it. I'll merge it. You'll have to explain the ~> arrow to me again, though :)

chadjoan commented 12 years ago

I'll take a look at the ~> doc at some point and see what I can do.

When I found a need for this I was writing a grammar for a D compiler I want to try and write. The grammar is for a kind of pattern-matching DSL that operates on ASTs and has regular-expression-like semantics but with more identifiers and use of vertical text space. The intent is to have a good notation for describing the lowering action of a compiler's semantic analysis: after staring at enough of that code I decided that massively nested if-while-else-for-if-for-while-etc constructs are just a very poor notation for lowerings because it doesn't reflect my mental model of how that stuff works. Anyhow: I needed a way to test the grammar so that I can slowly grow it without wreaking havoc on previous progress. Hence, this grammar testing thing was born. This long intro has a point: I actually gained some good ideas by trying to make this testing grammar very simple while minimizing nesting levels. Those ideas will probably feed back into the pattern-matching DSL and make its grammar more sensible. I'd actually like to try and make the DSL be a strict superset of the testing grammar, at least on a syntactic level (semantically, divergence is practically inevitable).

Another thing worth noting: you'll probably want to avoid using the GrammarTester in the unittests for the Pegged grammar and internals itself. If the internals can't pass their tests, then the results of the GrammarTester become very questionable. It would form a kind of circular dependency with negative implications. I figured you might realize this already, but I wanted to mention it just in case. I'll probably comment this in the unittests once I've more sleep in me ;)

Thank you for the write access. I'll do my best to respect it.

PhilippeSigaud commented 12 years ago

I'll take a look at the ~> doc at some point and see what I can do.

I think I get what it does by reading the docs, but I'm still a bit confused.

When I found a need for this I was writing a grammar for a D compiler I want to try and write. The grammar is for a kind of pattern-matching DSL that operates on ASTs and has regular-expression-like semantics but with more identifiers and use of vertical text space. The intent is to have a good notation for describing the lowering action of a compiler's semantic analysis: after staring at enough of that code I decided that massively nested if-while-else-for-if-for-while-etc constructs are just a very poor notation for lowerings because it doesn't reflect my mental model of how that stuff works. Anyhow: I needed a way to test the grammar so that I can slowly grow it without wreaking havoc on previous progress. Hence, this grammar testing thing was born. This long intro has a point: I actually gained some good ideas by trying to make this testing grammar very simple while minimizing nesting levels. Those ideas will probably feed back into the pattern-matching DSL and make its grammar more sensible. I'd actually like to try and make the DSL be a strict superset of the testing grammar, at least on a syntactic level (semantically, divergence is practically inevitable).

That's pretty interesting. You could have a look at the XL language. It's an entire language defined as AST macros on itself.

Another thing worth noting: you'll probably want to avoid using the GrammarTester in the unittests for the Pegged grammar and internals itself. If the internals can't pass their tests, then the results of the GrammarTester become very questionable. It would form a kind of circular dependency with negative implications. I figured you might realize this already, but I wanted to mention it just in case. I'll probably comment this in the unittests once I've more sleep in me ;)

Yeah, metacircular dependency.

Thank you for the write access. I'll do my best to respect it.

I'm curious to see what it will do and I was frustrated not to be able to merge pull requests for a few days. I'll be away across all Europe in the coming weeks and that way the project can still live on. I'm just afraid most of us will have small subfeatures that none of the others use. Alexrp gave me waf make files and I do not use them for now (neither do I use make) and I fear they will rot.

What about having tagged versions? Maybe we should make that an issue.

chadjoan commented 12 years ago

I'll take a look at the ~> doc at some point and see what I can do.

I think I get what it does by reading the docs, but I'm still a bit confused.

I changed the docs a bit the other day. This was alongside a change to the tester grammar.

The ~> arrow is a fuzzier version of ->. In the case where there is only one child pointed to, it means that there may be other children that aren't explicitly pointed to by an arrow. The -> arrow is stricter: it means there may be only one child: the one pointed to.
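As a rough illustration of the single-child case described above, here is a small Python model of the two arrows. It is only a sketch of the semantics, not Pegged's tester code: the `Node` class and the `matches_strict` / `matches_fuzzy` functions are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy parse-tree node: a rule name plus its child nodes (hypothetical)."""
    name: str
    children: list = field(default_factory=list)


def matches_strict(parent: Node, child_name: str) -> bool:
    """Model of `Parent -> Child`: exactly one child, and it is the named one."""
    return len(parent.children) == 1 and parent.children[0].name == child_name


def matches_fuzzy(parent: Node, child_name: str) -> bool:
    """Model of `Parent ~> Child`: the named child is present; others may exist."""
    return any(child.name == child_name for child in parent.children)


# A node with two children, and one with a single child.
term = Node("Term", [Node("Factor"), Node("Add")])
only_factor = Node("Term", [Node("Factor")])
```

With these definitions, `matches_fuzzy(term, "Factor")` holds while `matches_strict(term, "Factor")` does not, because `term` has a second child that the strict arrow does not allow. For `only_factor`, both checks hold.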

I am now using the arrows to distinguish between ordered/unordered sets of children. This eliminates the need to use [] as a nesting construct, which is good for any languages that would want to be a superset of the tester grammar and is generally more future-proof.

That's pretty interesting. You could have a look at the XL language. It's an entire language defined as AST macros on itself.

I had a look. Good mention! I don't think I've encountered that one before.

It seems to be very incomplete though. When I looked at the compiler, it was still all written in C++. Although the author seems to have a lot of tests, I still didn't get to see how his language would look when implementing something like the lowering steps in the semantic analysis of a compiler.

It also seems to be fragmented between XLR and XL2: why not harmonize the functional programming variant and the imperative programming variant? Despite the original split between functional and imperative programmers, I think that D and others have shown that functional and imperative styles can harmonize quite well. Oh well, small side issue since I was mostly interested in learning about its metaprogramming.

I still don't feel like there is enough XL code around to get a feel for how its metaprogramming works; at least not without spending large amounts of effort poring over test cases, prodding the compiler, and generally trying to read the author's mind.

... Alexrp gave me waf make files and I do not use them for now (neither do I use make) and I fear they will rot.

Since Pegged doesn't seem to have much of a build system to begin with, it might be a decent idea to try it out. I will eventually want to compile this beast incrementally myself ;)

As for tagging versions:

Sorry for the late response: I've been a bit busy the past week or so.

PhilippeSigaud commented 12 years ago

I am now using the arrows to distinguish between ordered/unordered sets of children. This eliminates the need to use [] as a nesting construct, which is good for any languages that would want to be a superset of the tester grammar and is generally more future-proof.

Strange. I must be missing something, because my preference would be to write trees as nested pairs of []'s and {}'s.

That's pretty interesting. You could have a look at the XL language. It's an entire language defined as AST macros on itself.

I had a look. Good mention! I don't think I've encountered that one before.

It seems to be very incomplete though. When I looked at the compiler, it was still all written in C++. Although the author seems to have a lot of tests, I still didn't get to see how his language would look when implementing something like the lowering steps in the semantic analysis of a compiler.

I don't think he got very far, but the idea is interesting.

[Tags]

  • If you're referring to the need to keep the documentation submodule in sync with the main, then I suspect it isn't required. The documentation isn't required for successful compilation, so if someone were to pull the main repository and never bother to pull the doc submodule in, then they'd still be able to compile and use Pegged. It is probably important to keep that property though: I wouldn't put any .d files in the doc submodule.
  • Otherwise: I've probably misunderstood what you meant.

I meant pushing tags like 'v0.3' for a certain commit, so as to 'mark' some commits as releases.

Sorry for the late response: I've been a bit busy the past week or so.

I've been moving around in different countries and will continue to do so for at least a week. So coding is a bit slower :)