dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
534 stars 66 forks source link

Fail to define custom ParseTree struct #190

Open Domain opened 8 years ago

Domain commented 8 years ago

struct MyParseTree { ParseTree tree; int i; alias tree this; }

alias GenericGrammar!(MyParseTree).Grammar Grammar;

I want to save extra info into the tree. But this code failed to compile.

PhilippeSigaud commented 8 years ago

I just had a look at this issue, and that's going to be complicated. It seems the code diverged somewhat from the original idea to have a grammar be templated on the parse tree type. The generated code mix TParseTree and ParseTree and that's easily solvable, but I had problems with delegates that cannot be coerced into the types I want. Hmm.

The only way forward I see is to rip the dynamic grammar code off (in fact, that may be a good idea in any case) and to inject the peg.d module 'text' inside the generated grammar or to template the entire peg.d module on TParseTree, because right now it's hard-coded with ParseTree. That's going to be difficult.

@veelo: any other idea for this one?

@Domain, it's not a satisfactory answer, but maybe you could use a semantic action and an external struct (or associative array) to record what you want? What other info do you want to get while parsing?

PhilippeSigaud commented 8 years ago

Hold on, by ripping the dynamic part, I could make something work, but I have a problem whith alias this. The following code does not compile:

module bar;

struct Bar { string s; }

struct Bar2 {
  Bar bar;
  int i;
  alias bar this;
}

void main() {
  auto b2 = Bar2("abc");
}

I thought that with alias this I could get Bar2 behave as a Bar? That works with a constructor, of course, but that means that users defining their own parse-tree type will have to provide a constructor to tforward values to the 'standard' ParseTree. That's a bit annoying.

I'll post on the D forum.

veelo commented 8 years ago

I remember reading about extending ParseTree in the docs, and wondering what the use-case might be. I haven't given a thought about how this would be implemented (or not).

Getting rid of dynamic grammar would clean things up, looking forward to that. If you would inject peg.d, I think it would remove all link-time dependencies to Pegged when you use asModule, which might be a bonus?

If this turns out to be difficult or make things unreasonably complicated, a third option is to explicitly not support extending ParseTree. So I am interested in the use-case :-)

Bastiaan.

PhilippeSigaud commented 8 years ago

On Thu, Mar 3, 2016 at 9:54 PM, Bastiaan Veelo notifications@github.com wrote:

I remember reading about extending ParseTree in the docs, and wondering what the use-case might be.

I remember wanting this and templating lots of code, then never using it in real life and ParseTree's crept in the code again. I couldn't find it today in the docs, but I admit not looking for it for a long time.

I haven't given a thought about how this would be implemented (or not).

Getting rid of dynamic grammar would clean things up, looking forward to that.

Yeah.

If you would inject peg.d, I think it would remove all link-time dependencies to Pegged when you use asModule, which might be a bonus?

But I'm afraid that would mean generating thousand of lines of code even for a simple one-line grammar. I find that a bit ridiculous. What if someone define more than one grammar in a module? I'd prefer not to have to inject peg.d in the generated code :-(

Domain commented 8 years ago

I want to save symbol table info into the parsed tree. May be interface will be a simple solution.

veelo commented 8 years ago

I want to save symbol table info into the parsed tree.

@Domain Would it be OK to build the symbol table outside the parse tree? This would probably also be more lightweight. I think that should be possible with semantic actions. When you need to access defined symbols during the parse you can do that with semantic actions too.

As an example, I use semantic actions to make sure that keywords aren't parsed as identifiers, like so:

In make.d (below) I generate my parser using asModule with a header. The semantic action is supplied in the header. I have my grammar definition in epgrammar.d.

/**
 * Recreate the EP parser from the grammar.
 */

import pegged.grammar;
import epgrammar;

void main()
{
    auto header = `
PT failOnWordSymbol(PT)(PT p)
{
    import std.uni: sicmp;
    if (sicmp(p.matches[0], "AND") == 0 ||
        sicmp(p.matches[0], "AND_THEN") == 0 ||
        sicmp(p.matches[0], "ARRAY") == 0 ||
        /* Etcetera... */
        sicmp(p.matches[0], "WHILE") == 0 ||
        sicmp(p.matches[0], "WITH") == 0)
    {
        p.successful = false;
    }
    return p;
}
`;
    asModule!()("epparser", "epparser", EPgrammar, header);
}

The semantic action is used in my grammar like so:

    AnyIdentifier   <~ Letter ( "_"? ( Letter / Digit ) )*
    Identifier      <- AnyIdentifier {failOnWordSymbol}

Identifier is now guaranteed to not be a keyword.

Zardoz89 commented 4 years ago

Any information about the status of this ?

I try doing to define a custom parsetree, so I could evaluate arithmetic expression and get the final value on the node. So, I can't doing with a custom parsetree, I ended doing using a separate parsetree evaluator.

Using a custom parsetree, or at least adding a variant[string] field to store custom data, would be very usefull.

veelo commented 4 years ago

I am not aware of anyone working on this currently.

neilsf commented 5 months ago

I would like to write a fast interpreter that interprets numbers as doubles. I am planning to cache the string to double conversion results for faster execution and this is where storing custom data in the ParseTree struct would be extremely useful - (the OP's int i example would become double i in my case). The other option I'm considering is a lookup table (eg. double[ParseTree*] cache) but my benchmarks show that lookups are not significantly faster than converting from string to double. Any tips how this could be best implemented? Thanks

veelo commented 5 months ago

This would probably fit best in a new issue, but why would you traverse the same ParseTree multiple times? Caching numbers I can understand if the same number can appear many times at different places in the input, but then you would make the cache a double[string]. However, beware that the cost of failed cache lookups may outweigh the saved time on conversions. Typically, you convert the input into a parse tree once, and then convert the parse tree into something useful once.

neilsf commented 5 months ago

why would you traverse the same ParseTree multiple times?

I am designing an interpreter that would keep the ParseTree in memory in runtime and walk through it as the program runs. For example, a for loop would loop through the nodes within a for block. And when the iterator sees a numeric constant the second time through the loop, it wouldn't have to parse the numeric string to a double.

Typically, you convert the input into a parse tree once, and then convert the parse tree into something useful once.

Yes, some sort of byte code for example... It would add a lot more complexity but has huge benefits on the other hand. I'll consider that too.

veelo commented 5 months ago

OK yes a parse tree would not be optimal as the basis for an interpreter. Caching like you described sounds to me as the wrong way to speed things up. Apropos speed: Pegged is not the fastest parser generator out there, yet. Make sure to benchmark before investing too much time.

neilsf commented 5 months ago

Thanks @veelo !