Open Domain opened 8 years ago
I just had a look at this issue, and that's going to be complicated. It seems the code diverged somewhat from the original idea to have a grammar be templated on the parse tree type. The generated code mix TParseTree and ParseTree and that's easily solvable, but I had problems with delegates that cannot be coerced into the types I want. Hmm.
The only way forward I see is to rip the dynamic grammar code off (in fact,
that may be a good idea in any case) and to inject the peg.d module 'text'
inside the generated grammar or to template the entire peg.d module on
TParseTree
, because right now it's hard-coded with ParseTree
.
That's going to be difficult.
@veelo: any other idea for this one?
@Domain, it's not a satisfactory answer, but maybe you could use a semantic action and an external struct (or associative array) to record what you want? What other info do you want to get while parsing?
Hold on, by ripping the dynamic part, I could make something work, but I have a problem whith alias this. The following code does not compile:
module bar;
struct Bar { string s; }
struct Bar2 {
Bar bar;
int i;
alias bar this;
}
void main() {
auto b2 = Bar2("abc");
}
I thought that with alias this
I could get Bar2 behave as a Bar?
That works with a constructor, of course, but that means that users
defining their own parse-tree type will have to provide a constructor to
tforward values to the 'standard' ParseTree. That's a bit annoying.
I'll post on the D forum.
I remember reading about extending ParseTree
in the docs, and wondering what the use-case might be. I haven't given a thought about how this would be implemented (or not).
Getting rid of dynamic grammar would clean things up, looking forward to that. If you would inject peg.d
, I think it would remove all link-time dependencies to Pegged when you use asModule
, which might be a bonus?
If this turns out to be difficult or make things unreasonably complicated, a third option is to explicitly not support extending ParseTree
. So I am interested in the use-case :-)
Bastiaan.
On Thu, Mar 3, 2016 at 9:54 PM, Bastiaan Veelo notifications@github.com wrote:
I remember reading about extending ParseTree in the docs, and wondering what the use-case might be.
I remember wanting this and templating lots of code, then never using it in real life and ParseTree's crept in the code again. I couldn't find it today in the docs, but I admit not looking for it for a long time.
I haven't given a thought about how this would be implemented (or not).
Getting rid of dynamic grammar would clean things up, looking forward to that.
Yeah.
If you would inject peg.d, I think it would remove all link-time dependencies to Pegged when you use asModule, which might be a bonus?
But I'm afraid that would mean generating thousand of lines of code even for a simple one-line grammar. I find that a bit ridiculous. What if someone define more than one grammar in a module? I'd prefer not to have to inject peg.d in the generated code :-(
I want to save symbol table info into the parsed tree. May be interface will be a simple solution.
I want to save symbol table info into the parsed tree.
@Domain Would it be OK to build the symbol table outside the parse tree? This would probably also be more lightweight. I think that should be possible with semantic actions. When you need to access defined symbols during the parse you can do that with semantic actions too.
As an example, I use semantic actions to make sure that keywords aren't parsed as identifiers, like so:
In make.d
(below) I generate my parser using asModule
with a header. The semantic action is supplied in the header. I have my grammar definition in epgrammar.d
.
/**
* Recreate the EP parser from the grammar.
*/
import pegged.grammar;
import epgrammar;
void main()
{
auto header = `
PT failOnWordSymbol(PT)(PT p)
{
import std.uni: sicmp;
if (sicmp(p.matches[0], "AND") == 0 ||
sicmp(p.matches[0], "AND_THEN") == 0 ||
sicmp(p.matches[0], "ARRAY") == 0 ||
/* Etcetera... */
sicmp(p.matches[0], "WHILE") == 0 ||
sicmp(p.matches[0], "WITH") == 0)
{
p.successful = false;
}
return p;
}
`;
asModule!()("epparser", "epparser", EPgrammar, header);
}
The semantic action is used in my grammar like so:
AnyIdentifier <~ Letter ( "_"? ( Letter / Digit ) )*
Identifier <- AnyIdentifier {failOnWordSymbol}
Identifier is now guaranteed to not be a keyword.
Any information about the status of this ?
I try doing to define a custom parsetree, so I could evaluate arithmetic expression and get the final value on the node. So, I can't doing with a custom parsetree, I ended doing using a separate parsetree evaluator.
Using a custom parsetree, or at least adding a variant[string] field to store custom data, would be very usefull.
I am not aware of anyone working on this currently.
I would like to write a fast interpreter that interprets numbers as doubles. I am planning to cache the string to double conversion results for faster execution and this is where storing custom data in the ParseTree
struct would be extremely useful - (the OP's int i
example would become double i
in my case). The other option I'm considering is a lookup table (eg. double[ParseTree*] cache
) but my benchmarks show that lookups are not significantly faster than converting from string to double. Any tips how this could be best implemented? Thanks
This would probably fit best in a new issue, but why would you traverse the same ParseTree
multiple times? Caching numbers I can understand if the same number can appear many times at different places in the input, but then you would make the cache a double[string]
. However, beware that the cost of failed cache lookups may outweigh the saved time on conversions. Typically, you convert the input into a parse tree once, and then convert the parse tree into something useful once.
why would you traverse the same
ParseTree
multiple times?
I am designing an interpreter that would keep the ParseTree
in memory in runtime and walk through it as the program runs. For example, a for
loop would loop through the nodes within a for block. And when the iterator sees a numeric constant the second time through the loop, it wouldn't have to parse the numeric string to a double.
Typically, you convert the input into a parse tree once, and then convert the parse tree into something useful once.
Yes, some sort of byte code for example... It would add a lot more complexity but has huge benefits on the other hand. I'll consider that too.
OK yes a parse tree would not be optimal as the basis for an interpreter. Caching like you described sounds to me as the wrong way to speed things up. Apropos speed: Pegged is not the fastest parser generator out there, yet. Make sure to benchmark before investing too much time.
Thanks @veelo !
struct MyParseTree { ParseTree tree; int i; alias tree this; }
alias GenericGrammar!(MyParseTree).Grammar Grammar;
I want to save extra info into the tree. But this code failed to compile.