PhilippeSigaud / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
534 stars 66 forks source link

Memoization: potential for optimisation? #176

Open veelo opened 8 years ago

veelo commented 8 years ago

Somewhere in the Wiki you mention that you have tested several lookup tables for memoization and that you have landed on the built-in AAs as the most efficient. I wonder if you have considered the following:

Presently the keys consist of a tuple of a string and a size_t, and there is one hash table for the entire grammar. But for a constant grammar the rule names are known at compile time, so that we could create a separate table for each rule, and the keys would just be size_t. No tuples need to be constructed and destroyed, no strings are involved, the hashing function is probably faster (NoOp?) and the tables are smaller, giving faster lookups.

Would it be worthwhile?

PhilippeSigaud commented 8 years ago

That seems quite a good idea to me. Maybe I was thinking about dynamic grammars at the time (early optimization...). IIRC, I also tried a simple linear look-up and a small hash table I designed (badly, most probably) for this.

So I'd say 'go for it': indeed having one AA per rule, created with the parser seems interesting.

veelo commented 8 years ago

Apropos dynamic grammar. I haven't made an attempt at understanding dynamic grammars, but pegged.dynamic.README.md says it's experimental, yet generated parsers include pegged.dynamic.grammar.d. It might only be used in addRule[Before|After]. Dead code?

PhilippeSigaud commented 8 years ago

On Sun, Jan 31, 2016 at 11:41 PM, Bastiaan Veelo notifications@github.com wrote:

Apropos dynamic grammar. I haven't made an attempt at understanding dynamic grammars, but pegged.dynamic.README.md says it's experimental, yet generated parsers include pegged.dynamic.grammar.d. It might only be used in addRule[Before|After]. Dead code?

Yes. I should drop this addRule thing and cut the dependency on pegged.dynamic.

veelo commented 8 years ago

So I implemented this idea: https://github.com/veelo/Pegged/commit/dd72fd2b107220681784e152e138be842bd259a5

But when timing this, I see no difference. In both implementations parsing of my test case takes around 9.5 ms. Without memoization it takes around 21 ms. So maybe my case isn't stressing the memoization table enough. This makes the parser quite a bit more verbose, so I'll leave this alone for now and maybe come back later to test with a larger input.

You can play with it if you like: https://github.com/veelo/Pegged branch memoTables.

PhilippeSigaud commented 8 years ago

I'll have a look at it, thanks. And I agree with you that there is no need to add complexity if there is no visible gain in speed.