dlang-community / Pegged

A Parsing Expression Grammar (PEG) module, using the D programming language.
534 stars 66 forks source link

Will there be a speed-up when using a Dlang switch-case and shorter pegged variable names? #319

Closed enjoysmath closed 1 year ago

enjoysmath commented 1 year ago

Something like this on the grammar side:

mixin(grammar(`
   NN_Bool:
      Trm <- If / Suc / Prd / Is0 / V
      If  < "if" Trm "then" Trm "else" Trm
      Suc < "succ" V
      Prd < "pred" V
      Is0  < "iszero" V
      V <- T / F / Z
      T <- "true"
      F <- "false"
      Z <- "0"
`));

I.e. I used "Trm" instead of "Term", "If" instead of "IfThenElse" etc. Won't this speed up at all a Dlang switch implementation when it comes to interpreting the parse tree?:

module interpreter;

import language;
import pegged.peg;
import std.array;

class Interpreter
{
public:
   Object evaluate(string s)
   {
      auto p = NN_Bool(s);      // string to ParseTree, courtesy of pegged library
      return _evaluate(p);
   }

   Object _evaluate(ParseTree p)
   {
      import std.stdio;

      auto name_parts = p.name.split(".");
      auto name = name_parts[$-1];           // Get name of interest out of long dotted name

      switch (name) {
         case "Trm":
            return _evaluate(p.children[0]);

         case "If":
            writeln(p.children);
            break;

         case "NN_Bool":
            return _evaluate(p.children[0]);

         default:
            break;
      }

      return null;
   }
}
veelo commented 1 year ago

Not likely. This is an interesting general D question, and it could be worth asking on the learn forum.

In principle, string switches have the potential to be just as fast as enum switches, if the compiler is able to compare pointers to strings in static memory. That is not always the case, and I don't think Pegged grammars is one of them, sadly. (It would be nice to experiment with generating parsers that use an enum instead of strings.) So if a jump table is not possible, a large if () else if () else … is generated, in which strings are compared.

String comparison can fail early: as soon as two characters differ, the strings cannot be equal. So if you have 24 sentences that all start with a different letter of the alphabet, every one of them can be differentiated by comparing just the first letter.

In your short example this is already the case, so I would expect that you will see no change in speed whatsoever.

If you have a very large grammar and are looking for ways to speed things up, you should first see if there is any left recursion in it (as that cannot be fully memoized). If you have a lot of time on your hands you could try modifying Pegged to generate enum based grammars; I would do it if I had the time.

enjoysmath commented 1 year ago

So in essence, I should just use my long-name version for readability?

On Sun, Oct 30, 2022 at 3:01 PM Bastiaan Veelo @.***> wrote:

Not likely. This is an interesting general D question, and it could be worth asking on the learn forum.

In principle, string switches have the potential to be just as fast as enum switches, if the compiler is able to compare pointers to strings in static memory. That is not always the case, and I don't think Pegged grammars is one of them, sadly. (It would be nice to experiment with generating parsers that use an enum instead of strings.) So if a jump table is not possible, a large if () else if () else … is generated, in which strings are compared.

String comparison can fail early: as soon as two characters differ, the strings cannot be equal. So if you have 24 sentences that all start with a different letter of the alphabet, every one of them can be differentiated by comparing just the first letter.

In your short example this is already the case, so I would expect that you will see no change in speed whatsoever.

If you have a very large grammar and are looking for ways to speed things up, you should first see if there is any left recursion in it (as that cannot be fully memoized). If you have a lot of time on your hands you could try modifying Pegged to generate enum based grammars; I would do it if I had the time.

— Reply to this email directly, view it on GitHub https://github.com/PhilippeSigaud/Pegged/issues/319#issuecomment-1296360152, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMIF56UU6SKEHQ5ZQBM3GLWF3O4BANCNFSM6AAAAAARSMRMMI . You are receiving this because you authored the thread.Message ID: @.***>