Raku / old-design-docs

Raku language design documents
https://design.raku.org/
Artistic License 2.0

Multiple ws rules #96

Open cognominal opened 9 years ago

cognominal commented 9 years ago

sigspace is a great Perl 6 feature but is sometimes too limited. When designing a grammar, one may want one sigspace for certain rules and another for other rules. My motivation is a grammar for parsing a .git/config file. Some rules match things within a line, while others may match with sigspaces that span multiple lines.

I tried different approaches, to no avail. Two of them made me think of possible changes to S05.

The first approach was to split the grammar in two, each with its own ws rule, and with one derived from the other. But because of virtual method dispatch, the ws of the more derived grammar is always picked. Is virtual method dispatch really appropriate for rules? I have not thought it through yet, given that a grammar can contain regular methods as well.

  grammar Unilines {
      token ws     { \h* }
      rule  header { '[' <id> ']' }
      rule  entry  { $<nm>=\S+ [ '=' $<val>=\S+ ]? }
      token id     { \w+ }
      token string { \" <( [ '\\"' | \V ]* )> \" }
  }

  grammar Config is Unilines {
      token ws      { [ \h* <[ ;# ]> \N* \n ]+ }
      rule  TOP     { <section> + }
      rule  section { <header> <entry> + }
  }
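The dispatch behaviour described above can be reproduced with a much smaller pair of grammars. This is only a sketch: the names Base, Derived, pair and word are made up for illustration and are not part of the grammar above. The rule pair is declared in Base, yet when it is reached through Derived its implicit <.ws> calls resolve to the ws of Derived.

```raku
grammar Base {
    token ws   { \h* }                    # horizontal whitespace only
    rule  pair { <word> '=' <word> }      # each space becomes an implicit <.ws>
    token word { \w+ }
}

grammar Derived is Base {
    token ws  { \s* }                     # overrides ws; may cross line breaks
    rule  TOP { <pair> }
}

# Parsed through Base, pair uses Base's ws and cannot step over the newline.
say so Base.parse("a =\nb", :rule<pair>);   # False

# Parsed through Derived, the inherited pair rule picks up Derived's ws.
say so Derived.parse("a =\nb");             # True
```

So the inherited rule has no way of keeping the single-line ws it was written against once a subclass overrides ws.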

I then thought I could use lexical ws methods, but since the call to ws is implicit, I cannot use <&ws> in rules to get to them. Maybe it should be possible to declare a lexical rule with a trait indicating that it should (conceptually) be tried before regular method dispatch.

Short of explicit spacing, are there good alternatives I have missed for problems that need multiple sigspaces? I realise that, in this case, the comments don't have to be matched within a sigspace.

FROGGS commented 9 years ago

It might end up looking ugly, but I'd go for tokens only and call the right ws* implementation directly.
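A rough sketch of what that could look like for a config-like format. The token names hws and vws are hypothetical (they do not come from this thread), and the entry token is simplified to use \w+ for the key. Every whitespace call is written out, so no implicit <.ws> is involved at all.

```raku
grammar Config {
    # Hypothetical helpers: hws stays within a line, vws may span lines
    # and also skips ';' and '#' comments.
    token hws { \h* }
    token vws { [ \s | [ ';' | '#' ] \N* ]* }

    token TOP     { <.vws> <section>+ }
    token section { <header> <.vws> [ <entry> <.vws> ]* }
    token header  { '[' <.hws> <id> <.hws> ']' }
    token entry   { $<nm>=[\w+] <.hws> [ '=' <.hws> $<val>=[\S+] ]? }
    token id      { \w+ }
}

my $text = "[core]\n\teditor = vim\n# a comment\n[user]\n\tname = alice\n";
my $m = Config.parse($text);
say ~$m<section>[0]<entry>[0]<val>;   # vim
say ~$m<section>[1]<header><id>;      # user
```

The cost is exactly the ugliness mentioned above: each token has to say where whitespace is allowed and which kind.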

Mouq commented 9 years ago

Another option that I've used (I believe in a TOML parser) is to declare the class that holds the inner-level whitespace inside the outer grammar declaration. Its rules can then be called with <MyInner::token123>, though it may be trickier to get action methods on those.

pmichaud commented 9 years ago

On Thu, Jul 16, 2015 at 03:07:36PM -0700, Stéphane Payrard wrote:

> sigspace is a great Perl 6 feature but is sometimes too limited. When designing a grammar, one may want one sigspace for certain rules and another for other rules. My motivation is a grammar for parsing a .git/config file. Some rules match things within a line, while others may match with sigspaces that span multiple lines.
>
> I tried different approaches, to no avail. Two of them made me think of possible changes to S05. [...]

Note that S05 allows :sigspace to have an argument specifying a rule to be used instead of the default <.ws>. Perhaps that is more along the lines of what you're looking for?

I don't know if Rakudo implements the argument form of :sigspace yet.

Pm