Open issue, opened by flwyd 2 years ago.
Well, on the surface of it, tokens do ignore whitespace in the pattern of a token, as is quite clear from the first two examples, where it's a no-op. However, rules do not, again as is quite clear from the other examples. Effectively, it looks like a space in a rule matches any amount of whitespace, and the documentation doesn't say so. Your suggestions seem quite reasonable; I encourage you to create a PR to incorporate them.
@flwyd
Yes.
"Rule methods behave like token methods, except each run of whitespace in the body..."
It's not every run of whitespace in the body. See https://stackoverflow.com/questions/48892306/when-is-white-space-really-important-in-perl6-grammars
"...matches a word boundary and any amount of whitespace."
It's not (necessarily) (just) a word boundary. Instead it's:
At a conceptual / abstract level it's just a "tokenizing boundary", with no notion of "word" or "whitespace", where "tokenizing" is about the input string being matched, with no necessary correspondence to a token.
That said, the default is as you describe. Concretely, in Rakudo, it's the token ws declared in Grammar.nqp.
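Because the whitespace in a rule's pattern turns into a call to the ws token, a grammar can override ws to change what counts as a break. A minimal sketch, assuming the default ws behaves like <!ww> \s* (as described in the issue); the grammar names and the horizontal-whitespace-only override are hypothetical, chosen only to make the difference visible:

```raku
# Sketch only: grammar names and the ws override are hypothetical.
grammar Assignment {
    rule  TOP   { <key> '=' <value> }   # roughly: <key> <.ws> '=' <.ws> <value> <.ws>
    token key   { \w+ }
    token value { \w+ }
}

# Hypothetical override: keep the <!ww> check, but allow only
# horizontal whitespace (no newlines) as part of a break.
grammar SingleLineAssignment is Assignment {
    token ws { <!ww> \h* }
}

say so Assignment.parse("a = b");             # True
say so Assignment.parse("a =\nb");            # True: the default ws accepts the newline
say so SingleLineAssignment.parse("a = b");   # True
say so SingleLineAssignment.parse("a =\nb");  # False: \h* does not match a newline
```

The inherited TOP rule picks up the overridden ws because each <.ws> in a rule is an ordinary method call on the grammar.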
The following may be too complicated, but is hopefully at least good food for thought:
Rules behave like tokens, except that whitespace between elements of the rule's pattern requires a corresponding "break" in the input. By default the break needs to be whitespace or a switch between "word" and non-"word" characters. For example, the whitespace between 'foo' and 'bar' in the input string 'foo bar' would be a matching break, and so would the (zero-width) character class shift between '$' and '100' in '$100'.
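A minimal sketch of that wording in action, assuming Rakudo's default ws (<!ww> \s*). The grammar Breaks and its rule names are hypothetical; Grammar.parse must consume the whole input string for a match to succeed:

```raku
# Sketch only: grammar and rule names are hypothetical.
grammar Breaks {
    token as-token { foo bar }    # whitespace in a token's pattern is insignificant: same as foobar
    rule  as-rule  { foo bar }    # roughly: foo <.ws> bar <.ws>
    rule  price    { '$' \d+ }    # roughly: '$' <.ws> \d+ <.ws>
    rule  f-then-o { f o+ }       # roughly: f <.ws> o+ <.ws>
}

say so Breaks.parse('foobar',  :rule<as-token>);  # True
say so Breaks.parse('foo bar', :rule<as-token>);  # False: nothing matches the input space
say so Breaks.parse('foo bar', :rule<as-rule>);   # True: the space is a break
say so Breaks.parse('foobar',  :rule<as-rule>);   # False: o to b is not a break
say so Breaks.parse('$100',    :rule<price>);     # True: $ to 1 is a zero-width break
say so Breaks.parse('$ 100',   :rule<price>);     # True
say so Breaks.parse('foo',     :rule<f-then-o>);  # False: f to o is not a break
say so Breaks.parse('f oo',    :rule<f-then-o>);  # True
```

The last two lines reproduce the issue's observation below: the rule version of "f followed by one or more o" rejects 'foo', because <!ww> fails between two word characters.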
Problem or new feature
https://docs.raku.org/language/grammar_tutorial describes regex, token, and rule as follows:
It's not clear to me what "Token methods... ignore whitespace" means, and in what way tokens ignore whitespace while rules do not. For example,
This suggests that tokens do not ignore whitespace; any whitespace in the input between the component parts of a token prevents the token from matching. And in this example, whitespace seems to be mandatory, in that 'foo' doesn't match the rule "f followed by one or more o". (I think the latter is because the default ws token is <!ww> \s*, which doesn't match between f and o.)
Suggestions
The "ignore" verb in this context is ambiguous: I think the documentation is saying that whitespace inside the body of the token definition doesn't have any effect. But the first few times I read that section, I thought it was saying that token methods ignore whitespace inside the string being matched.
One way to clarify this would be something like: