Raku / doc

🦋 Raku documentation
https://docs.raku.org/
Artistic License 2.0

Grammar tutorial should clarify what "ignore whitespace" means #3985

Open flwyd opened 2 years ago

flwyd commented 2 years ago

Problem or new feature

https://docs.raku.org/language/grammar_tutorial describes regex, token, and rule as follows:

It's not clear to me what "Token methods... ignore whitespace" means, and in what way tokens ignore whitespace while rules do not. For example,

'foo' ~~ token { 'f' 'o'+ }      # OUTPUT: 「foo」
'f oo' ~~ token { 'f' 'o'+ }     # OUTPUT: Nil
'foo' ~~ rule { 'f' 'o'+ }       # OUTPUT: Nil
'f oo' ~~ rule { 'f' 'o'+ }      # OUTPUT: 「f oo」
'fo o' ~~ rule { 'f' 'o'+ }      # OUTPUT: Nil
'f   o o' ~~ rule { 'f' 'o'+ }   # OUTPUT: 「f   o 」

This suggests that tokens do not ignore whitespace; any whitespace between the component parts of a token prevents the token from matching. And in this example, whitespace seems to be mandatory, in that 'foo' doesn't match the rule "f followed by one or more o". (I think the latter is because the default ws token is <!ww> \s* which doesn't match between f and o.)
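
To make the guess about the ws token concrete, here is a minimal sketch. It assumes that a rule turns whitespace after an atom in its body into an implicit <.ws> call, so the rule above should behave like a token with those calls written out by hand. The variable names $as-rule and $as-token are made up for illustration.

```raku
# Assumption: in a rule, whitespace after an atom becomes an implicit <.ws> call.
my $as-rule  = rule  { 'f' 'o'+ };
my $as-token = token { 'f' <.ws> 'o'+ <.ws> };   # hand-written equivalent (sketch)

for 'foo', 'f oo', 'fo o', 'f   o o' -> $input {
    my $r = $input ~~ $as-rule;
    my $t = $input ~~ $as-token;
    # Print both results side by side so they can be compared.
    say $input.raku,
        ' => rule: ',          ($r ?? (~$r).raku !! 'no match'),
        ', token with <.ws>: ', ($t ?? (~$t).raku !! 'no match');
}
```

If the assumption holds, both columns agree for every input, which is consistent with the results listed above.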

Suggestions

The "ignore" verb in this context is ambiguous: I think the documentation is saying that whitespace inside the body of the token definition doesn't have any effect. But the first few times I read that section, I thought it was saying that token methods will ignore whitespace inside the string being matched.

One way to clarify this would be something like:

JJ commented 2 years ago

Well, on the surface of it, tokens do ignore whitespace in the expression of the token or rule, as is quite clear from the first two examples, where it's a no-op. However, rules do not, as is again quite clear from the other examples. Effectively, it looks like a space in a rule matches any amount of whitespace, and the match fails when that whitespace is not there. Your suggestions seem quite reasonable; I encourage you to create a PR to incorporate them.

raiph commented 2 years ago

@flwyd

Yes.

Rule methods behave like token methods, except each run of whitespace in the body

It's not every run of whitespace in the body. See https://stackoverflow.com/questions/48892306/when-is-white-space-really-important-in-perl6-grammars


matches a word boundary and any amount of whitespace.

It's not (necessarily) (just) a word boundary. Instead it's:


The following may be too complicated, but is hopefully at least good food for thought:

Rules behave like tokens, except that whitespace between elements of the rule's pattern requires a corresponding "break" in the input. By default, the break needs to be whitespace or a switch between "word" and non-"word" characters. For example, the whitespace between foo and bar in an input string foo bar would be a matching break, and so would the (zero-width) character class shift between $ and 100 in $100.
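
A small sketch of the two examples in that wording, assuming the default ws (<!ww> \s*) is in effect. The names $money and $words are hypothetical and only serve the illustration.

```raku
# Assumption: no custom ws is defined, so the default <!ww> \s* applies.
my $money = rule { '$' \d+ };
my $words = rule { 'foo' 'bar' };

say ?('$100'    ~~ $money);   # True: zero-width break between '$' and '1'
say ?('$ 100'   ~~ $money);   # True: the break is actual whitespace
say ?('foo bar' ~~ $words);   # True: whitespace between the two words
say ?('foobar'  ~~ $words);   # False: no break inside a run of word characters
```

The last line is the same situation as 'foo' failing against rule { 'f' 'o'+ } earlier in the thread.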