foonathan / lexy

C++ parsing DSL
https://lexy.foonathan.net
Boost Software License 1.0

Ignore whitespace at beginning of input #45

Closed jannikw closed 2 years ago

jannikw commented 2 years ago

I am trying to ignore whitespace between my tokens/productions. Adding a member like

static constexpr auto whitespace
        = dsl::ascii::space | LEXY_LIT("//") >> dsl::until(dsl::eol);

only skips whitespace if it comes after the first actual token in the input. This means that if the input starts with a few blank lines or a comment, for example, the parser fails. This is somewhat unexpected, as the docs state:

Use whitespace to skip optional whitespace at the beginning of the input.

My workaround is to replace the entry production's rule like this:

static constexpr auto rule //
        = dsl::whitespace(whitespace) + (old rule here);

This makes it work the way I want, ignoring whitespace at the beginning of the input, but I think this should not be necessary if whitespace skipping were working correctly.

Playground for reproducing: https://lexy.foonathan.net/playground/?id=Pzo5jx88o&mode=trace

foonathan commented 2 years ago

You can simplify it by using dsl::whitespace (without an argument); this will skip whitespace according to the current whitespace rule.

This behavior is not really a bug, but it is a pitfall that should be documented better: automatic whitespace skipping only skips whitespace after a token, which doesn't handle initial whitespace (as a bonus, you'll get whitespace skipping after EOF though... :D).
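To make the "only after a token" behavior concrete, here is a minimal standalone sketch of the idea. It is a toy model and not lexy's implementation: skip_whitespace, match_token, parse_words and the skip_initial flag are all hypothetical names invented for illustration, and a "token" is assumed to be just a run of letters. The whitespace definition mirrors the one from the question (ASCII space or a // comment up to the end of the line).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Toy model only: none of these names are part of lexy.

// Skips ASCII whitespace and `//` comments starting at pos; returns the new position.
std::size_t skip_whitespace(std::string_view in, std::size_t pos)
{
    assert(pos <= in.size());
    for (;;)
    {
        if (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
            ++pos;
        else if (in.substr(pos, 2) == "//")
        {
            // Comment: advance to the end of the line; the newline itself
            // is consumed by the whitespace branch on the next iteration.
            while (pos < in.size() && in[pos] != '\n')
                ++pos;
        }
        else
            return pos;
    }
}

// Matches one token (a run of letters), then skips whitespace AFTER it.
// Returns false if no token starts at pos.
bool match_token(std::string_view in, std::size_t& pos)
{
    const std::size_t begin = pos;
    while (pos < in.size() && std::isalpha(static_cast<unsigned char>(in[pos])))
        ++pos;
    if (pos == begin)
        return false;
    pos = skip_whitespace(in, pos);
    return true;
}

// Parses one or more tokens followed by the end of the input.
// skip_initial models an explicit whitespace rule at the start of the entry rule.
bool parse_words(std::string_view in, bool skip_initial)
{
    std::size_t pos = skip_initial ? skip_whitespace(in, 0) : 0;
    if (!match_token(in, pos))
        return false;
    while (pos < in.size())
        if (!match_token(in, pos))
            return false;
    return true;
}
```

In this model, an input that begins with a blank line or a comment is rejected when skip_initial is false, because no token has been matched yet and so nothing has triggered a skip. Passing true plays the role of the explicit initial whitespace rule from the workaround. Trailing whitespace is accepted either way, since it follows the last token.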

Maybe it would be a good change to skip whitespace in the initial production as well? But how would you disable it then?

jannikw commented 2 years ago

Ah, thank you. Now that you say that, I see that the tip

Use whitespace to skip optional whitespace at the beginning of the input.

is placed below dsl::whitespace. So it refers to using dsl::whitespace in the rule, but when I read it, I thought it referred to the whitespace member of the production.

I think the current design of having to place dsl::whitespace in the entry production is OK. Everything you are saying seems to be in the docs and even in the JSON parser example; I just missed it :D