foonathan / lexy

C++ parsing DSL
https://lexy.foonathan.net
Boost Software License 1.0

Parsing C++ expression statements #137

Closed: osdeverr closed this issue 1 year ago

osdeverr commented 1 year ago

I'm having trouble getting lexy to fully parse C++ expression statements (like printf("hello world");) as part of the generic "statement" production.

All the other kinds of statements, such as variable declarations and control-flow statements, are much easier to handle: because each of them begins with a prefix like if or auto, their rules naturally become branch rules and can be combined with a simple dsl::p<decl_statement> | dsl::p<if_block> | /* ... */.
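To illustrate why keyword-prefixed statements fit an ordered choice so easily, here is a hand-written sketch in plain C++ (not lexy): each alternative is selected by a cheap check of the next word, so the parser knows which branch to take before committing to it. The function name leading_keyword_branch, the StmtKind enum and the keyword set are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical statement kinds; only keyword-prefixed ones are listed.
enum class StmtKind { Declaration, If, While, Return };

// Returns the statement kind selected by the leading keyword, or
// std::nullopt when no keyword matches. Like a branch condition, it only
// peeks at a bounded prefix of the input and consumes nothing.
std::optional<StmtKind> leading_keyword_branch(std::string_view src) {
    // Skip leading whitespace.
    std::size_t i = 0;
    while (i < src.size() && std::isspace(static_cast<unsigned char>(src[i]))) ++i;

    // Read one whole identifier-like word, so "iffy" is not mistaken for "if".
    std::size_t j = i;
    while (j < src.size() &&
           (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_')) ++j;
    std::string_view word = src.substr(i, j - i);

    if (word == "auto")   return StmtKind::Declaration;
    if (word == "if")     return StmtKind::If;
    if (word == "while")  return StmtKind::While;
    if (word == "return") return StmtKind::Return;
    return std::nullopt;  // no prefix matched: none of these branches applies
}
```

An expression statement such as printf("hello world"); yields std::nullopt here: nothing at its start identifies it, which is exactly the situation described next.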

Expression statements, however, have no such prefix, and my (very large) expression production is, for some reason, not a branch rule. As a result, adding it to the choice above fails: a rule like dsl::p<expression> >> dsl::semicolon does not compile, and dsl::p<expression> + dsl::semicolon is not a branch rule in the first place.
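One general way around this, sketched here in plain C++ rather than lexy, is to try the keyword-prefixed alternatives first and make the expression statement the unconditional last alternative: once no keyword matches, commit to parsing an expression and then require the semicolon. In lexy I believe the corresponding building block is dsl::else_, as in dsl::p<decl_statement> | dsl::p<if_block> | dsl::else_ >> dsl::p<expression_statement>, but treat that as an assumption to check against the lexy documentation. The Parser class and its toy expression grammar (qualified names, calls, string literals) are hypothetical and far smaller than real C++ expressions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical toy parser (not lexy): an ordered choice whose last
// alternative has no condition, like an "else" branch.
class Parser {
public:
    explicit Parser(std::string_view src) : src_(src) {}

    // statement := "if" ... | "auto" ... | expression ";"
    // Returns a label for the alternative taken, or "error".
    std::string_view statement() {
        skip_ws();
        // Keyword-prefixed alternatives: decided by peeking at the prefix.
        if (keyword("if"))   return "if";
        if (keyword("auto")) return "decl";
        // Fallback: no prefix check, commit to an expression, then require ';'.
        if (!expression()) return "error";
        skip_ws();
        if (!eat(';')) return "error";
        return "expr";
    }

private:
    std::string_view src_;
    std::size_t pos_ = 0;

    bool at_end() const { return pos_ >= src_.size(); }
    char cur() const { return at_end() ? '\0' : src_[pos_]; }
    static bool ident_char(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    }
    void skip_ws() {
        while (!at_end() && std::isspace(static_cast<unsigned char>(cur()))) ++pos_;
    }
    bool eat(char c) {
        if (cur() != c) return false;
        ++pos_;
        return true;
    }
    // Matches a whole keyword, not the prefix of a longer identifier.
    bool keyword(std::string_view kw) {
        if (src_.substr(pos_, kw.size()) != kw) return false;
        std::size_t end = pos_ + kw.size();
        if (end < src_.size() && ident_char(src_[end])) return false;
        pos_ = end;
        return true;
    }
    // primary := identifier { "::" identifier } | string-literal
    bool primary() {
        skip_ws();
        if (eat('"')) {
            while (!at_end() && cur() != '"') {
                if (cur() == '\\') ++pos_;  // skip the escaped character
                ++pos_;
            }
            return eat('"');
        }
        if (!ident_char(cur())) return false;
        while (ident_char(cur())) ++pos_;
        while (src_.substr(pos_, 2) == "::") {
            pos_ += 2;
            if (!ident_char(cur())) return false;
            while (ident_char(cur())) ++pos_;
        }
        return true;
    }
    // expression := primary { "(" [ expression { "," expression } ] ")" }
    bool expression() {
        if (!primary()) return false;
        skip_ws();
        while (eat('(')) {
            skip_ws();
            if (!eat(')')) {
                do {
                    if (!expression()) return false;
                    skip_ws();
                } while (eat(','));
                if (!eat(')')) return false;
            }
            skip_ws();
        }
        return true;
    }
};
```

With this ordering, both printf("hello world"); and fmt::print("Your choice: {}\n", choice); come out as "expr", and iffy(x); is not mistaken for an if statement because the keyword check requires a whole word. The trade-off is that input matching no keyword and no valid expression is reported as an error from the expression alternative rather than as "no alternative matched".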

Code (with unnecessary parts omitted, since it's already quite long): https://gist.github.com/osdeverr/2ad4a1f92cfa0321b70d3ff8f586ac7b

My questions are:

  1. How can I parse an expression production followed by a trailing semicolon and still have the whole rule be a branch rule?
  2. Am I even doing it all correctly in the attached snippet?

If you have any further questions about the attached snippet, feel free to ask them.

Thanks in advance for answering, and thanks for making this library! Even though I'm having some issues with it, overall it's very useful.

osdeverr commented 1 year ago

UPDATE: I've managed to get it to (kind of) work with a dsl::lookahead condition:

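    struct expression_statement
    {
        static constexpr auto rule = [] {
            // Branch condition: take this branch only if a ';' appears before the next '}'.
            auto key_condition = dsl::lookahead(dsl::semicolon, dsl::lit_c<'}'>);
            // Parse the expression, then expect the terminating ';'.
            return key_condition >> dsl::terminator(dsl::semicolon)(dsl::p<expression>);
        }();

        // Wrap the parsed expression node in an expression-statement AST node.
        static constexpr auto value = lexy::callback<ast_node_ptr>(
            [](ast_node_ptr &&expr) { return std::make_unique<ast::expression_statement>(std::move(expr)); });
    };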

However, the lookahead scans the raw input, so it also stops at a closing brace inside a string literal. For example, fmt::print("Your choice: {}\n", choice); does not parse, because the } in the format string comes before the ;.
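The failure can be reproduced outside lexy with a plain C++ imitation of such a lookahead: scan forward for ';' and give up at the first '}'. The function name naive_lookahead is hypothetical, and this reflects only my reading of how dsl::lookahead(needle, end) behaves on raw characters, not its actual implementation.

```cpp
#include <cassert>
#include <string_view>

// Imitates a raw-character lookahead: returns true if `needle` occurs before
// the first occurrence of `end`. Nothing is consumed, and quotes are NOT
// treated specially, so characters inside string literals count too.
bool naive_lookahead(std::string_view src, char needle, char end) {
    for (char c : src) {
        if (c == needle) return true;
        if (c == end) return false;
    }
    return false;  // reached the end of input without finding either
}
```

For printf("hello world"); this returns true, but for fmt::print("Your choice: {}\n", choice); it returns false: the } in the format string is reached before the ;, so the branch is rejected even though the statement is well-formed.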

Is there any way to defer the check until the expression itself has ended, instead of scanning ahead for the semicolon?
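If a lookahead is kept, one option is to make the scan skip the literals that may legitimately contain the end character. The sketch below, again plain C++ with the hypothetical name literal_aware_lookahead, skips over "..." and '...' literals (honoring backslash escapes) while searching; how to express the same thing in lexy is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Like a raw lookahead for `needle` before `end`, but characters inside
// "..." or '...' literals are skipped, so a '}' in a format string no longer
// stops the search. Backslash escapes inside literals are honored.
bool literal_aware_lookahead(std::string_view src, char needle, char end) {
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (c == '"' || c == '\'') {
            // Skip to the matching closing quote of the same kind.
            const char quote = c;
            ++i;
            while (i < src.size() && src[i] != quote) {
                if (src[i] == '\\') ++i;  // skip the escaped character
                ++i;
            }
            ++i;  // step past the closing quote (or past the end if unterminated)
            continue;
        }
        if (c == needle) return true;
        if (c == end) return false;
        ++i;
    }
    return false;  // neither found outside a literal
}
```

This remains a heuristic: it ignores comments and cannot tell a brace that closes the enclosing block from one that belongs to the expression, so a call with a braced initializer such as f({1, 2}); is still rejected.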