foonathan / lexy

C++ parsing DSL
https://lexy.foonathan.net
Boost Software License 1.0

location tracking for operators #170

Closed rkaminsk closed 11 months ago

rkaminsk commented 1 year ago

I have a grammar for which I want to track locations while building the representation. To do this, I use the parse state to map input iterators to line/column numbers on the fly. This works well, but for operators there are a few issues (for which I have workarounds):

  1. The constructor of the tag is passed an iterator from which a location can be computed. My current workaround is to store the iterator in a std::any and do the mapping later. However, I am not a fan of using this type. Below, I attach a patch that proposes passing the parse state to the tag constructor.
  2. For some reason, the iterator passed to the tag is off by one. The example below reproduces the issue.
  3. I would rather not use lexy::bind and use lexy::callback with lambdas instead. Would it be possible to have a convenience version that also receives the parse state?

Here is the example code. It is a bit lengthy but self-contained and runnable.

#include <cstddef>
#include <iostream>
#include <iterator>
#include <string>

#include <lexy/action/parse.hpp>
#include <lexy/callback.hpp>
#include <lexy/dsl.hpp>
#include <lexy/input/string_input.hpp>
#include <lexy/input_location.hpp>

#include <lexy_ext/report_error.hpp>

namespace dsl = lexy::dsl;

struct Position {
    std::size_t line;
    std::size_t column;
};

template <class Input, class It> struct State {
    State(Input &input, It begin) : input{input}, begin{begin} {}
    // Map an input iterator to a line/column pair.
    auto pos(It it) {
        // Dummy implementation.
        auto loc = lexy::get_input_location(input, it);
        return Position{loc.line_nr(), loc.column_nr()};
    }
    Input &input;
    It begin;
};

struct Term {
    Position pos;
    std::string identifier;
};

enum class Type { minus };

struct identifier : lexy::token_production {
    static constexpr auto rule = dsl::position(LEXY_LIT("x"));
    // NOTE: I am actually using a helper to have a callback with state.
    // Would this also make sense for lexy?
    static constexpr auto value = lexy::bind(lexy::callback<Term>([](auto &state, auto it) {
                                                 return Term{state.pos(it), "x"};
                                             }),
                                             lexy::parse_state, lexy::values);
};

struct term : lexy::expression_production {
    static constexpr auto atom = dsl::p<identifier>;

    template <Type OP> struct tag_unary {
        // FIXME: according to the documentation it should not be necessary to decrement the iterator
        // NOTE: It would be nice to also support a state argument (requires the patch below).
        template <class State, class It> tag_unary(State &state, It it) : pos{state.pos(std::prev(it))} {}
        static constexpr auto op = OP;
        Position pos;
    };

    struct op_unary : dsl::prefix_op {
        static constexpr char const *name = "unary";
        static constexpr auto op = dsl::op<tag_unary<Type::minus>>(LEXY_LIT("-"));
        using operand = dsl::atom;
    };

    using operation = op_unary;

    static constexpr auto value = lexy::callback<Term>(lexy::forward<Term>, [](auto tag, Term rhs) {
        rhs.pos = tag.pos;
        rhs.identifier.insert(rhs.identifier.begin(), '-');
        return rhs;
    });
};

struct root {
    static constexpr auto rule = dsl::whitespace(dsl::ascii::space) + dsl::p<term>;
    static constexpr auto value = lexy::forward<Term>;
};

auto main() -> int {
    auto const *str = "   --x";
    auto input = lexy::zstring_input(str);
    auto state = State{input, input.reader().position()};
    auto ret = lexy::parse<root>(input, state, lexy_ext::report_error);
    std::cerr << "input : '" << str << "'" << std::endl;
    if (ret.has_value()) {
        std::cerr << "parsed: '" << ret.value().identifier << "'" << std::endl;
        std::cerr << "line  : " << ret.value().pos.line << std::endl;
        std::cerr << "column: " << ret.value().pos.column << std::endl;
    }
}
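State::pos above delegates the iterator-to-location mapping to lexy::get_input_location. To make the "on the fly" mapping concrete, here is a standalone sketch (plain C++, not lexy code) of one way to do it: the tracker remembers the last queried offset, so queries with non-decreasing offsets, as typically produced while parsing left to right, only scan the characters in between. `LineTracker` and its `Position` are hypothetical names; counting columns in bytes starting at 1 is an assumption and may differ from lexy's column semantics. Offsets past the end are clamped to the end of the input.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical 1-based line/column pair (independent of the Position above).
struct Position {
    std::size_t line;
    std::size_t column;
};

// Hypothetical helper, not part of lexy: maps byte offsets into `input` to
// line/column numbers. It caches the last result, so queries with
// non-decreasing offsets only scan the characters in between.
class LineTracker {
public:
    constexpr explicit LineTracker(std::string_view input) : input_{input} {}

    constexpr Position pos(std::size_t offset) {
        if (offset < offset_) {
            // The query moved backwards (e.g. after backtracking): restart.
            offset_ = 0;
            cur_ = Position{1, 1};
        }
        // Advance from the cached offset, clamping at the end of the input.
        for (; offset_ < offset && offset_ < input_.size(); ++offset_) {
            if (input_[offset_] == '\n') {
                ++cur_.line;
                cur_.column = 1;
            } else {
                ++cur_.column;
            }
        }
        return cur_;
    }

private:
    std::string_view input_;
    std::size_t offset_ = 0; // offset that cur_ describes
    Position cur_{1, 1};
};
```

For the input `   --x` from the example, `LineTracker{"   --x"}.pos(3)` yields line 1, column 4, the position of the first `-`.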

Here is a patch proposal that passes the parse state to the constructor of the operator tag (generic tags might also be an option):

--- a/include/lexy/dsl/operator.hpp
+++ b/include/lexy/dsl/operator.hpp
@@ -96,6 +96,9 @@ namespace lexyd
 template <typename Tag, typename Reader>
 using _detect_op_tag_ctor = decltype(Tag(LEXY_DECLVAL(Reader).position()));

+template <typename Tag, typename Reader, typename Context>
+using _detect_op_tag_state_ctor = decltype(Tag(*LEXY_DECLVAL(Context).control_block->parse_state, LEXY_DECLVAL(Reader).position()));
+
 template <typename TagType, typename Literal, typename... R>
 struct _op : branch_base
 {
@@ -113,6 +116,9 @@ struct _op : branch_base
             = lexy::whitespace_parser<Context, lexy::parser_for<_seq_impl<R...>, NextParser>>;
         if constexpr (std::is_void_v<TagType>)
             return continuation::parse(context, reader, LEXY_FWD(args)...);
+        else if constexpr (lexy::_detail::is_detected<_detect_op_tag_state_ctor, op_tag_type, Reader, Context>)
+            return continuation::parse(context, reader, LEXY_FWD(args)...,
+                                       op_tag_type(*context.control_block->parse_state, reader.position()));
         else if constexpr (lexy::_detail::is_detected<_detect_op_tag_ctor, op_tag_type, Reader>)
             return continuation::parse(context, reader, LEXY_FWD(args)...,
                                        op_tag_type(reader.position()));
@@ -144,6 +150,9 @@ struct _op : branch_base

             if constexpr (std::is_void_v<TagType>)
                 return impl.template finish<continuation>(context, reader, LEXY_FWD(args)...);
+            else if constexpr (lexy::_detail::is_detected<_detect_op_tag_state_ctor, op_tag_type, Reader, Context>)
+                return impl.template finish<continuation>(context, reader, LEXY_FWD(args)...,
+                                                          op_tag_type(*context.control_block->parse_state, reader.position()));
             else if constexpr (lexy::_detail::is_detected<_detect_op_tag_ctor, op_tag_type, Reader>)
                 return impl.template finish<continuation>(context, reader, LEXY_FWD(args)...,
                                                           op_tag_type(reader.position()));
@@ -165,6 +174,9 @@ struct _op : branch_base
                 = lexy::parser_for<Literal, lexy::parser_for<_seq_impl<R...>, NextParser>>;
             if constexpr (std::is_void_v<TagType>)
                 return continuation::parse(context, reader, LEXY_FWD(args)...);
+            else if constexpr (lexy::_detail::is_detected<_detect_op_tag_state_ctor, op_tag_type, Reader, Context>)
+                return continuation::parse(context, reader, LEXY_FWD(args)...,
+                                           op_tag_type(*context.control_block->parse_state, reader.position()));
             else if constexpr (lexy::_detail::is_detected<_detect_op_tag_ctor, op_tag_type, Reader>)
                 return continuation::parse(context, reader, LEXY_FWD(args)..., op_tag_type(pos));
             else
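The patch relies on detecting at compile time which constructor a tag provides and dispatching with `if constexpr`. The sketch below shows the same idea outside lexy with a hand-written detection idiom (the patch uses `lexy::_detail::is_detected`, an internal lexy helper, so an equivalent is re-implemented here). `make_tag`, `state_ctor_t`, `pos_ctor_t` and the three example tags are hypothetical. The order of preference (state constructor, then position-only constructor) mirrors the branches in the patch; falling back to `Tag()` is an assumption, because the final `else` branch of the patch is cut off here.

```cpp
#include <type_traits>
#include <utility>

// Minimal detection idiom: is_detected<Op, Args...> is true when
// Op<Args...> names a valid type (standalone stand-in for lexy's helper).
template <class AlwaysVoid, template <class...> class Op, class... Args>
struct detector : std::false_type {};

template <template <class...> class Op, class... Args>
struct detector<std::void_t<Op<Args...>>, Op, Args...> : std::true_type {};

template <template <class...> class Op, class... Args>
constexpr bool is_detected = detector<void, Op, Args...>::value;

// Valid only if Tag has a constructor Tag(State&, It).
template <class Tag, class State, class It>
using state_ctor_t = decltype(Tag(std::declval<State &>(), std::declval<It>()));

// Valid only if Tag has a constructor Tag(It).
template <class Tag, class It>
using pos_ctor_t = decltype(Tag(std::declval<It>()));

// Hypothetical dispatcher: prefer Tag(state, pos), then Tag(pos), then Tag().
template <class Tag, class State, class It>
constexpr Tag make_tag(State &state, It pos) {
    if constexpr (is_detected<state_ctor_t, Tag, State, It>)
        return Tag(state, pos);
    else if constexpr (is_detected<pos_ctor_t, Tag, It>)
        return Tag(pos);
    else
        return Tag();
}

// Example tags, one per constructor shape; `which` records the path taken.
struct WithState {
    int which;
    constexpr WithState(int &, const char *) : which{2} {}
};
struct WithPos {
    int which;
    constexpr explicit WithPos(const char *) : which{1} {}
};
struct Plain {
    int which = 0;
};
```

A tag written like `tag_unary` in the example, with a `(State&, It)` constructor, would be picked up by the first branch.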

foonathan commented 1 year ago
> 1. The constructor of the tag is passed an iterator from which a location can be computed. My current workaround is to store the iterator in a std::any and do the mapping later. However, I am not a fan of using this type. I add a patch with a proposal to pass the state to the tag constructor, below.

That seems like a sensible addition. If you want, you can turn the patch into a PR by also adding a test and documentation (and incorporating the fix for 2).

> 2. Somehow the iterator passed to the tag is off by one. See the example below reproducing the issue.

Fixed.
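Regarding the off-by-one: the `std::prev` workaround in the example suggests the tag saw the position after the operator had been consumed. Stepping back by one recovers the start only for single-character operators; for a two-character operator such as `->` it lands on the second character. The following plain C++ sketch (the helper `match_op` is hypothetical, not lexy code) shows the alternative of capturing the position before consuming the operator.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper, not lexy code: if `op` occurs at `pos`, advance `pos`
// past it and return the offset where the operator starts.
constexpr std::optional<std::size_t> match_op(std::string_view input, std::size_t &pos,
                                              std::string_view op) {
    if (pos > input.size() || input.substr(pos, op.size()) != op)
        return std::nullopt; // no match: `pos` is left unchanged
    const std::size_t begin = pos; // captured *before* consuming the operator
    pos += op.size();
    return begin;
}
```

With input `a->b` and `pos == 1`, `match_op` returns 1 and leaves `pos` at 3, whereas stepping back one from 3 would give 2, the `>`.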

> 3. I don't like to use lexy::bind and rather use lexy::callback with lambdas. Would it be possible to have a convenience version that also receives the parse state?

I have added lexy::callback_with_state for that purpose.
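lexy::callback_with_state itself lives in lexy; what follows is only a standalone sketch of the underlying idea, with hypothetical names (`overload_set`, `state_callback`, `make_state_callback`), and it does not reproduce lexy's implementation: bundle several lambdas into one overload set and pass the parse state as the first argument to whichever lambda matches the remaining values.

```cpp
#include <utility>

// Hypothetical sketch, not lexy's implementation: several lambdas merged
// into one overload set.
template <class... Fns>
struct overload_set : Fns... {
    using Fns::operator()...;
};

// Wraps an overload set so that every call takes the parse state first and
// forwards it, together with the remaining values, to the matching lambda.
template <class ReturnType, class... Fns>
struct state_callback {
    overload_set<Fns...> fns;

    template <class State, class... Args>
    constexpr ReturnType operator()(State &state, Args &&...args) const {
        return fns(state, std::forward<Args>(args)...);
    }
};

// Deduces the lambda types; the return type is given explicitly.
template <class ReturnType, class... Fns>
constexpr state_callback<ReturnType, Fns...> make_state_callback(Fns... fns) {
    return {overload_set<Fns...>{fns...}};
}
```

In the issue's `identifier` production, a helper of this shape would let the lambda `[](auto &state, auto it) { ... }` receive the state directly instead of going through `lexy::bind(..., lexy::parse_state, lexy::values)`.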

rkaminsk commented 1 year ago

Thanks for the two additions! I'll open a PR as well, though it may take two to three weeks.