boostorg / spirit

Boost.org spirit module
http://boost.org/libs/spirit
392 stars 161 forks source link

X3 3.0.10 error_handler where() is wrong #712

Open Bockeman opened 2 years ago

Bockeman commented 2 years ago

The position returned by the error_handler where() points to the last successful parse point, not the beginning of the failure. Hence the error messaging is not very meaningful and does not match the expected results in the tutorial.

Using the code from https://www.boost.org/doc/libs/1_78_0/libs/spirit/doc/x3/html/spirit_x3/tutorials/error_handling.html

Using spirit X3 3.0.8, and in the tutorial documentation:

  -------------------------
  Now we have some errors
  In line 16:
  Failed: Expecting: person here:
    'I am not a person!'  <--- this should be a person
  __^_
  -------------------------
  Parsing failed
  -------------------------

Using spirit X3 3.0.10:

  -------------------------
  Now we have some errors
  In line 15:
  Failed: Expecting: person here:
    43,
  _____^_
  -------------------------
  Parsing failed
  -------------------------
djowel commented 2 years ago

This is a breaking change if the behavior changed. @Kojoley, do you have an idea which commit this happened? This will break a lot of code.

Kojoley commented 2 years ago

@Kojoley, do you have an idea which commit this happened?

Most likely, removal of whitespace trimming (https://github.com/boostorg/spirit/pull/670).

This will break a lot of code.

I don't think so, it is a diagnostic message for humans.

There are fundamental design choices that are conflicting: 1) sequence parser rolls back the iterator 2) error handler does not have a context to grab a skipper to skip whitespaces, but even if it had, that would be wrong because of no_skip existence,

I checked the idea from #704

roll back the iterator only in alternative parser + (some?) repeat/container parsers

and it does solve the issue for x3::error_handler.

djowel commented 2 years ago

OK, well, at least fix the examples, so they show how this behavior is 'properly' implemented and provide warnings of the breaking change in the comments, perhaps.

Bockeman commented 2 years ago

The change in behaviour of the sequence parser also affects other examples which rely on positions.position_of() (e.g annotation.cpp). I think this needs some careful thought. I personally do not see any value of any position_of pointing to white space, and I would suggest that the position_of, by default, automatically advances (skips) past white space. However, I can see that there could be situations where whitespace is a genuine syntax error, so a means to override the default behaviour would be desirable.

I am using position_of() not just for human interpretation situations, like error_handler, but also for transformation from one ast to another (equivalent to translating from one coding language to another). I can add whitespace skipping, in all places where used, but it would be so much better if this was incorporated into X3 as default (i,e as in X3 3.0.8).

@djowel is there a set of policy statatements or design goals that could make it easier to make certain design choices like this? For instance, if X3 is aimed at making usage easy and intuitive for the majority of users, but with additional capability for handling complex cases, then I would argue that the default behaviour of position_of and error_handler is to skip whitespace. It is not the whitespace that is useful/interesting/wrong/illegal/erroneous but the character(s) that follow (in the majority of cases). (Though, strictly, I agree that pointing just past the last successful parse is perhaps correct, it just is not helpful.)

djowel commented 2 years ago

@djowel is there a set of policy statatements or design goals that could make it easier to make certain design choices like this?

In retrospect, I think this change is a mistake. To be honest, I am lost on the details about the rationale for this change, looking at #670 and #704. I still think this will break a lot of code that relies on it. @Kojoley has his reasons. I am hoping he could explain more about the justifications for the change, esp. WRT "an idea about rich and low overhead error reporting" mentioned in #704.

Kojoley commented 2 years ago

The change in behaviour of the sequence parser

What change? You have not reported that and I am not aware of behavior changes in the sequence parser.

affects other examples which rely on positions.position_of() (e.g annotation.cpp).

Hmm, this one is caused by the change to rules, though the same issue applies - because the actual skipping is done in primitive parsers it is not possible to redo the skipping correctly in rules due to no_skip directive existence. Maybe we can getaway from this situation by "banning" no_skip directive as the first parser of the rule, though I remember @/syyyr recently used it that way for unknown for me reason here https://github.com/CESNET/netconf-cli/blob/4841f1369d323ddfdc6cc7fc9a10e2a1b5603fda/src/path_parser.hpp#L386-L400

I personally do not see any value of any position_of pointing to white space, and I would suggest that the position_of, by default, automatically advances (skips) past white space. ​However, I can see that there could be situations where whitespace is a genuine syntax error, so a means to override the default behaviour would be desirable.

As you may see in #546 report, different people have different needs and blindly skipping whitespaces is not working for everyone. However, in fixing #546 there was no goal to change the behavior for the example. I do try my best to check examples to not regress, but I mainly rely on the test suit which cannot catch everything because it is also manually populated either when I spot something to be undertested or when I fix something.

You can make you own handler with wanted behavior and use it even now, either based on x3::error_handler/annotate_on_success (which are only 200 lines of code) or completely from scratch. Of course any improvement to the provided handlers for general usage are welcome. If you need an immediate solution I think you would have already simply grabbed the handler from the previous version and used it.

OK, well, at least fix the examples, so they show how this behavior is 'properly' implemented and provide warnings of the breaking change in the comments, perhaps.

I am working on preparing changes needed to fix error_handling.cpp example for the review, please don't rush me.

Kojoley commented 2 years ago

@syyyr I am sorry for disturbing you. May I ask you to tell how is important for you to be able to use no_skip directive at the beginning of a rule which is done here https://github.com/CESNET/netconf-cli/blob/4841f1369d323ddfdc6cc7fc9a10e2a1b5603fda/src/path_parser.hpp#L386-L400, could it be refactored if a static assert is added for such a use?

syyyr commented 2 years ago

I don't think it is very important to me, I suppose I could refactor with some sort of a dummy rule to the left of the no_skip? The only real reason that I need no_skip there is because a long time ago, I decided I'm going to use phrase_parse. I've been thinking of refactoring my code to use x3::parse (and use skip directives where needed), because that's what suits my program better.

tl;dr: No, it is not important to me.

djowel commented 2 years ago

FWIW, I have full confidence in @Kojoley. I just want to make sure the changes do not cause any confusion and lots of headache in having to readjust code that now silently behaves differently. Foremost, the examples should be fixed.

Kojoley commented 2 years ago

@syyyr most logical refactor is to lift no_skip a step upper in the grammar, to the place where the rule currently containing it is used, and where it is also visible that you are disabling preskipping.

Bockeman commented 2 years ago

An aside: this is my bad: @Kojoley wrote

There are fundamental design choices that are conflicting: 1) sequence parser rolls back the iterator 2) error handler does not have a context to grab a skipper to skip whitespaces, but even if it had, that would be wrong because of no_skip existence,

Which I incorrectly interpretted as a possible explanation for the observed change in behaviour of error_handler and position_of in my response:

The change in behaviour of the sequence parser ...

Sorry. Please disregard this statement.

Bockeman commented 2 years ago

@djowel wrote:

FWIW, I have full confidence in @Kojoley. I just want to make sure the changes do not cause any confusion and lots of headache in having to readjust code that now silently behaves differently. Foremost, the examples should be fixed.

@Kojoley you also have my backing.

I just think that resolving "fundamental design choices that are conflicting" is a particularly difficult challenge which I do not envy. I offer any help I can to help resolve this issue.

  1. I have found that resolving such issues (fundamental design conflicts) is often made easier when the overall aims/objectives are clear [or further clarified to help in that situation]. Hence my question to @djowel above.
  2. Please don't feel you need to follow the requests of "whoever shouts loudest" (myself included), instead take your time to assess the fundamentals and devise a solution that suits the majority and is in line with the overall objectives. Hence, I acknowledge your request "Please don't rush me".
  3. I am not desperate for a solution. You correctly guessed "If you need an immediate solution I think you would have already simply grabbed the handler from the previous version and used it". I have worked around the changes in behaviour by making the minimum of changes necessary, like patching in the previous skip_whitespace() into the error_handler, and changing all my code affected by the change in behaviour of position_of() by inserting equivalents to skip_whitespace(). But my workarounds are ugly, and I would, eventually, like a more elegant solution.
  4. Whichever way you choose to resolve this issue, I'd be happy to test any proposals (git clone from branch, if necessary) and, importantly, to review any changes to the examples/documentation. I am (and always have been) motivated to make things easy for those that follow: to build on and improve over anything I might ever acheive. I am passionate about open source and the value of community driven continuous improvement (as here), and hate corporate egotism.
Kojoley commented 2 years ago

I'd be happy to test any proposals (git clone from branch, if necessary) and, importantly, to review any changes to the examples/documentation.

I would appreciate if you test a solution prototype https://github.com/Kojoley/spirit/tree/x3-remove-roll-back-from-sequence-parser, the branch should fix both issues you have reported.

Bockeman commented 2 years ago

@Kojoley I have started to look at your solution prototype and have come across some things which I was not expecting.

The first is in the tutorial complex_number.cpp

/////////////////////////////////////////////////////////

                A complex number micro parser for Spirit...

/////////////////////////////////////////////////////////

Give me a complex number of the form r or (r) or (r,i) 
Type [q or Q] to quit

(1,)
-------------------------
Parsing succeeded
got: (1,0)

-------------------------
q
Bye... :-) 

In previous versions, the sequence "(1,)" was a parse failure. If I change the sequence operator ">>" to the expectation operator ">", then that fails. Looking at the grammar, I think this case should be a failure.

FYI the output of my regression test suite is:

  "1"                                               parses ok (as expected)  (1,0)
  " 1"                                              parses ok (as expected)  (1,0)
  "1 "                                              parses ok (as expected)  (1,0)
  " 1 "                                             parses ok (as expected)  (1,0)
  "(1)"                                             parses ok (as expected)  (1,0)
  " (1)"                                            parses ok (as expected)  (1,0)
  "( 1)"                                            parses ok (as expected)  (1,0)
  " ( 1 ) "                                         parses ok (as expected)  (1,0)
  "(1,2)"                                           parses ok (as expected)  (1,2)
  "(1.,2)"                                          parses ok (as expected)  (1,2)
  "(1,2.)"                                          parses ok (as expected)  (1,2)
  "(.1,2)"                                          parses ok (as expected)  (0.1,2)
  "(1,.2)"                                          parses ok (as expected)  (1,0.2)
  "(1.23,45.67)"                                    parses ok (as expected)  (1.23,45.67)
  "(13,-64.567)"                                    parses ok (as expected)  (13,-64.567)
  " (1,2)"                                          parses ok (as expected)  (1,2)
  "( 1,2)"                                          parses ok (as expected)  (1,2)
  "(1 ,2)"                                          parses ok (as expected)  (1,2)
  "(1, 2)"                                          parses ok (as expected)  (1,2)
  "(1,2 )"                                          parses ok (as expected)  (1,2)
  "()"                                              fails (as expected)
  "(,2)"                                            fails (as expected)
  "(1,)"                                            parses ok (error: should fail)  (1,0)   <-----
  "("                                               fails (as expected)
  "(1,a)"                                           fails (as expected)
Kojoley commented 2 years ago

Thanks for the catch, I pushed the fix into the branch.

Bockeman commented 2 years ago

@Kojoley your latest push fixes the complex_number.cpp and related cases. Thanks.

Bockeman commented 2 years ago

I have another example of some unexpected behaviour.

This is based on code you supplied when exploring #679

// Original from Nikita using boost::variant
#include <boost/spirit/home/x3.hpp>
#include <boost/variant.hpp>
#include <vector>
#include <boost/spirit/home/x3/version.hpp>
namespace x3 = boost::spirit::x3;

#define USE_SIMPLER_CASE_FOR_VERSIONS_AFTER_0x309

int main()
{
  char const* s     = "abaabb", * e     = s + std::strlen(s);
  using Qaz = std::vector<boost::variant<int>>;
  using Foo = std::vector<boost::variant<Qaz, int>>;
  using Bar = std::vector<boost::variant<Foo, int>>;
  Bar x;

  #if SPIRIT_X3_VERSION > 0x3009

    #ifndef USE_SIMPLER_CASE_FOR_VERSIONS_AFTER_0x309

      // After the fix to  bug #679 the parser can now handle nested variants:
      x3::parse(s, e, +('a' >> x3::attr(Foo{}) | 'b' >> x3::attr(int{})), x);

    #else

      // But produces a compile time error for the simpler case (that previous versions could handle)
      x3::parse(s, e, +('a' >> x3::attr(Qaz{}) | 'b' >> x3::attr(int{})), x);

    #endif

  #else

    // Due to bug #679 the parser could not handle nested variants, but it could handle this simpler case:
    x3::parse(s, e, +('a' >> x3::attr(Qaz{}) | 'b' >> x3::attr(int{})), x);

  #endif
}

which produces the following error at compile time:

Compile rgw29_nikita_variant.cpp
g++ -c -MMD -O0 -Wall -Werror   -std=c++17 -I./src -o /Temp/intermediate/bobw/play/example/spirit/dev/rgw29_nikita_variant.o src/rgw29_nikita_variant.cpp
In file included from /usr/include/boost/spirit/home/x3/auxiliary/any_parser.hpp:17,
                 from /usr/include/boost/spirit/home/x3/auxiliary.hpp:11,
                 from /usr/include/boost/spirit/home/x3.hpp:62,
                 from src/rgw29_nikita_variant.cpp:5:
/usr/include/boost/spirit/home/x3/support/traits/move_to.hpp: In instantiation of ??void boost::spirit::x3::traits::detail::move_to(Source&, Dest&, boost::spirit::x3::traits::variant_attribute, mpl_::false_) [with Source = std::vector<boost::variant<int> >; Dest = boost::variant<std::vector<boost::variant<std::vector<boost::variant<int> >, int> >, int>; mpl_::false_ = mpl_::bool_<false>]??:
/usr/include/boost/spirit/home/x3/support/traits/move_to.hpp:152:20:   required from ??void boost::spirit::x3::traits::detail::move_to(Source&, Dest&, boost::spirit::x3::traits::variant_attribute) [with Source = std::vector<boost::variant<int> >; Dest = boost::variant<std::vector<boost::variant<std::vector<boost::variant<int> >, int> >, int>]??
/usr/include/boost/spirit/home/x3/support/traits/move_to.hpp:196:24:   required from ??void boost::spirit::x3::traits::move_to(Source&&, Dest&) [with Source = std::vector<boost::variant<int> >&; Dest = boost::variant<std::vector<boost::variant<std::vector<boost::variant<int> >, int> >, int>]??
/usr/include/boost/spirit/home/x3/operator/detail/alternative.hpp:175:28:   required from ??static void boost::spirit::x3::detail::move_if<true>::call(T1&, T2&) [with T1 = std::vector<boost::variant<int> >; T2 = boost::variant<std::vector<boost::variant<std::vector<boost::variant<int> >, int> >, int>]??
/usr/include/boost/spirit/home/x3/operator/detail/alternative.hpp:191:70:   required from ??bool boost::spirit::x3::detail::parse_alternative(const Parser&, Iterator&, const Iterator&, const Context&, RContext&, Attribute&) [with Parser = boost::spirit::x3::sequence<boost::spirit::x3::literal_char<boost::spirit::char_encoding::standard, boost::spirit::x3::unused_type>, boost::spirit::x3::attr_parser<std::vector<boost::variant<int> > > >; Iterator = const char*; Context = boost::spirit::x3::unused_type; RContext = const boost::spirit::x3::unused_type; Attribute = boost::variant<std::vector<boost::variant<std::vector<boost::variant<int> >, int> >, int>]??
/usr/include/boost/spirit/home/x3/operator/detail/alternative.hpp:209:45:   required from ??bool boost::spirit::x3::detail::alternative_helper<Subject>::parse(Iterator&, const Iterator&, const Context&, RContext&, Attribute&) const [with Iterator = const char*; Context = boost::spirit::x3::unused_type; RContext = const boost::spirit::x3::unused_type; Attribute = boost::variant<std::vector<boost::variant<std::vector<boost::variant<int> >, int> >, int>; Subject = boost::spirit::x3::sequence<boost::spirit::x3::literal_char<boost::spirit::char_encoding::standard, boost::spirit::x3::unused_type>, boost::spirit::x3::attr_parser<std::vector<boost::variant<int> > > >]??
/usr/include/boost/spirit/home/x3/core/detail/parse_into_container.hpp:97:30:   [ skipping 7 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/usr/include/boost/spirit/home/x3/operator/detail/alternative.hpp:246:24:   required from ??static bool boost::spirit::x3::detail::parse_into_container_impl<boost::spirit::x3::alternative<L, R>, Context, RContext>::call(const parser_type&, Iterator&, const Iterator&, const Context&, RContext&, Attribute&) [with Iterator = const char*; Attribute = std::vector<boost::variant<std::vector<boost::variant<std::vector<boost::variant<int> >, int> >, int> >; Left = boost::spirit::x3::sequence<boost::spirit::x3::literal_char<boost::spirit::char_encoding::standard, boost::spirit::x3::unused_type>, boost::spirit::x3::attr_parser<std::vector<boost::variant<int> > > >; Right = boost::spirit::x3::sequence<boost::spirit::x3::literal_char<boost::spirit::char_encoding::standard, boost::spirit::x3::unused_type>, boost::spirit::x3::attr_parser<int> >; Context = boost::spirit::x3::unused_type; RContext = const boost::spirit::x3::unused_type; boost::spirit::x3::detail::parse_into_container_impl<boost::spirit::x3::alternative<L, R>, Context, RContext>::parser_type = boost::spirit::x3::alternative<boost::spirit::x3::sequence<boost::spirit::x3::literal_char<boost::spirit::char_encoding::standard, boost::spirit::x3::unused_type>, boost::spirit::x3::attr_parser<std::vector<boost::variant<int> > > >, boost::spirit::x3::sequence<boost::spirit::x3::literal_char<boost::spirit::char_encoding::standard, boost::spirit::x3::unused_type>, boost::spirit::x3::attr_parser<int> > >]??
...

I am still not very good at interpretting error messages of this ilk.

What is going on, and is this what you would expect?

Kojoley commented 2 years ago

I have another example of some unexpected behaviour.

This is based on code you supplied when exploring #679

...

I am still not very good at interpretting error messages of this ilk.

What is going on, and is this what you would expect?

The changes in the branch cannot cause compile error, they may only introduce runtime bugs. The code doesn't compile before the changes https://godbolt.org/z/M5KMsnobG and I don't think it should because x.emplace_back(Qaz{}); won't compile either.

Bockeman commented 2 years ago

I'm puzzled.

The link https://godbolt.org/z/M5KMsnobG points to the code I supplied above, and I see that you get the same compile time error.

You said

I don't think it should [compile] because x.emplace_back(Qaz{}); won't compile either.

But the code I supplied does compile on a machine with these versions:

egrep '#define (BOOST|SPIRIT)_(|X3_)VERSION ' /usr/include/boost{,/spirit/{include,home/x3}}/version.hpp
/usr/include/boost/version.hpp:#define BOOST_VERSION 107600
/usr/include/boost/spirit/include/version.hpp:#define SPIRIT_VERSION 0x2058
/usr/include/boost/spirit/home/x3/version.hpp:#define SPIRIT_X3_VERSION 0x3008

2022-01-19 16:26
Compile rgw29_nikita_variant.cpp
g++ -c -MMD -O0 -Wall -Werror   -std=c++17 -I./src -o /Temp/intermediate/bobw/play/example/spirit/dev/rgw29_nikita_variant.o src/rgw29_nikita_variant.cpp

2022-01-19 16:26
Link rgw29_nikita_variant.o
g++ /Temp/intermediate/bobw/play/example/spirit/dev/rgw29_nikita_variant.o \
   \
  /Temp/intermediate/bobw/play/example/spirit/dev/rgw29_yacc_grammar.o /Temp/intermediate/bobw/play/example/spirit/dev/rgw29_yacc_trial.o \
    -lboost_regex -lboost_filesystem -lboost_system -o bin/x86_64-Linux-5.15/rgw29_nikita_variant 2>&1 | c++filt

2022-01-19 16:27
bin/rgw29_nikita_variant installed

2022-01-19 16:27
time bin/rgw29_nikita_variant
0.00user 0.00system 0:00.01elapsed 43%CPU (0avgtext+0avgdata 5740maxresident)k
0inputs+0outputs (6major+451minor)pagefaults 0swaps

I'm not familiar with emplace_back() and I cannot find such a function anywhere under boost/spirit. So your "because" does not provide me with a rational explanation.

This line of code does compile for 0x3008 but not for 0x3010. x3::parse(s, e, +('a' >> x3::attr(Qaz{}) | 'b' >> x3::attr(int{})), x);

I am not saying that this failure to compile has anything to do with the code in your fixes for #679 and this #712, but something has been introduced between 0x3008 and 0x3010 that causes the failure.

I am unable to interpret the compile time error messaging. I do not see how changing x3::attr(Foo{}) to x3::attr(Qaz{}) could cause a compile error (in 0x3010 ) when the latter is simpler and Qaz is the same type (std::vector<boost::variant<...>>) as Foo.

Please could you help explain what is going on here.

Kojoley commented 2 years ago

But the code I supplied does compile on a machine with these versions:

It only confirms that it is not related to the changes we are discussing here.

I'm not familiar with emplace_back() and I cannot find such a function anywhere under boost/spirit. So your "because" does not provide me with a rational explanation.

  using Qaz = std::vector<boost::variant<int>>;
  using Foo = std::vector<boost::variant<Qaz, int>>;
  using Bar = std::vector<boost::variant<Foo, int>>;
  Bar x;
  x.emplace_back(Qaz{});
Bockeman commented 2 years ago

Oh, (slightly embarrassed), emplace_back() is a std::vector method; nothing to do with spirit. I agree, that example should not compile.

It leaves a potentially awkward question of why

This line of code does compile for 0x3008 but not for 0x3010. x3::parse(s, e, +('a' >> x3::attr(Qaz{}) | 'b' >> x3::attr(int{})), x);

ever did compile. But that is now fixed, and not relevant to this ticket.

Bockeman commented 2 years ago

I am still trawling through all of my code, unwinding various workarounds.

This code does not compile.

#include <boost/spirit/home/x3.hpp>
#include <vector>
namespace x3 = boost::spirit::x3;
int main()
{
  char const* s     = "abaabbcc", *e = s + std::strlen(s);
  using Foo     = std::vector<char>;
  using Bar     = std::vector<Foo>;
  Bar x;
  parse(s, e, +((x3::char_ > x3::char_) > (x3::char_ > x3::char_)), x);
}

I think it should complie (as in I cannot see why it should not). Is this related, or should a separate issue be raised?

Kojoley commented 2 years ago

File a new issue for it (not a regression and not related to this one).

Bockeman commented 2 years ago

@Kojoley I have now gone through as much of my code as I care to at the present time and unwound various workarounds for this issue. My code is a whole lot cleaner now. Thanks. I have raised separate issues for the remaining few issues I face.

From a purely coding perspective, I am happy with your fix.

I'd be happy to test any proposals (git clone from branch, if necessary) and, importantly, to review any changes to the examples/documentation.

I would appreciate if you test a solution prototype https://github.com/Kojoley/spirit/tree/x3-remove-roll-back-from-sequence-parser, the branch should fix both issues you have reported.

Please could you point me at the documentation changes and the "version history" inserts that you intend to apply, so that I can also review those.

ibis-hdl commented 2 years ago

I run into the same issue, probably. My test cases fail with diagnostic message for humans (line_start and position self (shows a space on begin of token)). Is there a workaround or similar? What is the state? I'm using boost 1.78