boostorg / spirit

Boost.org spirit module
http://boost.org/libs/spirit
392 stars 161 forks source link

X3: seek parser goes past the end of input #658

Closed Kojoley closed 3 years ago

Kojoley commented 3 years ago
#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
int main() {
    char const* x = " ";
    x3::phrase_parse(x, x+1, x3::seek['!'], x3::space);
}

from https://sourceforge.net/p/spirit/mailman/spirit-general/?viewmonth=202004

seek seems to be missing the fact that a parser can advance an iterator (are they allowed to actually a good question), what happens all the time with skippers.

The https://github.com/boostorg/spirit/pull/30 seems to be responsible for this regression, and I have no idea what it was fixing.

vtnerd commented 3 years ago

The change permitted seeking to end of input. Reverting #30 no longer permits this case because the subject will never return true. And shouldn't the subject check for end of input first, or is it always assumed? I don't remember the semantics now.

vtnerd commented 3 years ago

Oh nevermind, I see the problem. Yes, re-implement as a do-while actually.

Kojoley commented 3 years ago

The change permitted seeking to end of input.

There are tests for that from the beginning, even before this PR: https://github.com/boostorg/spirit/blob/e8b4b10d1e47c1b1b7fe8efee88fdadbd56e6af8/test/x3/seek.cpp#L34-L37

Could you please show an example of what you talking about maybe?

vtnerd commented 3 years ago

The code should still have UB, there's no return at the end. This error was copied from the original qi ("repository") code. return true is insufficient without checking the subject first and return false is insufficient for the eoi of case. The tests likely pass because of the UB. I think a do-while is what I was trying to do here.

vtnerd commented 3 years ago

Ah, wait, forever loop, damn.

vtnerd commented 3 years ago

Ok, now I remember. Its in the comment. I'll add a test but it may not be immediate.

The problem is seek[eol | eoi]. If the input string is \n, it matches eol, hits the forever loop and increments, then does an eoi correctly but misses the last check since its one-past the last character.

And how does the change create a one past end of input case? That happened in the subject ?

Kojoley commented 3 years ago

Ok, now I remember. Its in the comment. I'll add a test but it may not be immediate.

The problem is seek[eol | eoi]. If the input string is \n, it matches eol, hits the forever loop and increments, then does an eoi correctly but misses the last check since its one-past the last character.

Could you please write a reproducer? This work just fine:

#include <boost/spirit/home/x3.hpp>

namespace x3 = boost::spirit::x3;

int main()
{
    char const* s = "\n", * e = s + std::strlen(s);
    bool r = parse(s, e, x3::seek[x3::eol | x3::eoi]);
    if (r) std::cout << (s == e ? "Match\n" : "Partly parsed:" + std::string(s, e) + "\n");
    else std::cout << "Fail\n";
}

As I already said I tried to reproduce an issue without any success.

And how does the change create a one past end of input case? That happened in the subject ?

Quoting myself from the first post:

#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;
int main() {
    char const* x = " ";
    x3::phrase_parse(x, x+1, x3::seek['!'], x3::space);
}

from sourceforge.net/p/spirit/mailman/spirit-general/?viewmonth=202004

seek seems to be missing the fact that a parser can advance an iterator (are they allowed to actually a good question), what happens all the time with skippers.