boostorg / spirit

Boost.org spirit module
http://boost.org/libs/spirit
383 stars 159 forks source link

X3: Backtrack and container attribute #378

Open Kojoley opened 6 years ago

Kojoley commented 6 years ago

Preface

The side effect of backtracking on container type attributes is a known Spirit problem. It is not only a very surprising thing for newcomers, but also the sword of Damocles to everyone.

Click for the backtrack side effect example ```cpp #include #include #include #include "../spirit/test/x3/test.hpp" int main() { using spirit_test::test_attr; namespace x3 = boost::spirit::x3; { std::string s; BOOST_TEST(test_attr("ac", x3::char_("a") >> x3::char_("b") | x3::string("ac"), s)); BOOST_TEST_EQ("ac", s); } return boost::report_errors(); } ```

Solutions

The two main suggestions for Qi users are:

  1. Rewrite your parser to eliminate backtracking, or if it is impossible:
  2. Use hold directive.

At the current moment there is no hold directive in X3 and that's a problem. The simple solution is to introduce hold directive and forget about the problem, however I think it is not a solution, but a dirty hack/workaround.

What I have in my mind right now:

no description pseudo representation
1 hold directive. hold[a] | hold[b] | hold[c] | ... | hold[z]
2 parse_alternative to temporary. Same as 1
3 alternative::parse to temporary. hold...[hold[hold[a | b] | c] | ...] | z
4 clear the attribute before parse_alternative call. clear >> a | clear >> b | ... | clear >> z
5 clear the attribute after parse_alternative fail. a | clear | b | clear | ... | z | clear
6 clear between parse_alternative calls in alternative::parse. a | clear >> b | ... | clear >> z

What I think about solutions above:

  1. Easy to implement. User should know about its existence. Error prone or terrible looking because of verbosity if used in every branch. The cost is N construct + N - F move + N destruct calls, where N is number of hit hold directives, F is number of failed branches.
  2. Easy to implement. The cost is S construct + S - F move + S destruct calls, where S is number of succeeded branches, F is number of failed branches.
  3. The cost is S - 1 construct + S - F move + S - 1 destruct calls, where S is number of succeeded branches, F is number of failed branches.
  4. Requires new trait. The cost is S clear trait calls, where S is number of succeeded branches.
  5. Requires new trait. The cost is F clear trait calls, where F is number of failed branches.
  6. Requires new trait. The cost is F - 1 clear trait calls, where F is number of failed branches.

Since all approaches have constant overhead after backtracking that had appended zero elements to the attribute the list above shows that hold directive has most performance impact, and being a 'non-automatic' solution makes it the worst decision by a large margin.

Non-container attributes handling will not changed even on asm level for all the solutions (except may be hold directive, because Qi documentation explicitly states that it creates temporary and does not make exception for non-container attributes).

Note: solutions 1-3 will make more allocate calls and perform worse on non reusing allocators, but 4-6 may end with more reserved memory (can be fixed with shrink_to_fit call, but I think it should be left up to a user to make such call).

From the solutions list it is clear that:

Optimizations

The main question is: can we make automatic backtrack recovering as optimized as the most optimized hand placed hold directive? Let's see:

Solution 6 is already optimized to clear only between branches so it is equivalent (in terms of the result) of hold[a] | hold[b] | hold[c] | ... | z. It is nice, we do not need to clear the container if the last branch has failed.

Side note: Actually, clear is cheaper than holds in this case. Because it does the following a | clear >> b | clear >> c | ... | z there are no constructions and destructions of container type in every branch, but what most people may not realize -- memory allocations. The most used container is std::vector, it will call allocator on push_back due to growth (and may be even multiple allocations + moves or if you unclucky -- copies for its elements) on every branch (at least a single allocation on every branch that produces at least one element). This will not happen with solution 6 because we always work with the same container there are no moves/copies and the number of allocations will be as much as number of allocations for only the longest successful branch.

The two possible manual hold optimization are:

  1. a | hold[b] | c - hold is not used on a because it does not produce an attribute (example a = lit("foo")). It is a simple optimization and will be performed automatically when is_same<unused_type, attribute_of<decltype(a)>>.
  2. lit("foo") >> hold[a] | b - postpone hold so if lit("foo") fails there is no cost of hold at all. For the most of containers clear call on an empty attribute should be very cheap so this optimization will have close to zero impact on performace. However, it also can be done automatically by performing something like this transformation lit("foo") >> (a | clear) | b. It is a complicated optimization, but I have a clear picture of how it could be done by injecting a tag into a context.

I can't say for sure there are no other possible optimizations, but it should be something very tricky.

Kojoley commented 6 years ago

I do not know how hold was introduced into Qi and why that solution had been chosen. I feel like even Qi must lose hold, because it (even on C++11 I think) instead of moves performs copies. During deprecation period it will become no-op with a compile time warning of being removed in future.

mjcaisse commented 6 years ago

Qi can't loose hold. Please do not change Qi.

Kojoley commented 6 years ago

Since I did not even open a PR you may have misunderstood me. Or maybe you have a counterexample?

djowel commented 6 years ago

Correct me if my understanding is wrong, but I think one difference between 'hold' and the alternative solutions is that with 'hold' you only pay for it when you use it. The others, you pay for the cost all the time. (edit: to be clear, i meant 2 and 3. not sure about the implications of 4, 5 and 6 yet)

mjcaisse commented 6 years ago

Hi @Kojoley -

Perhaps I misunderstood. You write:

I feel like even Qi must lose hold, because it (even on C++11 I think) instead of moves performs copies. During deprecation period it will become no-op with a compile time warning of being removed in future.

Which makes it sound like you are proposing that hold is also removed from Qi. That would be either a breaking change or a performance hit.

I'm simply saying that hold should not be removed from Qi .

As for X3, I'm not opposed to a hold directive. I personally like the explicit nature of imposing the cost.

Suggestions 2 and 3 are expensive.

I think #4 changes behaviour for existing code. This may be a breaking change for some people acting tricky with attribute synthesis.

5 May add some cost but might be negligible.

6 Has same behaviour changes as #4 I think.

I wonder if there is a way to shift the annotation (such as hold) out of the grammar and into the attribute where it belongs. Like a wrapper for the attribute that indicates that it should be "reset" on failed parse (or something like that).

Kojoley commented 6 years ago

No, actually I have showed above that hold on general usage is way more expensive (even if we assume zero cost move) as it requires heap allocations. I will try to describe more precisely.

On general usage hold not 'when you use it', because if it is not used on branch with container attribute - it is a bug in your parser.

The two possible manual hold optimization:

  1. a | hold[b] | c - we did not use hold on a because it does not produce an attribute (in short a = eps)
  2. lit("foo") >> hold[a] | b - we postpone hold so if lit("foo") fails there is no cost of hold at all.

Maybe I missed some other pros and then things may change, but currently what I see:

  1. We will make this optimization automatically since we can test is_same<unused_type, attribute_of<decltype(a)>>
  2. clear call on an empty attribute should very cheap. If someone really needs this kind of optimization we can leave hold and disable automatic clear call for this branch.
djowel commented 6 years ago

I wonder if there is a way to shift the annotation (such as hold) out of the grammar and into the attribute where it belongs. Like a wrapper for the attribute that indicates that it should be "reset" on failed parse (or something like that).

I liked that idea in essence, but I think the issue whether to 'hold' or not is a consequence of the grammar, not the attribute. A clean, non-backtracking grammar need not have to 'hold'.

Kojoley commented 6 years ago

That would be a either a breaking change

If it will remain a no-op or optimization guide it will not. Otherwise I have said a deprecation cycle so users will have time to migrate their code without a compilation errors.

or a performance hit.

As I have shown above it will be performance gain.

Suggestions 2 and 3 are expensive.

I just shown all the solutions I have in mind. What I really propose want to propose is 6. I do not see any further ways to optimize except the hold guide for partial match I have described in post above.

I wonder if there is a way to shift the annotation (such as hold) out of the grammar and into the attribute where it belongs. Like a wrapper for the attribute that indicates that it should be "reset" on failed parse (or something like that).

It is complicated and I do not see examples where this will be better that solution 6. (I do not talk of downsides like hold has about its verbosity).

Kojoley commented 6 years ago

And you may also specialize empty trait so empty call on your type will be no-op.

mjcaisse commented 6 years ago

It is not a performance gain if you are not using hold. I think you are comparing always using hold to the new solutions. I've almost never used hold.

djowel commented 6 years ago

I've almost never used hold.

Me too. Whatever the solution is, we should not pay for something that we do not use.

Kojoley commented 6 years ago

I think you are comparing always using hold to the new solutions.

You misunderstood me. The behavior of alternate parser for variant types and values of non-container types will be not changed. I have even wrote it in the OP:

Non-container attributes handling will not changed even on asm level for all the solutions

I've almost never used hold.

It is expected, because hold is needed at places where the attribute is not of a variant, but a value of a container type.

djowel commented 6 years ago

Non-container attributes handling will not changed even on asm level for all the solutions

Interesting! Indeed, we might be misunderstanding. Could you expound a bit more?

Kojoley commented 6 years ago

You simply do not need to hold/clear a non-container attribute because it will be rewritten in any other successful branch entirely. So call empty_if_container will become no-op on non-container attributes and will cost nothing for you. At compile time it will cost you an instantiation of templated function and is_container trait per unique attribute type (should be at max a few milliseconds).

djowel commented 6 years ago

You simply do not need to hold/clear a non-container attribute because it will be rewritten in any other successful branch entirely.

Not sure I understand. 'hold' is explicit. And typically, you do it with containers. If the user has 'hold' on non-container attributes, then he asked for it.

Kojoley commented 6 years ago

Not sure I understand. 'hold' is explicit. And typically, you do it with containers. If the user has 'hold' on non-container attributes, then he asked for it.

Yes, he may have asked this, but he actually may misused it. Without semantic actions it does not change parsing result at all. With semantic action it will produce the same result as by replacing it with a pair of parenthesis.

Kojoley commented 6 years ago

Yes, he may have asked this, but he actually may misused it. Without semantic actions it does not change parsing result at all.

This is also apply to a container types.

With semantic action it will produce the same result as by replacing it with a pair of parenthesis.

This will change result on container types if user asked pass = false while using hold on last branch (like this a | ... | hold[z], what is some really strange usage). It also may change behavior for semantic actions placed on hold directive that reads _val (again some strange usage, someone does something tricky here). But we still may make a runtime debug check for this during deprecation period if Qi is going to remove hold, but it is an optional thing.

The main subject of the discussion is X3. The Qi currently for me is a ground of possible counterexamples and only if there are none of them I will propose a backport to Qi.

djowel commented 6 years ago

The main subject of the discussion is X3. The Qi currently for me is a ground of possible counterexamples and only if there are none of them I will propose a backport to Qi.

Agreed. We should leave Qi as-is. Changing Qi's behavior now will cause havoc. Deprecation does not practically work for Boost, based on my experience. For example, Spirit "classic" has been deprecated for more than 10 years now, yet it persists because, people use it and boost libraries use it. removing it now will cause an uproar :-)

I am 100% for a well tuned and optimized "hold" for X3. I am also open to other, possibly parallel solutions, as long as you don't pay for it when you do not need it (one of the guiding principles of Spirit).

Kojoley commented 6 years ago

For example, Spirit "classic" has been deprecated for more than 10 years now, yet it persists because, people use it and boost libraries use it. removing it now will cause an uproar :-)

It does not produce a compiler warning. There is no mention of deprecation in the documentation, actually. People also are lazy, and will not touch things if they are working and producing satisfactory results. But it is not free for you, because you pay by maintaining it for them.

I am 100% for a well tuned and optimized "hold" for X3. I am also open to other, possibly parallel solutions, as long as you don't pay for it when you do not need it (one of the guiding principles of Spirit).

That is core thing of C++, otherwise I would have used Haskell :-)

djowel commented 6 years ago

It does not produce a compiler warning. There is no mention of deprecation in the documentation, actually. People also are lazy, and will not touch things if they are working and producing satisfactory results. But it is not free for you, because you pay by maintaining it for them.

There was, actually. And it was documented very well. We had a warning going for years when it was deprecated. We removed it later, because we simply cannot remove "classic" and force people to switch to Qi.

djowel commented 6 years ago

I am 100% for a well tuned and optimized "hold" for X3. I am also open to other, possibly parallel solutions, as long as you don't pay for it when you do not need it (one of the guiding principles of Spirit).

That is core thing of C++, otherwise I would have used Haskell :-)

👍

Kojoley commented 6 years ago

Ok, it looks like hold manual optimization 2 could be done automatically, so I do not see not a single advantage of manual hold placing.

Update to first post:

However, there is a thing that makes me wonder:

#include <boost/detail/lightweight_test.hpp>
#include <boost/spirit/home/qi.hpp>
#include <string>
#include "../spirit/test/qi/test.hpp"

int main()
{
    using spirit_test::test_attr;
    namespace qi = boost::spirit::qi;

    {
        std::string s = "z";
        BOOST_TEST(test_attr("ac", qi::hold[qi::char_("a") >> qi::char_("b")] | qi::char_("a") >> qi::char_("c"), s));
        BOOST_TEST_EQ("ac", s);
    }

    return boost::report_errors();
}
main.cpp(101): test '"ac" == s' failed in function 'int __cdecl main(void)': '000000013FBEAD64' != 'zac'
1 error detected.

Why hold appends its items? From the documentation I see there was no such intention

It instantiates a new attribute instance for the embedded parser. The value of that attribute instance is copied to the outer attribute if the embedded parser succeeds and it is discarded otherwise.

The code not just instantiates a new attribute, but make a copy of the current attribute: https://github.com/boostorg/spirit/blob/9a8cded55355644c749d6511504b59c048d64de3/include/boost/spirit/home/qi/directive/hold.hpp#L61

Kojoley commented 6 years ago

Well, I have no comments here...

#include <boost/detail/lightweight_test.hpp>
#include <boost/spirit/home/x3.hpp>
#include <string>
#include "../spirit/test/x3/test.hpp"

int main()
{
    using spirit_test::test_attr;
    namespace x3 = boost::spirit::x3;

    {
        std::string s = "z";
        auto a = x3::rule<struct _a, std::string>{} = x3::char_("a") >> x3::char_("b");
        auto b = x3::rule<struct _b, std::string>{} = x3::char_("a") >> x3::char_("c");
        BOOST_TEST(test_attr("ac", a | b, s));
        BOOST_TEST_EQ("ac", s);
    }

    return boost::report_errors();
}
main.cpp(145): test '"ac" == s' failed in function 'int __cdecl main(void)': '000000013F487724' != 'zac'
1 error detected.
Xeverous commented 1 year ago

Is this problem resolved? Based on the fact that all linked issues are closed and the last comment by @Kojoley shows the cleanest implementation in practice (attr copy but clear on failed alternative) I assume everything has been fixed. Correct?

Kojoley commented 1 year ago

Is this problem resolved? Based on the fact that all linked issues are closed and the last comment by @Kojoley shows the cleanest implementation in practice (attr copy but clear on failed alternative) I assume everything has been fixed. Correct?

No :-( Nothing changed here, Spirit still does not rollback container attributes. Though, I had an idea how to solve it simply and cheaply: introduce a container rollback trait which for a vector-like containers just stores current size and upon a rollback request resizes it to the saved size. The solution adds a new requirement for parsers and semantic actions to not remove elements past the savepoint (or simply to only add elements, never remove) which does not seem like it will break a lot of users, but no one can say for sure.

Xeverous commented 1 year ago

No :-( Nothing changed here, Spirit still does not rollback container attributes.

Why? Wasn't solution 6 (clear on alternative failure when parsing into a container) the best choice?

The solution adds a new requirement for parsers and semantic actions to not remove elements past the savepoint

IMO semantic actions have no place in X3. The library already offers a mechanism for hooking code into attributes (annotate_on_success in examples) and by their design, semantic actions are prone to breaking something. The docs already discourage their use in X3.