boostorg / thread

Boost.org thread module
http://boost.org/libs/thread
197 stars 162 forks source link

fix possible crash when running a boost::thread within a boost::thread #367

Closed elsamuko closed 2 years ago

elsamuko commented 2 years ago

When running a boost::thread within a boost::thread, the inner thread can crash when calling join, when it's joined itself. See: https://github.com/boostorg/thread/issues/366 https://stackoverflow.com/q/71482828

elsamuko commented 2 years ago

@pdimov ping

sehe commented 2 years ago

I'd expect at join_noexcept to handle the interruption exceptions instead of blocking interruption.

The problem was that exceptions emanated from a (conceptually) noexcept operation, and therefore from destructors that used it.

It seems that suppressing interruption might suppress that kind of exception, but I'd imagine it could result in deadlocks if a thread was modeled to never return unless interrupted.

@elsamuko Any chance you can check whether this alternative approach removes your problem?

pdimov commented 2 years ago

How would these functions handle the interruption exception?

elsamuko commented 2 years ago

@sehe Wouldn't this be practically the same, if I disable the interruption locally or catch boost::thread_interrupted?

elsamuko commented 2 years ago

@sehe But it works, too (based on this sample):

        // boost::this_thread::disable_interruption di;
        t.interrupt();
        try { t.join(); } catch( boost::thread_interrupted& ) {}
pdimov commented 2 years ago

It just does the same thing, less efficiently.

sehe commented 2 years ago

How would these functions handle the interruption exception?

@pdimov Again, not sure. Perhaps boost::thread_interrupted can be handled by ignoring it. Mmm. When I feared potential deadlock because the inner thread would never join, I forgot that disable_interruption applies to the joining thread, not the thread being joined, obviously.

However, boost::thread::join is specified as an interruption point. Disabling interruption in join_noexcept seems to break that?

The more I think about this, the more I wonder whether join_noexcept was badly named (specifically meaning it won't check preconditions), and should raise thread_interruption if only to satisfy the interruption_point criteria. (Notice how the overload isn't marked noexcept either, independent of language support, while other functions are).

And by consequence, the best fix would be to handle (ignore) said exception when in a destructor - such as the thread_guard destructor.

pdimov commented 2 years ago

Well, a noexcept function can't be an interruption point, obviously. And I can't look up what the docs say because boost.org is broken. :-)

sehe commented 2 years ago

Hah. My DNS/cache wins. The function isn't noexcept.

join() is an interruption point and its implementation unconditionally calls join_noexcept(). Because the only extra checks in join() perform precondition checks, it follows that join_noexcept() must be an interruption point.

Hence my conclusion that (A) join_noexcept is unfortunately named (B) we have to plug the hole where it exists: join() from a destructor.

elsamuko commented 2 years ago

Doc link: https://github.com/boostorg/thread/blob/develop/doc/thread_ref.qbk#L345

pdimov commented 2 years ago

join is definitely an interruption point; the question was whether join_noexcept was intended to be one.

sehe commented 2 years ago

I tried to answer that. Were you able to verify my reasoning?

pdimov commented 2 years ago

Yes I did. thread::join is here and it throws on deadlock (a thread joining itself) and on precondition violations (thread not joinable). It calls join_noexcept, which necessarily implies that join_noexcept is an interruption point. This makes this PR invalid, because it removes the interruptibility of join.

The fix for the issue must be found elsewhere.

sehe commented 2 years ago

The fix for the issue must be found elsewhere.

I'll draw up a fix adding exception handling to the thread_guard destructor. I can't think of better ways. Of course, I can only make it squelch interruption exceptions, because otherwise we risk leaving programs in invalid states for unknown reasons.

elsamuko commented 2 years ago

Another possiblity would be adding the non_interruptable_interrupt_and_join_if_joinable to the list of thread functors.
If someones is not using a thread_guard, then it's still crashing.
And join_noexcept would still throw.

pdimov commented 2 years ago

Destructors need disable_interruption. Which of our destructors has the problem? ~thread_guard?

sehe commented 2 years ago

Adding it to the thread functors makes little sense to me, as they're useful outside of destructors.

I PR-ed the alternative approach here: https://github.com/boostorg/thread/pull/369

elsamuko commented 2 years ago

I summarized all testcases so far here: https://github.com/elsamuko/boost_thread_testcases

Imho the best solution woult be making the join_if_joinable and interrupt_and_join_if_joinable functors non-interruptable.

sehe commented 2 years ago

Nice work.

I feel your "best solution" is the easiest fix.

Sadly it is also a breaking change, which will only be feasible if we can either

Am I reading correctly that your 3rd bullet "adding the disable_interruption before any join to the thread_functors fixes all but the custom scope_guard" basically means that if people want to shoot themselves in the foot, they are free to do so? In that case, maybe this fix is actually best.

Perhaps we can add warning about join potentially throwing interruption exceptions at various points in the documentation.

elsamuko commented 2 years ago

I added non-interruptible versions of the joining thread functors (non_interruptable_join_if_joinable, non_interruptable_interrupt_and_join_if_joinable).
Then it's not a breaking change.

sehe commented 2 years ago

I'm confused. @elsamuko Throwing from a destructor is never ok. Therefore there is no possible use of throwing functors with thread_guard, meaning that the blanket fix in https://github.com/boostorg/thread/pull/369 is strictly better. What am I missing?

elsamuko commented 2 years ago

@sehe It's throwing only, when it's within a boost::thread, which should be a edge case.

369 is missing a fix for boost::scoped_thread. And I'm not sure, if it's guaranteed to be a non-breaking change.

sehe commented 2 years ago

@sehe It's throwing only, when it's within a boost::thread

Yes? The point is destructors can't throw

https://github.com/boostorg/thread/pull/369 is missing a fix for boost::scoped_thread.

I noted that https://github.com/boostorg/thread/pull/369#issuecomment-1122590610. That's just a different thing

And I'm not sure, if it's guaranteed to be a non-breaking change.

It's "breaking" in the sense that applications that were guaranteed to abnormally terminate, might in rare cases now hang because the destructor uses a join that doesn't get interrupted.

Not a reason to introduce half-baked fixes with ridiculous names that nobody will know to use when they needed to, IMO

pdimov commented 2 years ago

I agree that disabling interruptions in the destructors is the correct change. See also my comments in #369.

elsamuko commented 2 years ago

It's "breaking" in the sense that applications that were guaranteed to abnormally terminate, might in rare cases now hang because the destructor uses a join that doesn't get interrupted.

Do you have an example, where the interruptability of join() would lead to a deadlock if removed? I don't understand why I would want to interrupt a thread, which is joining another thread. This looks more like an unexpected side effect of using boost::condition_variable::wait() in join_noexcept().

Removing the interruptability of join() would fix crashes in custom scope guards and destructors, too.

pdimov commented 2 years ago

The general principle is that all blocking functions are interruptible. join is a blocking function, so it's interruptible.

All three of the examples in #369 show cases where an outer thread that owns and is joining an inner thread is interrupted. This is a typical use scenario; an outer thread spawns a group of scoped inner threads, passing them pieces of the task to be performed, then joins them at the end. The user of the outer thread interrupts the outer thread, and the outer thread is responsible for interrupting the inner worker threads.

pdimov commented 2 years ago

E.g. consider something like this

static void outer_thread_func()
{
    boost::scoped_thread<boost::interrupt_and_join_if_joinable> inner1( do_first_half );
    boost::scoped_thread<boost::interrupt_and_join_if_joinable> inner2( do_second_half );

    inner1.join();
    inner2.join();
}

This thread owns two worker subthreads, assigns them one half of the work to be done, and waits for them to finish. If the outer thread is interrupted, one of the joins throws, and the scoped_threads automatically interrupt+join the worker subthreads.

elsamuko commented 2 years ago

Thanks for the explanation, then a non-interruptible join() would break existing software.

pdimov commented 2 years ago

Should be fixed with https://github.com/boostorg/thread/commit/f71e0f1645413c3291e37d766e5b04bc57c6e132 and https://github.com/boostorg/thread/commit/8db325363b8e91de23c062ec8964bb605ad89f11.