Closed elsamuko closed 2 years ago
@pdimov ping
I'd expect at join_noexcept
to handle the interruption exceptions instead of blocking interruption.
The problem was that exceptions emanated from a (conceptually) noexcept operation, and therefore from destructors that used it.
It seems that suppressing interruption might suppress that kind of exception, but I'd imagine it could result in deadlocks if a thread was modeled to never return unless interrupted.
@elsamuko Any chance you can check whether this alternative approach removes your problem?
How would these functions handle the interruption exception?
@sehe Wouldn't this be practically the same, if I disable the interruption locally or catch boost::thread_interrupted
?
@sehe But it works, too (based on this sample):
// boost::this_thread::disable_interruption di;
t.interrupt();
try { t.join(); } catch( boost::thread_interrupted& ) {}
It just does the same thing, less efficiently.
How would these functions handle the interruption exception?
@pdimov Again, not sure. Perhaps boost::thread_interrupted
can be handled by ignoring it. Mmm. When I feared potential deadlock because the inner thread would never join, I forgot that disable_interruption
applies to the joining thread, not the thread being joined, obviously.
However, boost::thread::join
is specified as an interruption point. Disabling interruption in join_noexcept
seems to break that?
The more I think about this, the more I wonder whether join_noexcept
was badly named (specifically meaning it won't check preconditions), and should raise thread_interruption if only to satisfy the interruption_point criteria. (Notice how the overload isn't marked noexcept
either, independent of language support, while other functions are).
And by consequence, the best fix would be to handle (ignore) said exception when in a destructor - such as the thread_guard
destructor.
Well, a noexcept function can't be an interruption point, obviously. And I can't look up what the docs say because boost.org is broken. :-)
Hah. My DNS/cache wins. The function isn't noexcept.
join()
is an interruption point and its implementation unconditionally calls join_noexcept()
. Because the only extra checks in join()
perform precondition checks, it follows that join_noexcept()
must be an interruption point.
Hence my conclusion that (A) join_noexcept
is unfortunately named (B) we have to plug the hole where it exists: join()
from a destructor.
join
is definitely an interruption point; the question was whether join_noexcept
was intended to be one.
I tried to answer that. Were you able to verify my reasoning?
Yes I did. thread::join
is here and it throws on deadlock (a thread joining itself) and on precondition violations (thread not joinable). It calls join_noexcept
, which necessarily implies that join_noexcept
is an interruption point. This makes this PR invalid, because it removes the interruptibility of join
.
The fix for the issue must be found elsewhere.
The fix for the issue must be found elsewhere.
I'll draw up a fix adding exception handling to the thread_guard destructor. I can't think of better ways. Of course, I can only make it squelch interruption exceptions, because otherwise we risk leaving programs in invalid states for unknown reasons.
Another possiblity would be adding the non_interruptable_interrupt_and_join_if_joinable
to the list of thread functors.
If someones is not using a thread_guard
, then it's still crashing.
And join_noexcept
would still throw.
Destructors need disable_interruption
. Which of our destructors has the problem? ~thread_guard
?
Adding it to the thread functors makes little sense to me, as they're useful outside of destructors.
I PR-ed the alternative approach here: https://github.com/boostorg/thread/pull/369
I summarized all testcases so far here: https://github.com/elsamuko/boost_thread_testcases
thread_guard
alone will not solve all cases, scope_guard
and scoped_thread
will still crashdisable_interruption
in join_noexcept
fixes all cases, but makes join()
non-interruptabledisable_interruption
before any join to the thread_functors
fixes all but the custom scope_guard
non_interruptable_interrupt_and_join_if_joinable
custom functorImho the best solution woult be making the join_if_joinable
and interrupt_and_join_if_joinable
functors non-interruptable.
Nice work.
I feel your "best solution" is the easiest fix.
Sadly it is also a breaking change, which will only be feasible if we can either
Am I reading correctly that your 3rd bullet "adding the disable_interruption before any join to the thread_functors fixes all but the custom scope_guard" basically means that if people want to shoot themselves in the foot, they are free to do so? In that case, maybe this fix is actually best.
Perhaps we can add warning about join
potentially throwing interruption exceptions at various points in the documentation.
I added non-interruptible versions of the joining thread functors (non_interruptable_join_if_joinable
, non_interruptable_interrupt_and_join_if_joinable
).
Then it's not a breaking change.
I'm confused. @elsamuko Throwing from a destructor is never ok. Therefore there is no possible use of throwing functors with thread_guard, meaning that the blanket fix in https://github.com/boostorg/thread/pull/369 is strictly better. What am I missing?
@sehe It's throwing only, when it's within a boost::thread
, which should be a edge case.
boost::scoped_thread
. And I'm not sure, if it's guaranteed to be a non-breaking change.@sehe It's throwing only, when it's within a boost::thread
Yes? The point is destructors can't throw
https://github.com/boostorg/thread/pull/369 is missing a fix for boost::scoped_thread.
I noted that https://github.com/boostorg/thread/pull/369#issuecomment-1122590610. That's just a different thing
And I'm not sure, if it's guaranteed to be a non-breaking change.
It's "breaking" in the sense that applications that were guaranteed to abnormally terminate, might in rare cases now hang because the destructor uses a join that doesn't get interrupted.
Not a reason to introduce half-baked fixes with ridiculous names that nobody will know to use when they needed to, IMO
I agree that disabling interruptions in the destructors is the correct change. See also my comments in #369.
It's "breaking" in the sense that applications that were guaranteed to abnormally terminate, might in rare cases now hang because the destructor uses a join that doesn't get interrupted.
Do you have an example, where the interruptability of join()
would lead to a deadlock if removed? I don't understand why I would want to interrupt a thread, which is joining another thread. This looks more like an unexpected side effect of using boost::condition_variable::wait()
in join_noexcept()
.
Removing the interruptability of join()
would fix crashes in custom scope guards and destructors, too.
The general principle is that all blocking functions are interruptible. join
is a blocking function, so it's interruptible.
All three of the examples in #369 show cases where an outer thread that owns and is joining an inner thread is interrupted. This is a typical use scenario; an outer thread spawns a group of scoped inner threads, passing them pieces of the task to be performed, then joins them at the end. The user of the outer thread interrupts the outer thread, and the outer thread is responsible for interrupting the inner worker threads.
E.g. consider something like this
static void outer_thread_func()
{
boost::scoped_thread<boost::interrupt_and_join_if_joinable> inner1( do_first_half );
boost::scoped_thread<boost::interrupt_and_join_if_joinable> inner2( do_second_half );
inner1.join();
inner2.join();
}
This thread owns two worker subthreads, assigns them one half of the work to be done, and waits for them to finish. If the outer thread is interrupted, one of the join
s throws, and the scoped_thread
s automatically interrupt+join the worker subthreads.
Thanks for the explanation, then a non-interruptible join()
would break existing software.
When running a boost::thread within a boost::thread, the inner thread can crash when calling join, when it's joined itself. See: https://github.com/boostorg/thread/issues/366 https://stackoverflow.com/q/71482828