chriskohlhoff / asio

Asio C++ Library
http://think-async.com/Asio

ASAN stack use after return: io_uring timers not always cleared, leading to segfaults #1284

Closed bgemmill closed 11 months ago

bgemmill commented 1 year ago

When compiling with:

add_compile_definitions(BOOST_ASIO_HAS_IO_URING=1)
add_compile_definitions(BOOST_ASIO_DISABLE_EPOLL=1)

I see ASAN errors inside the io_uring service, specifically where it sets the userdata pointer to the address of a stack-local variable here.

Further down the code, that pointer is supposed to be cleared here, but under heavy timer use I keep hitting cases where the removal doesn't trigger.

The segfault occurs on a later pass through this code: the stale &ts no longer matches anything, so it is incorrectly assumed to be an operation and called into here, which segfaults.

I've been testing a fix locally:

1) Set the initial userdata to nullptr here so that the timer just wakes the system up with a noop.

2) Do not increment local_ops here, since the wakeup will be a noop.

3) Remove the block of code that attempts to remove these timers.
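To illustrate the idea behind the three steps above, here is a minimal self-contained sketch. It is not asio's actual code: `completion`, `operation`, `dispatch`, and `tag` are hypothetical stand-ins, and the completion queue is simulated with a vector rather than a real io_uring. It only shows the dispatch rule, assuming a zero userdata value marks a wakeup-only timeout that is skipped and not counted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a completion queue entry; only user_data matters here.
struct completion {
  std::uint64_t user_data;
};

// Hypothetical stand-in for an operation that a completion's user_data points at.
struct operation {
  int completed = 0;
  void complete() { ++completed; }
};

// Pack an operation pointer into the 64-bit user_data field.
std::uint64_t tag(operation* op) {
  return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(op));
}

// Dispatch every completion. A zero user_data marks a wakeup-only timeout:
// it is skipped instead of being cast to an operation, so no pointer into a
// stack frame that has already returned is ever dereferenced.
// Returns the number of real operations that were run.
std::size_t dispatch(const std::vector<completion>& cq) {
  std::size_t local_ops = 0;
  for (const completion& c : cq) {
    if (c.user_data == 0) continue;  // noop wakeup, not counted in local_ops
    auto* op = reinterpret_cast<operation*>(
        static_cast<std::uintptr_t>(c.user_data));
    op->complete();
    ++local_ops;
  }
  return local_ops;
}
```

With this rule there is nothing left to remove after the wait returns, which is why step 3 drops the removal block entirely.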

Happy to submit a PR if that sounds good.

phprus commented 1 year ago

@bgemmill Could you provide code that reproduces this error?

bgemmill commented 1 year ago

@phprus My code uses a lot of timers from beast's ssl websockets, so while I can try to make a small test case, it may take some doing.

If I had to guess at what's going on, the timer is in a too-late-to-cancel state, as implied by the documentation for io_uring_prep_timeout_remove: that function can result in EALREADY and leave the timer in place.

bgemmill commented 1 year ago

@chriskohlhoff Looking at the changelog here, it says: Fixed the io_uring backend to ensure the internal timeout operations, used to implement io_context::run_for and io_context::run_until, are cleaned up correctly.

It looks like this issue has gone away since that change. What was happening around timers?

bgemmill commented 1 year ago

I had a segfault with uring and not with epoll; I'll try capturing with ASAN again.

bgemmill commented 11 months ago

It was something else, thanks for solving this one. I'll open a separate issue.