Closed bgemmill closed 11 months ago
@bgemmill Could you provide code that reproduces this error?
@phprus My code uses a lot of timers from beast's ssl websockets, so while I can try to make a small test case, it may take some doing.
If I had to guess at what's going on, the timer is in a too-late-to-cancel state, as implied by the documentation for `io_uring_prep_timeout_remove`, where that function could result in `EALREADY` and not remove the timer.
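To make the guess above concrete, here is a minimal sketch of how the result of a timeout-removal request could be classified. The helper `classify_timeout_remove` and the `RemoveOutcome` enum are hypothetical names, not part of asio or liburing, and the mapping of error codes (in particular treating `-EALREADY` and `-EBUSY` as "too late to cancel") is an assumption that should be checked against the man page for your kernel.

```cpp
#include <cerrno>

// Hypothetical outcome categories for a timeout-removal completion.
enum class RemoveOutcome { removed, not_found, too_late, other_error };

// Classify the result field of a completion for a timeout-removal request.
// Assumption: io_uring reports failures as negated errno values.
RemoveOutcome classify_timeout_remove(int res) {
  if (res == 0) return RemoveOutcome::removed;          // timer was cancelled
  if (res == -ENOENT) return RemoveOutcome::not_found;  // no such timer pending
  if (res == -EALREADY || res == -EBUSY)
    return RemoveOutcome::too_late;  // timer is already firing, so its
                                     // completion will still be delivered
  return RemoveOutcome::other_error;
}
```

In the `too_late` case the original timeout still completes later, so whatever its userdata points to has to remain valid until then.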
@chriskohlhoff
Looking at the changelog here, it says:
Fixed the io_uring backend to ensure the internal timeout operations, used to implement io_context::run_for and io_context::run_until, are cleaned up correctly.
It looks like this issue has gone away since that change. What was happening around timers?
I had a segfault with uring and not with epoll; I'll try capturing with ASAN again.
It was something else, thanks for solving this one. I'll open a separate issue.
When compiling with:
I see issues around ASAN inside the io_uring service, specifically where it sets the userdata pointer to a stack local variable here.
Looking farther down the code, that supposedly gets cleared here, but I keep running into cases with heavy use of timers where the removal doesn't trigger.
The segfault occurs on a later pass through this code, where the old `&ts` doesn't neatly match with anything, and then is incorrectly assumed to be an operation and called into here, segfaulting.

I've been testing a fix locally:
1) Set the initial userdata to `nullptr` here so that the timer just wakes the system up with a noop.
2) Do not increment `local_ops` here, since the wakeup will be a noop.
3) Remove the block of code that attempts to remove these timers.
Happy to submit a PR if that sounds good.
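The idea behind the three steps above can be sketched in isolation. This is not asio's actual code: `operation`, `completion`, and `dispatch` are hypothetical stand-ins that only model how a completion loop might treat userdata. A wakeup timeout carries `nullptr`, so a late completion can never be mistaken for an operation, and it is not counted in `local_ops`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an operation object owned by the service.
struct operation {
  int completions = 0;
  void complete() { ++completions; }
};

// Hypothetical stand-in for a completion queue entry; only userdata matters.
struct completion {
  void* user_data;
};

// Process a batch of completions and return the number of real operations.
std::size_t dispatch(const std::vector<completion>& cqes) {
  std::size_t local_ops = 0;
  for (const completion& cqe : cqes) {
    if (cqe.user_data == nullptr)
      continue;  // wakeup timeout: nothing to run, nothing to count
    static_cast<operation*>(cqe.user_data)->complete();
    ++local_ops;
  }
  return local_ops;
}
```

With this shape there is no stack address left in flight, so no removal step is needed for the wakeup timeout to be safe.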