Open ghost opened 3 years ago
@markand commented on Oct 31, 2018, 12:27 PM UTC:
Hi, I'm affected by this bug as well. Building my application with BOOST_ASIO_DISABLE_EPOLL works fine; otherwise, I get random crashes exactly as cotti describes. It happens on Arch Linux, Boost 1.68, Linux 4.18.
@oscarfv commented on Mar 25, 2019, 7:42 PM UTC:
On an internal project I can always reproduce the problem. Defining BOOST_ASIO_DISABLE_EPOLL does not fix the crash here.
In my code, the calls to asio are inside wrapper functions compiled in a shared library. If I move those calls to the main executable, there is no crash.
@joshedwards22 commented on Mar 2, 2020, 5:00 PM UTC:
Any update on this? I have encountered this when sharing an io_context with runtime-loaded shared libraries on Linux. It appears that "thread_call_stack::contains(this)" depends on a static variable that is not shared across the module boundary. "compensating_work_started" seems to be called at random times, and I cannot figure out why.
@djarek commented on Mar 2, 2020, 9:41 PM UTC:
joshedwards22 Exposing ASIO classes across ABI boundaries is a bad idea - ASIO doesn't make any ABI stability guarantees AFAIK.
@lverbe commented on Mar 12, 2020, 10:15 PM UTC:
This patch fixes it for Boost 1.72.0.
--- ./boost/asio/detail/impl/scheduler.ipp.orig 2020-03-12 11:00:06.823085227 -0600
+++ ./boost/asio/detail/impl/scheduler.ipp 2020-03-12 11:01:30.898891690 -0600
@@ -317,8 +317,8 @@ void scheduler::restart()
void scheduler::compensating_work_started()
{
- thread_info_base* this_thread = thread_call_stack::contains(this);
- ++static_cast<thread_info*>(this_thread)->private_outstanding_work;
+ if (thread_info_base* this_thread = thread_call_stack::contains(this))
+ ++static_cast<thread_info*>(this_thread)->private_outstanding_work;
}
void scheduler::post_immediate_completion(
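To illustrate what the patch above changes, here is a minimal, self-contained model of the guarded lookup. The types below (`scheduler_model`, `call_stack_frame`, `context`, and the free functions) are hypothetical stand-ins written for this sketch, not Asio's real definitions; only the shape of the `if (... = contains(...))` guard is taken from the patch. Note that the guard only avoids the null dereference; whether skipping the increment keeps the work accounting correct is a separate question this sketch does not answer.

```cpp
// Hypothetical model of a per-thread call stack; not Asio's actual code.
struct thread_info {
    long private_outstanding_work = 0;
};

struct scheduler_model {};

// One frame per "this scheduler is running on this thread" marker.
struct call_stack_frame {
    const scheduler_model* key;
    thread_info* value;
    call_stack_frame* next;
};

// Per-thread top of the marker stack. It is null when the thread is not
// inside the scheduler (or, in the shared library case, when this module's
// copy of the variable was never populated).
thread_local call_stack_frame* top_ = nullptr;

// Returns the thread_info registered for `key`, or nullptr if there is none.
thread_info* contains(const scheduler_model* key) {
    for (call_stack_frame* f = top_; f != nullptr; f = f->next) {
        if (f->key == key) return f->value;
    }
    return nullptr;
}

// RAII helper that pushes a frame for the lifetime of the object.
struct context {
    call_stack_frame frame;
    context(const scheduler_model* k, thread_info* v) : frame{k, v, top_} {
        top_ = &frame;
    }
    ~context() { top_ = frame.next; }
    context(const context&) = delete;
    context& operator=(const context&) = delete;
};

// Guarded version, mirroring the patch: only touch the counter when the
// lookup succeeded. Returns whether the work was counted.
bool compensating_work_started(const scheduler_model* s) {
    if (thread_info* this_thread = contains(s)) {
        ++this_thread->private_outstanding_work;
        return true;
    }
    return false;
}
```

In this model, calling `compensating_work_started(&s)` outside any `context` returns `false` instead of dereferencing a null pointer, which is the situation the unpatched code crashes in.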
@dillaman commented on Nov 9, 2020, 1:35 PM UTC:
I am also encountering this issue when trying to use boost::asio from within shared libraries. In the Ceph project, we are trying to incorporate boost::asio in our client-side libraries librados and librbd, but that results in each shared library's bss section getting its own boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_ and therefore we randomly hit this crash.
@rpopescu commented on Dec 18, 2020, 1:47 PM UTC:
This patch fixes it for Boost 1.72.0.
--- ./boost/asio/detail/impl/scheduler.ipp.orig 2020-03-12 11:00:06.823085227 -0600
+++ ./boost/asio/detail/impl/scheduler.ipp 2020-03-12 11:01:30.898891690 -0600
@@ -317,8 +317,8 @@ void scheduler::restart()
void scheduler::compensating_work_started()
{
- thread_info_base* this_thread = thread_call_stack::contains(this);
- ++static_cast<thread_info*>(this_thread)->private_outstanding_work;
+ if (thread_info_base* this_thread = thread_call_stack::contains(this))
+ ++static_cast<thread_info*>(this_thread)->private_outstanding_work;
}
void scheduler::post_immediate_completion(
I'm really surprised to see this patch not being applied; is there a reason for this?
@oscarfv commented on Dec 18, 2020, 2:30 PM UTC:
lverbe , rpopescu : maybe chriskohlhoff does not monitor this issue tracker.
@rpopescu commented on Dec 18, 2020, 4:33 PM UTC:
oscarfv do you know what he does monitor? the trac ticket is 3 years old it seems: https://svn.boost.org/trac10/ticket/13562 thanks.
@oscarfv commented on Dec 18, 2020, 5:09 PM UTC:
rpopescu : no idea. It seems that he keeps working on https://think-async.com/Asio/ and its Boost incarnation, but I see no way of contacting him on that webpage. Let's wait a bit, maybe he notices the mention on my prior message.
So, for us this happens when io_context is passed across a shared library boundary. The thing is, boost::asio seems to be header-only, and thread_call_stack::contains() seems to rely on a static/global variable. So it seems this is a case of code in different shared libraries getting different instances of a global variable?
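The suspected mechanism can be modelled in a single file. This is only a sketch under assumptions: real shared libraries cannot be reproduced in one translation unit, so each `ModuleTag` instantiation below stands in for the separate copy of a header-defined static that each module can end up with (depending on symbol visibility and how the library is loaded). All names here are hypothetical.

```cpp
// Model only: every distinct ModuleTag gets its own top_, just as each shared
// library may get its own copy of a static defined in a header-only library.
template <typename ModuleTag>
struct call_stack_model {
    static thread_local const void* top_;

    static void enter(const void* key) { top_ = key; }
    static void leave() { top_ = nullptr; }
    static bool contains(const void* key) {
        return top_ != nullptr && top_ == key;
    }
};

template <typename ModuleTag>
thread_local const void* call_stack_model<ModuleTag>::top_ = nullptr;

// Hypothetical tags standing in for two separately linked modules.
struct main_executable {};
struct plugin_library {};

// The main executable marks the context as running and then checks its own
// copy of the variable: the marker is found.
bool main_sees_own_marker(const void* io_context_key) {
    call_stack_model<main_executable>::enter(io_context_key);
    bool seen = call_stack_model<main_executable>::contains(io_context_key);
    call_stack_model<main_executable>::leave();
    return seen;
}

// The main executable marks the context as running, but the lookup happens
// through the plugin's copy of the variable: the marker is not found.
bool plugin_sees_marker_set_by_main(const void* io_context_key) {
    call_stack_model<main_executable>::enter(io_context_key);
    bool seen = call_stack_model<plugin_library>::contains(io_context_key);
    call_stack_model<main_executable>::leave();
    return seen;
}
```

A failed lookup in the second function corresponds to `contains()` returning a null pointer in the real code, which the unpatched `compensating_work_started()` then dereferences.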
Last week I ran into the same issue, and I found that it happens when a shared library is involved and data arrives from a connected client without an active async_read.
The issue can be reproduced by running the attached asio_segfault code (asio_segfault.zip).
The plugin code wraps the asio TCP async echo server example in a shared library. The echo_server.h code is modified to not call do_read after the data has been returned to the client. The main code creates an io_context and passes it to the server created by the plugin.
When this code is executed, the server waits for a connection and accepts the first line of text. The server echoes the received text and, with the modifications to echo_server.h, does not wait for new data. When the client sends another line of text, the application crashes with a segmentation fault. With BOOST_ASIO_DISABLE_EPOLL set, the code keeps running.
Information on our setup:
@pvd : setting BOOST_ASIO_DISABLE_EPOLL makes things worse here (without it, some test cases succeed; with it, all test cases fail). Debian Bookworm with Clang 15 and Boost 1.78.
I think @peat-psuwit pinpointed the problem: using static variables in a header-only library.
@oscarfv We have been running with EPOLL disabled this week, with no segfaults so far. After we switched EPOLL off, a bug in our code was found: it used async_write with a temporary buffer that was deleted before the write was executed. This has now been fixed by replacing the async_write with a synchronous write.
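For readers hitting the same buffer lifetime bug: Asio buffers do not own their memory, so the storage must stay valid until the completion handler runs. Besides switching to a synchronous write, another common approach is to let the handler own the storage through a `std::shared_ptr`. The sketch below shows that pattern without depending on Boost; `deferred_writer` and `send_owned` are hypothetical stand-ins for a socket plus `async_write`, not real Asio APIs.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous writer: it records a non-owning
// view of the bytes and only reads them later, when run() is called.
class deferred_writer {
public:
    void async_write(const char* data, std::size_t size,
                     std::function<void()> handler) {
        pending_.push_back(op{data, size, std::move(handler)});
    }

    // Performs all queued writes, then invokes each completion handler.
    void run() {
        std::vector<op> ops;
        ops.swap(pending_);
        for (op& o : ops) {
            written_.append(o.data, o.size);
            if (o.handler) o.handler();
        }
    }

    const std::string& written() const { return written_; }

private:
    struct op {
        const char* data;
        std::size_t size;
        std::function<void()> handler;
    };
    std::vector<op> pending_;
    std::string written_;
};

// Copies the message into shared storage and captures that storage in the
// handler, so the bytes stay alive until the deferred write has completed.
void send_owned(deferred_writer& writer, const std::string& message) {
    auto buf = std::make_shared<std::string>(message);
    writer.async_write(buf->data(), buf->size(), [buf]() {
        (void)buf;  // The capture exists only to extend the buffer's lifetime.
    });
}
```

With real Asio the same idea is usually written by capturing the `shared_ptr` in the lambda passed to `async_write`, so the buffer is released only after the handler has been destroyed.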
I correct my previous claim about BOOST_ASIO_DISABLE_EPOLL not fixing the crash. Indeed, the crash goes away. Thanks @pvd and sorry for the noise.
We also ran into this issue (using asio 1.20.0), any news on this? @chriskohlhoff
Facing similar issues, also on ARM. Anything we can provide to help debugging?
This still seems to be a problem on the latest version of Asio, even with DISABLE_EPOLL. I'm experiencing a segmentation fault due to a null asio::call_stack<...>::top_ when using Asio across shared libraries. #780 appears to be very similar or the same issue.
@cotti commented on Sep 26, 2018, 6:03 PM UTC:
We have encountered multiple times what appears to be the same issue as described in https://svn.boost.org/trac10/ticket/13562 on a server application that makes heavy use of asio to send HTTP requests and read their responses.
Our usage is fairly standard: upon getting a connection, we call boost::asio::async_write() binding the next method, onWrite(), and inside it we call boost::asio::async_read_until(), binding onRead() which performs the next step of reading the response to our request, and so on. Our Async HTTP threads are quite short-lived: on our logs their whole life-cycle takes less than a second. We find the threads take a certain amount of time without activity before they are hit with a SIGSEGV - it can vary between a few seconds up to more than a minute, and it seems to happen between onWrite() and onRead().
We were using Boost 1.66.0 up until a couple of weeks ago. After this issue occurred a few times, we tried an upgrade to Boost 1.68.0, but if anything the frequency of this issue has increased to almost daily.
This issue was moved by chriskohlhoff from boostorg/asio#150.