chriskohlhoff / asio

Asio C++ Library
http://think-async.com/Asio

descriptor_data not guarded correctly #686

Open ghost opened 3 years ago

ghost commented 3 years ago

@Sagar19shah commented on Apr 30, 2019, 6:48 AM UTC:

Hi - We are using the Boost 1.51.0 Asio library on Linux and are seeing occasional crashes. Upon investigation we noticed that it crashes when it tries to access a pointer that has already been set to 0.

Function where it crashes: asio::detail::epoll_reactor::start_op. It crashes while accessing the if (descriptor_data->shutdown_) line; gdb shows descriptor_data is already set to 0.

While debugging, I noticed that epoll_reactor::deregister_descriptor might be setting descriptor_data to zero.

Scenario - We have seen this issue happening randomly while closing the socket.

Question - the mutex member of the descriptor_data object is used to guard access to that object. However, epoll_reactor::deregister_descriptor sets descriptor_data to 0 while the thread is still holding that mutex. A different thread executing start_op and waiting on the same mutex will find a null object once the lock is released.

Shouldn't there be a check in asio::detail::epoll_reactor::start_op, after the mutex lock is acquired, to verify that descriptor_data is still a valid object before accessing its shutdown_ member?

Also, isn't it bad practice to mark the object as 0 while still holding a lock through its member variable?

This issue was moved by chriskohlhoff from boostorg/asio#229.

ghost commented 3 years ago

@djarek commented on Apr 30, 2019, 12:53 PM UTC:

descriptor_data is a pointer stored in each I/O object (e.g. a socket). The documentation specifies that shared I/O objects are not thread-safe. The bug is most likely in your code, due to a data race on a socket or another I/O object. Here's the reference to the documentation on the thread safety of sockets: https://www.boost.org/doc/libs/1_70_0/doc/html/boost_asio/reference/ip__tcp/socket.html

Thread Safety

Distinct objects: Safe.

Shared objects: Unsafe. 

Side note: Boost 1.51 is ancient, you should consider an upgrade.

ghost commented 3 years ago

@vvaltchev commented on Aug 25, 2019, 11:10 PM UTC:

I filed this feature request about 5 years ago: https://svn.boost.org/trac10/ticket/10009, hoping at least to get an authoritative answer back and to engage in a constructive discussion, but I got nothing back.

I knew that shared-object access is not safe, but I tried to explain why this design is flawed, at least for some use cases, and that the library should at least provide a way to make shared access to objects thread-safe (like in the patch I sent), enabled through a compile-time flag (a #define).

In conclusion, I'm not sad because I did not get my patch merged or anything like that: it's fine to stick with some ideas, even bad ones, when you own a project, because you believe you're right. I'm sad because my fairly reasonable argument did not even deserve an answer for 5 years. And the saddest thing here is that I'm pretty sure a lot of people using this library are using it unsafely (because it has a misleading design and not-so-good documentation) and that most of the time they're unable to realize it: the race condition is so subtle that, depending on the use case, it might require a certain "scale" in order to hit the bug. It's a totally non-obvious and nasty problem to find. Nothing like what happens when an operation is 100% unsafe [an obvious, often-reproducible crash]. Here, most of the operations are unfortunately 100% safe, while some operations like close() are safe something like 99% of the time, and that's terrible, because it greatly reduces the likelihood of a memory corruption leading to a crash.

Note: all I know is about Asio in Boost 1.55, which is old, of course. But I suspect that most of the stuff still applies to Boost 1.71.

bevinhex commented 3 years ago

I had to run my test program 20k times just to find out where it crashed. I would consider this a bug: like any bug, it does not happen often, but it happens. It took me a long time just to discover that the program crashed at

if (descriptor_data->shutdown_)
{
  post_immediate_completion(op, is_continuation);
  return;
}

with SIGSEGV

vvaltchev commented 3 years ago

@bevinhex Exactly my point. You're not using Boost::Asio correctly. The socket objects are not thread-safe and that's, unfortunately, by design. I dislike that design, but it's not a bug in the implementation of the library: the bug is in your code. Read the documentation carefully and use strands to wrap all the callbacks. Good luck if you want to interact with multiple sockets from any given callback: you'll either have to use the same strand for all of them, or take care of the synchronization yourself using a dedicated lock per socket.
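
A minimal sketch of the "wrap all the callbacks in a strand" approach mentioned above, assuming Boost 1.70 or newer (for asio::make_strand); the connection struct and buffer size are illustrative, not taken from this thread:

#include <boost/asio.hpp>
#include <array>
#include <memory>

namespace asio = boost::asio;

struct connection : std::enable_shared_from_this<connection> {
  explicit connection(asio::io_context& ctx)
    : strand_(asio::make_strand(ctx)), socket_(ctx) {}

  void start_read() {
    auto self = shared_from_this();
    socket_.async_read_some(asio::buffer(buf_),
      // Every handler of this connection is bound to the same strand, so the
      // socket is never touched from two threads at once, even with a thread pool.
      asio::bind_executor(strand_,
        [self](boost::system::error_code ec, std::size_t /*bytes*/) {
          if (!ec)
            self->start_read();   // already running on strand_, so this is safe
        }));
  }

  asio::strand<asio::io_context::executor_type> strand_;
  asio::ip::tcp::socket socket_;
  std::array<char, 4096> buf_;
};

The very first start_read() call should also be issued on the strand (e.g. via asio::post(strand_, ...)) so the socket is never touched off-strand.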

mabrarov commented 3 years ago

IMHO thread safety is never cost-free (a mutex?), so to stay compatible with high-performance needs (i.e. to keep it lightweight) I completely agree with the design of Asio in terms of thread safety. It's asynchronous code and design: it doesn't always need a thread-safe implementation, but relies mostly on queues for synchronization.

Regarding providing a thread-safe (for shared objects) implementation in addition to the non-thread-safe one: can't this be implemented by wrapping sockets with a mutex? If yes, then do we need this wrapping inside Asio (increasing its size and complexity)? I doubt it.

vvaltchev commented 3 years ago

Thread safety is not free, I totally agree with that.

But think about a large-scale setup. You'll have thousands of sockets and a thread pool with MANY threads. By wrapping callbacks with strands, you're serializing those callbacks, even if they won't run on the same thread. That's OK because they all belong to the same socket and it's safe. Each async_read callback can perform a new async_read call or close the socket, and that's fine.

Now, what if I want to close socket B from the callback of socket A? Or, what if I have a control-stream socket A and 100 more sockets that I want to close when I receive a given command on socket A? Well, according to Boost::Asio's design, you'd have to wrap ALL the callbacks for ALL the sockets with the SAME strand. But that would totally destroy your parallelism! That's the problem. An alternative is to post a callback that performs the close (or any other operation) in the strand OF EACH socket I need to work with. That would be pretty expensive. It would mean that each async call (e.g. async_read) on a socket not belonging to my strand would itself have to be performed asynchronously. That's a double level of asynchrony: not only is the read asynchronous (which is what we want), but the call to async_read() itself has to be asynchronous because we have to enqueue it on the other socket's strand. That will reduce the throughput and increase the latency of any service doing that.
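
A minimal sketch of the "post the close into the other socket's strand" alternative described above, assuming Boost 1.66 or newer and a per-connection strand/socket layout like the earlier sketch (the close_other helper name is illustrative):

#include <boost/asio.hpp>
#include <memory>

namespace asio = boost::asio;

// Connection is assumed to expose strand_ and socket_ members, as in the
// sketch in the previous comment.
template <typename Connection>
void close_other(std::shared_ptr<Connection> other) {
  // The close is queued on the other connection's strand, so its socket is
  // only ever touched serialized with that connection's own handlers.
  asio::post(other->strand_, [other] {
    boost::system::error_code ec;
    other->socket_.close(ec);
  });
}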

So, a better way to overcome this limitation is to carry a custom mutex and wrap the socket operations with it, but that's very inconvenient. I believe it would be much better to support a compile-time define to enable thread safety. That's it. It's fine to keep it disabled by default to avoid the performance penalty for the users who don't need it, but the ones who need it can just enable it.
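
A minimal sketch of the "carry a custom mutex and wrap the socket ops" workaround mentioned above; the guarded_socket wrapper is purely illustrative and not part of Asio:

#include <boost/asio.hpp>
#include <mutex>
#include <utility>

namespace asio = boost::asio;

struct guarded_socket {
  explicit guarded_socket(asio::io_context& ctx) : socket(ctx) {}

  // close() may now be called from any thread; it is serialized against the
  // initiating calls below by the per-socket mutex.
  void close() {
    std::lock_guard<std::mutex> lock(m);
    boost::system::error_code ec;
    socket.close(ec);
  }

  // Only the initiating call is guarded; completion handlers must go back
  // through this wrapper (or a strand) if they touch the socket again.
  template <typename Buffers, typename Handler>
  void async_read_some(Buffers buffers, Handler handler) {
    std::lock_guard<std::mutex> lock(m);
    socket.async_read_some(buffers, std::move(handler));
  }

  std::mutex m;
  asio::ip::tcp::socket socket;
};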

By the way, for more about that, please read the bug I've filed back in 2014: https://svn.boost.org/trac10/ticket/10009

mabrarov commented 3 years ago

Now, what if I want to close socket B from the callback of socket A? Or, what if I have a control-stream socket A and 100 more sockets that I want to close when I receive a given command on socket A? Well, according to Boost::Asio's design, you'd have to wrap ALL the callbacks for ALL the sockets with the SAME strand.

That would be a misuse of the idea behind asio::strand. You need one asio::strand per socket.

An alternative is to post a callback that performs the close (or any other operation) in the strand OF EACH socket I need to work with.

This is the way to go, yes. Note that a mutex is a queue internally; it just has a synchronous API. Using a mutex per socket is not lighter than using asio::strand (especially when using Asio with a custom allocator) once you reach some pretty big number of sockets (note that asio::strand uses a shared pool of mutexes internally... while it could use lock-free queues if some of the guarantees provided by asio::strand were relaxed).

Yes, your design needs to be asynchronous if you want to be performant and use sockets asynchronously. It means that you won't have a void my_connection::close() method anymore - it will be CompletionToken my_connection::close(CompletionHandler handler) now.
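
A minimal sketch of what that asynchronous close could look like, assuming a per-connection strand as in the earlier sketches; the my_connection layout, the async_close name, and the handler signature are illustrative, not taken from Asio or this thread:

#include <boost/asio.hpp>
#include <memory>
#include <utility>

namespace asio = boost::asio;

class my_connection : public std::enable_shared_from_this<my_connection> {
public:
  explicit my_connection(asio::io_context& ctx)
    : strand_(asio::make_strand(ctx)), socket_(ctx) {}

  // Instead of a blocking close(), the close is queued on the connection's
  // strand and the caller is notified through a completion handler.
  template <typename CompletionHandler>
  void async_close(CompletionHandler handler) {
    auto self = shared_from_this();
    asio::post(strand_, [self, handler = std::move(handler)]() mutable {
      boost::system::error_code ec;
      self->socket_.close(ec);
      handler(ec);             // report the result back asynchronously
    });
  }

private:
  asio::strand<asio::io_context::executor_type> strand_;
  asio::ip::tcp::socket socket_;
};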

That's a double level of asynchronism

Nobody says writing asynchronous code is easy; it's hard even in "safe" languages like Java. If an asynchronous design doesn't fit your needs (is too complex for the particular task), then why not use blocking / synchronous socket operations, or futures and a thread-per-connection approach (2 threads and a mutex per connection if you want to receive and send in parallel), which works in a bigger number of cases than one might think?

That will reduce the throughput and increase the latency of any service doing that.

I'm not convinced that user-space queues are heavier than a mutex, even if every call to a public method of the my_connection class is performed through an instance of asio::strand owned by that connection instance. It depends on the numbers. There is the asio::dispatch function, which may help to reduce queuing (but it requires care to avoid infinite recursion). Also note Asio's custom allocator support if you have concerns about dynamic memory allocations. The rest of the queuing should be lightweight, shouldn't it?
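
A minimal sketch of the asio::dispatch point above, assuming Boost 1.66 or newer (the request_close helper is illustrative): unlike asio::post, asio::dispatch is allowed to run the function object inline when the caller is already running on the target strand, which skips one queue hop.

#include <boost/asio.hpp>

namespace asio = boost::asio;

// The caller must keep the socket alive until the lambda has run.
void request_close(asio::strand<asio::io_context::executor_type> strand,
                   asio::ip::tcp::socket& socket) {
  asio::dispatch(strand, [&socket] {
    boost::system::error_code ec;
    socket.close(ec);   // may run inline if the caller is already on the strand
  });
}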

It's fine to keep it disabled by default to avoid the performance penalty for the users who don't need it, but the ones who need it can just enable it.

Fair enough. I need to check your patch and think about the possibility of deadlock, e.g. whether there is a need to add additional requirements for completion handlers when this kind of internal synchronization is used.

Thank you!

vvaltchev commented 3 years ago

I'm not convinced that user-space queues are heavier than a mutex, even if every call to a public method of the my_connection class is performed through an instance of asio::strand owned by that connection instance. It depends on the numbers.

I believe that in many cases using a per-socket mutex is faster than rescheduling the calls on the other strand, because if no other thread currently holds the mutex, the current thread won't go to sleep and there won't be any context switch. As far as I know, under Linux no syscall is made when there's no contention. If I enqueue on the individual strands instead, I'll have to pay the enqueue + dequeue cost for all 100 ops in any case, plus potentially more context switches, depending on the number of dispatcher threads. If I have 1,000 dispatcher threads, it's statistically likely that each of the 100 strands will be dispatched by a different thread: that means 100 context switches, even if there's no contention. But you're right that it depends on the numbers: in a different scenario the strand approach might be faster.

Fair enough. I need to check your patch and think about the possibility of deadlock, e.g. whether there is a need to add additional requirements for completion handlers when this kind of internal synchronization is used.

Thank you!

Thank you so much for that! Yours is the first answer I've gotten on this topic in 7 years :-) While I tested it at the time (at large scale), the patch is old now: its general idea should still hold, unless Asio's implementation has changed significantly since 2014, but it definitely requires a review from an Asio expert. Thanks again for that.

mabrarov commented 3 years ago

I believe that in many cases using a per-socket mutex is faster than rescheduling the calls on the other strand, because if no other thread currently holds the mutex, the current thread won't go to sleep and there won't be any context switch

That's a separate topic (ping me by email or Skype if you'd like to discuss it). In short, it does not always work this way. Usage of queues (asio::strand) theoretically increases the number of cases where a thread context switch can happen. In reality it depends on the numbers, on the code (see below) and on the test case.

For Linux you are right, but on Linux it's better to use a single thread per asio::io_service instance with a pool of asio::io_service instances (where the number of asio::io_service instances is ~ the number of CPUs). This way you can reduce thread context switches, though it doesn't help with cross-io_service communication, because 2 sockets associated with 2 different asio::io_service instances will still incur a thread context switch if one socket accesses another socket through an asio::strand associated with a different asio::io_service instance. That's not an issue with Asio; it's because of Linux itself (it doesn't provide what Windows IOCP provides). A generic thread pool on Linux will have the same issue as Asio, unless a special optimization (to implement LIFO scheduling) is made (I'm not sure whether Asio lacks this kind of optimization).
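
A minimal sketch of the "one io_service per CPU, one thread per io_service" layout described above, assuming Boost 1.66 or newer (asio::io_context is the modern name of asio::io_service; the pool class and the round-robin assignment are illustrative):

#include <boost/asio.hpp>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

namespace asio = boost::asio;

class io_context_pool {
public:
  explicit io_context_pool(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
      // Concurrency hint 1: each io_context is run by exactly one thread.
      contexts_.push_back(std::make_unique<asio::io_context>(1));
      // Work guards keep run() from returning while the pool is idle.
      guards_.push_back(asio::make_work_guard(*contexts_.back()));
    }
  }

  // One thread per io_context; each socket stays on the io_context it was
  // created on, so a single socket is never touched from two threads.
  void run() {
    std::vector<std::thread> threads;
    for (auto& ctx : contexts_)
      threads.emplace_back([&ctx] { ctx->run(); });
    for (auto& t : threads)
      t.join();
  }

  // Round-robin: each new connection picks the next io_context.
  asio::io_context& pick() {
    return *contexts_[next_++ % contexts_.size()];
  }

private:
  std::vector<std::unique_ptr<asio::io_context>> contexts_;
  std::vector<asio::executor_work_guard<asio::io_context::executor_type>> guards_;
  std::size_t next_ = 0;
};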

On Windows, when using IOCP, the Windows scheduler associates each thread (calling the IOCP API) with an IOCP instance and uses LIFO scheduling for the threads associated with the same IOCP instance. So the chance of an excessive thread context switch is smaller when using a single asio::io_service instance (which holds a single IOCP descriptor internally) with a thread pool (number of threads ~ number of CPUs) on Windows.

Again, the asio::dispatch function (asio::strand::dispatch in previous versions of Asio) can sometimes help too.

If I have 1,000 dispatcher threads, it's statistically likely that each of the 100 strands will be dispatched by a different thread: that means 100 context switches, even if there's no contention.

I'm not sure there is a need to use an asynchronous (event-like) approach with 1,000 threads - what about this:

why not use blocking / synchronous socket operations, or futures and a thread-per-connection approach (2 threads and a mutex per connection if you want to receive and send in parallel), which works in a bigger number of cases than one might think?

?

As far as I know, an asynchronous design is usually used with a pool of threads (it can be multiple pools, with tasks distributed between pools based on the type / nature of the task), where the number of threads is kept near the number of (logical) CPUs (like number_of_cpus + 1) to avoid oversubscribing the CPUs.

vvaltchev commented 3 years ago

I agree that's a separate topic; no need to "spam" this GitHub issue with in-depth technical discussions. Also, even if I wanted to, I cannot discuss the use case I'm thinking about in more detail in this context, because of NDAs. I still very much appreciate at least having started this conversation.

I'll ask just one more question:

It's because of Linux itself (it doesn't provide what Windows IOCP provides)

Could things improve by using the AIO interface or the newer io_uring instead of the epoll reactor?

mabrarov commented 3 years ago

BTW, IMHO it's unclear from the documentation whether it's safe to call the asio::ip::tcp::socket::close method while another thread performs a synchronous send and/or receive operation on the same object (and whether the underlying operating system calls are also thread safe):

Synchronous send, receive, and connect operations are thread safe with respect to each other, if the underlying operating system calls are also thread safe. This means that it is permitted to perform concurrent calls to these synchronous operations on a single socket object. Other synchronous operations, such as open or close, are not thread safe.

I guess we could enhance the documentation (the part about thread safety) as part of this issue too.

bevinhex commented 3 years ago

I have the same feeling. Initially I saw that io_context is thread safe, and since all sockets are created using an io_context, that gave me the impression that sockets are thread safe as well.