CopernicaMarketingSoftware / AMQP-CPP

C++ library for asynchronous non-blocking communication with RabbitMQ
Apache License 2.0
884 stars 340 forks source link

Segv when dtor called on too big channelImpl deferred list #492

Closed Dominique57 closed 1 year ago

Dominique57 commented 1 year ago

Hello, This is one of my first public issues, so I'm sorry if it's not correctly formatted or presented.

Describe the bug In a multi-threaded producer-consumer case (all amqp-cpp code is executed in consummer), when enqueuing lots of tasks thus lots of deferred objects are registered, if the channelImpl destructor is called before they are handled and free'd, there can be a stack overflow.

Expected behavior and actual behavior The destruction of the deferred list should not throw a stack-overflow / segv error.

Sample code I am using uvw (c++ wrapper around libuv) and have the following setup :

When the module enqueues ~1000000 publishes the communicator tries to handle it but mainly dequeues a lot of tasks before handling a socket event. Then the module does some work and sends a stop loop task to the communicator. The module waits for the communicator to dequeue, stop the loop and the thread. Once the communicator thread is stopped the module exits. The crash occurs at this point: since the defered list is destroyed, the whole deferred list is recursively destroyed. And since the list is very very big in this particular case, a stack overflow happens.

To validate this theory I added the following code in include/amqp/Deferred.h's destructor to destroy the list iteratively by increasing the reference count of the smart pointer next to the next of the current item and disasociating the next element from the list to avoid the previous recursion issue :

    virtual ~Deferred()
    {
        // report to the finalize callback
        if (_finalizeCallback) _finalizeCallback();
# code aded begins here
        while (_next) {
            if (_next.use_count() == 1) {
                auto surNext = _next->_next;
                _next->_next = nullptr;
                _next = surNext;
            }
        }
# code aded ends here
    }

Here is my project link to the specific commit that generated the issue (for reproductability) : https://github.com/Dominique57/amqp-libuv/tree/228fc7be4548ff38ad82a3643e3f932d7b33b9fb

Since it's a very specific use-case I don't know if it warrants the presented fix (especially using "use_count").

Also i wanted to give props to the maintainers and contributors of the library, it's a very cool project that is very fun to work with !

EmielBruijntjes commented 1 year ago

The objects are not thread safe. You cannot run your event loop in one thread, and at the same time call methods on a connection or channel object from a different thread. You should do everything in the same thread.