gazebosim / gz-transport

Transport library for component communication based on publication/subscription and service calls.
https://gazebosim.org
Apache License 2.0

Callback method is executed after instance that holds the callback is destroyed #228

Open francocipollone opened 3 years ago

francocipollone commented 3 years ago

Environment

Description

Context: Typical publisher-subscriber scenario in which the callback method of the subscriber is an instance method of a class.

Steps to reproduce

I forked the repo, branched from ign-transport7 (commit "⬆️ 7.5.1 🏁 (#206)"), and added some apps to the example folder: https://github.com/francocipollone/ign-transport/commit/9d5427fa56a7e420f6182d7fd3d435eaabd95818

  1. Clone https://github.com/francocipollone/ign-transport

  2. cd ign-transport
    git checkout francocipollone/adds_example_to_show_bug
  3. Build the project including the examples.

  4. Move to build folder.

  5. Run the publisher_bug_example.cc

    ./publisher_bug_example

    It is just a publisher without any sleep time, in order to produce a large number of messages.

  6. Run the provided subscriber_bug_example.cc:

    ./subscriber_bug_example

    Basically, the idea is to force the undefined behavior when the object that holds the callback method is destroyed before this method is called again.

The example consists of an infinite loop. In every iteration it creates an instance of a class that subscribes to a topic with a callback method, then waits for a certain time (during which the callback method is being called). When the iteration finishes, the object is deallocated; destroying it implicitly calls the destructor of ignition::transport::Node, so the topic should be unsubscribed and no callback method should be called anymore. Then the loop continues. The code is commented to explain the behavior.

[Animation: ign_trans_bug]

Output

The last frame of the animation (ign_trans_bug) shows that, at the end of the first iteration, the object that holds the callback is destroyed and a callback is executed afterwards, leading to a segmentation fault.

I've added another example with a workaround that keeps the instance of the class alive in memory until the end of the execution by having it hold a shared_ptr to itself. The code is also commented: subscriber_bug_workaround_example.cc. It can be run from the build folder with:

./subscriber_bug_workaround_example

In this case no segmentation fault occurs, which makes sense given that all the instances of the class remain alive in memory.
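To illustrate the workaround idea without depending on ign-transport, here is a minimal sketch of an object that keeps a shared_ptr to itself. The class name `SelfHoldingSubscriber`, its `Create()` factory, and `WorkaroundExample()` are hypothetical and are not taken from subscriber_bug_workaround_example.cc; the sketch only shows why the instance stays alive after the scope that created it ends.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the subscriber class; it does not use
// ign-transport so that the sketch compiles on its own.
class SelfHoldingSubscriber
{
public:
  // Factory: the new object stores a shared_ptr to itself, so its
  // reference count cannot reach zero and its destructor never runs.
  static std::shared_ptr<SelfHoldingSubscriber> Create()
  {
    std::shared_ptr<SelfHoldingSubscriber> ptr(new SelfHoldingSubscriber());
    ptr->self_ = ptr;  // intentional reference cycle: leaks by design
    return ptr;
  }

  // The callback that a transport layer would invoke.
  void OnMessage(const std::string &msg) { last_ = msg; }

  const std::string &Last() const { return last_; }

private:
  SelfHoldingSubscriber() = default;

  std::shared_ptr<SelfHoldingSubscriber> self_;
  std::string last_;
};

// Usage: the object outlives the scope that created it, so a late
// callback still finds valid memory.
inline void WorkaroundExample()
{
  std::weak_ptr<SelfHoldingSubscriber> observer;
  {
    auto sub = SelfHoldingSubscriber::Create();
    observer = sub;
  }  // `sub` goes out of scope here, but the self-reference remains

  assert(!observer.expired());
  observer.lock()->OnMessage("late message");
  assert(observer.lock()->Last() == "late message");
}
```

The trade-off is that every instance is leaked until the process exits, which hides the crash rather than fixing the ordering between destruction and callbacks.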

I am happy to provide more information if needed. Please don't hesitate to ask. :+1:

caguero commented 3 years ago

Thanks a lot for the detailed report!

Just one clarification, are you sure this is happening because a new callback is executed or because we destroy the node while a callback is still executing? The output from your terminal suggests the former but I wonder if that's verified (sometimes the order of messages in the terminal can be confusing when there are multiple threads in parallel).
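One way to tell the two cases apart without relying on terminal output is to record events in a mutex-protected log and inspect their order afterwards. The sketch below is only an illustration under stated assumptions: `EventLog`, `Verdict`, `Classify`, and the event strings are hypothetical names, a single subscriber instance is assumed, and callbacks are assumed not to overlap each other. The log has to outlive the subscriber (for example as a global), since it is written from the callback and from the destructor.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical helper: events are appended under a mutex, so their order
// is the order in which threads acquired the lock, independent of how the
// terminal interleaves printed lines.
class EventLog
{
public:
  void Record(const std::string &event)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    events_.push_back(event);
  }

  std::vector<std::string> Snapshot() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return events_;
  }

private:
  mutable std::mutex mutex_;
  std::vector<std::string> events_;
};

enum class Verdict
{
  kNoProblem,
  kDestroyedDuringCallback,   // destructor started while a callback ran
  kCallbackAfterDestruction   // a callback started after the destructor
};

// Expects the (hypothetical) events "callback begin", "callback end" and
// "destructor begin", recorded at the matching points of the subscriber.
inline Verdict Classify(const std::vector<std::string> &events)
{
  bool inCallback = false;
  bool destroyed = false;
  for (const auto &event : events)
  {
    if (event == "callback begin")
    {
      if (destroyed)
        return Verdict::kCallbackAfterDestruction;
      inCallback = true;
    }
    else if (event == "callback end")
    {
      inCallback = false;
    }
    else if (event == "destructor begin")
    {
      if (inCallback)
        return Verdict::kDestroyedDuringCallback;
      destroyed = true;
    }
  }
  return Verdict::kNoProblem;
}
```

In the subscriber, `Record("callback begin")` and `Record("callback end")` would go on the first and last lines of the callback, and `Record("destructor begin")` on the first line of the destructor.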

francocipollone commented 3 years ago

Thanks for your answer! Sorry for the delayed response.

Honestly, I am not completely sure, but from reading the ign-transport code I would say it is because the node is destroyed while the callback is executing.

Correct me if I am wrong, but from what I see, once Node::~Node() is called, Node::Unsubscribe() is called and the SubscriptionHandler that holds the callback is detached/removed. Therefore, when a new message arrives, no handler is linked and no callback should be triggered. If so, I would say that the undefined behavior takes place when the node is destroyed while the callback is still being executed. If this is the case, is it handled somehow in the code? I couldn't find anything related.
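For reference, one common way to handle this case is to make unsubscription wait for any in-flight callback by taking the same lock that the delivery path holds while it invokes the callback. The sketch below is not the ign-transport implementation: `Dispatcher`, `Listener`, and `UnsubscribeExample` are hypothetical names, and only a single subscriber is modeled.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical minimal dispatcher; NOT the ign-transport implementation.
class Dispatcher
{
public:
  using Callback = std::function<void(const std::string &)>;

  void Subscribe(Callback cb)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    callback_ = std::move(cb);
  }

  // Blocks until any in-flight callback has returned, because Deliver()
  // holds the same mutex while it invokes the callback.
  void Unsubscribe()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    callback_ = nullptr;
  }

  // Called from the receiving thread. Returns true if a callback ran.
  bool Deliver(const std::string &msg)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!callback_)
      return false;
    callback_(msg);
    return true;
  }

private:
  std::mutex mutex_;
  Callback callback_;
};

// Subscriber whose callback is an instance method, as in the report.
class Listener
{
public:
  explicit Listener(Dispatcher &dispatcher) : dispatcher_(dispatcher)
  {
    dispatcher_.Subscribe([this](const std::string &m) { OnMessage(m); });
  }

  // Unsubscribe first: once this returns, no callback can touch `this`.
  ~Listener() { dispatcher_.Unsubscribe(); }

  int Count() const { return count_; }

private:
  void OnMessage(const std::string &) { ++count_; }

  Dispatcher &dispatcher_;
  int count_ = 0;
};

inline void UnsubscribeExample()
{
  Dispatcher dispatcher;
  {
    Listener listener(dispatcher);
    const bool delivered = dispatcher.Deliver("first");
    assert(delivered);
    assert(listener.Count() == 1);
  }
  // The listener is gone, so nothing is invoked for later messages.
  const bool deliveredLate = dispatcher.Deliver("second");
  assert(!deliveredLate);
}
```

A limitation of this design is that calling `Unsubscribe()` from inside the callback would deadlock on the non-recursive mutex, so a real implementation would need to account for that.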

Thanks!