hrudhansh closed this issue 6 days ago
This was originally posted in issue #2193
@hrudhansh do you have a minimal example which triggers the problem? Ideally targeting the iceoryx main branch.
If you look at our examples, they also register a signal handler and have no problem with ctrl+c. They use the signal handler either implicitly via iox::waitForTerminationRequest(); and while (!iox::hasTerminationRequested()) or explicitly with iox::registerSignalHandler.
@elBoberido You are correct! Adding "while (!iox::hasTerminationRequested())" seems to make the issue go away.
So the issue was essentially:
But this is great, I will potentially just add it in front of every publish call if the overhead isn't too high. Works every time now, thank you!
@hrudhansh you don't need to add it before every publish call. I guess you will have a loop where you publish or something similar. Just add it as part of the loop condition. Alternatively, if you are blocking in the main thread, it might also be sufficient to just have the iox::waitForTerminationRequest(); call there.
If you are able to post a minimal example of your code, I might be able to tell you the ideal solution for iceoryx. The important thing is to handle the shutdown in a way to let all the destructors run.
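To illustrate the loop-condition suggestion, here is a minimal sketch that uses only the C++ standard library, so it runs without iceoryx. Everything in it is a stand-in and an assumption: g_terminationRequested plays the role of the flag behind iox::hasTerminationRequested(), and Publisher, publish and runPublishLoop are hypothetical names, not iceoryx API. The point is only that the loop returns normally once the flag is set, so the destructor runs instead of the process dying in the middle of a publish.

```cpp
#include <csignal>
#include <cstdint>

namespace loop_sketch {

// Stand-in for the flag behind iox::hasTerminationRequested() (assumption, not iceoryx code).
volatile std::sig_atomic_t g_terminationRequested = 0;

// The handler only sets a flag; that is all that is safe to do inside a signal handler.
void onSignal(int) { g_terminationRequested = 1; }

// Hypothetical publisher wrapper; the destructor represents the cleanup that must not be skipped.
struct Publisher {
    explicit Publisher(bool& cleanedUp) : m_cleanedUp(cleanedUp) {}
    ~Publisher() { m_cleanedUp = true; }
    void publish(std::uint64_t /*value*/) { ++sent; }

    std::uint64_t sent{0};
    bool& m_cleanedUp;
};

// Publishes until termination is requested, then returns normally so destructors run.
// raiseAfter simulates Ctrl+C after that many messages; it exists only for this demo.
std::uint64_t runPublishLoop(bool& cleanedUp, std::uint64_t raiseAfter) {
    std::signal(SIGINT, onSignal);
    Publisher publisher{cleanedUp};
    std::uint64_t counter = 0;
    while (g_terminationRequested == 0) { // termination check is part of the loop condition
        publisher.publish(counter++);
        if (counter == raiseAfter) {
            std::raise(SIGINT); // handler runs synchronously and sets the flag
        }
    }
    return publisher.sent; // publisher is destroyed after this value is read
}

} // namespace loop_sketch
```

With real iceoryx code the loop condition would presumably be the while (!iox::hasTerminationRequested()) mentioned above, and the check costs one flag read per iteration in this sketch, so there is no need to repeat it before every publish call.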
So I am essentially making an opinionated wrapper library around Iceoryx for exactly our use-case. One of the "philosophies" of this library is having a very small footprint in our codebase. So ideally the flow is - bring in the header > instantiate > call publish... everything else is taken care of for you. So while I don't have a fixed minimal example, in this case I was just trying to push the boundaries by calling publish with no delays, and see how it holds up. It holds well btw! I did not miss a single message on the sub side once the Options are set correctly.
But I see your point - better to optimize around the whole publish loop instead of every publish call.
This example might be interesting for you: https://github.com/eclipse-iceoryx/iceoryx/blob/main/iceoryx_examples/request_response/client_cxx_waitset.cpp
It shows that you basically just have to register a signal handler and then notify your event loops to stop the execution.
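The same idea, register a handler and then tell the event loop to stop, can be sketched with the standard library alone. This is not the code from the linked example and does not use the iceoryx waitset; EventLoop, stop and runUntilSignal are hypothetical names chosen for illustration. The sketch assumes the main thread does the waiting (polling a flag here, where iceoryx code might call iox::waitForTerminationRequest()) and then notifies the loop outside of the signal handler, because condition variables must not be touched from a handler.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <csignal>
#include <mutex>
#include <thread>

namespace notify_sketch {

volatile std::sig_atomic_t g_signalReceived = 0;

// Only sets a flag; the actual notification happens later in normal thread context.
void onSignal(int) { g_signalReceived = 1; }

// Hypothetical event loop that blocks until it is told to stop.
class EventLoop {
  public:
    void run() {
        std::unique_lock<std::mutex> lock(m_mutex);
        // A real loop would also wake up for events; this one only waits for stop().
        m_cv.wait(lock, [this] { return m_stop; });
        m_exitedCleanly = true;
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
    }

    bool exitedCleanly() const { return m_exitedCleanly.load(); }

  private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_stop{false};
    std::atomic<bool> m_exitedCleanly{false};
};

// Blocks the calling thread until SIGINT arrives, then stops and joins the event loop.
// simulateCtrlC raises SIGINT from within the process; it exists only for this demo.
bool runUntilSignal(bool simulateCtrlC) {
    std::signal(SIGINT, onSignal);
    EventLoop loop;
    std::thread worker([&loop] { loop.run(); });

    if (simulateCtrlC) {
        std::raise(SIGINT);
    }
    while (g_signalReceived == 0) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }

    loop.stop();   // notify the event loop outside of the signal handler
    worker.join(); // the loop has returned, so its owner can be destroyed safely
    return loop.exitedCleanly();
}

} // namespace notify_sketch
```

Polling with a short sleep keeps the sketch portable; a blocking wait on a semaphore would avoid the polling but is left out here to stay within standard C++.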
Required information
Operating system: Ubuntu 24.04 LTS
Compiler version: 12.3.0
Eclipse iceoryx version: b2cd72bdc789bcf7601cb112c6078c47d533d798
Observed result or behaviour: Killing an application that is in the middle of a 'critical section' of publish causes POPO__CHUNK_LOCKING_ERROR in iox-roudi
Expected result or behaviour: When the destructor is called, the application is able to abruptly stop the publish, leave the 'critical section', and exit gracefully.
Conditions where it occurred / Performed steps: To reproduce -
Additional helpful information
On my end, I ran gdb on the pub process with -exec handle SIGINT nostop and -exec handle SIGINT pass, set a breakpoint on the exit(sigint); call, and ran pkill -SIGINT publisher in a separate terminal. I noticed:
So I assume what is happening is -
Publish thread starts critical section > triggers an 'is_started' state change in background thread > sends ack back to publish > publish moves ahead > publish is interrupted > background thread is waiting for an 'is_ended' trigger > it never gets it so keeps waiting > publish thread also waiting for background thread to ack 'is_ended'
Also: