eclipse-iceoryx / iceoryx

Eclipse iceoryx™ - true zero-copy inter-process-communication
https://iceoryx.io
Apache License 2.0
1.65k stars, 384 forks

option Docker: could not be killed with SIGTERM #2352

Closed: yxx-jojoli closed this issue 6 days ago

yxx-jojoli commented 1 week ago

Required information

Operating system: CentOS Linux release 7.9.2009 (Core)

Compiler version: GCC 10.2.1

Eclipse iceoryx version: v2.0.6

Observed result or behaviour: I have set up a client-server application using the WaitSet, with roudi, the client, and the server each launched in its own Docker container. When roudi is shut down, the message 'could not be killed with SIGTERM' appears for the client and the server. I noticed that the process ID roudi tries to kill is the one from inside the client's or server's container, and this process ID does not exist in the roudi container. Furthermore, when I run roudi in the same container as the server, the server is terminated normally, but the message 'could not be killed with SIGTERM' still appears for the client. Thank you for your assistance and any insights you might have.

Expected result or behaviour: The client and server are terminated correctly, even when they run in different Docker containers from roudi.

Conditions where it occurred / Performed steps:

  1. Start the roudi, client, and server containers; then run docker stop roudi.
  2. Start the server container, launch roudi inside the server's container, and start the client container; then stop roudi with Ctrl+C.

Additional helpful information


I would be extremely grateful for any assistance you could provide.

elfenpiff commented 1 week ago

@yxx-jojoli Currently, this is not possible because the containers have separate process ID namespaces: roudi has no knowledge of the other namespaces, nor does it know that the processes registered with it live in a different Docker container.

I don't know whether it is generally possible to kill a process in a Docker container from the host or from another container. If it is, this could be added to roudi as an additional feature. If not, we could run a helper "killer" process in every Docker container; roudi would send a signal to this process, which would then kill all registered processes in its container and finally itself. This would also require some effort, though.

What is your actual use case? Do you want all registered processes to be terminated when RouDi shuts down?

yxx-jojoli commented 1 week ago

@elfenpiff Yes, I would like all registered processes to be terminated when RouDi stops. I have noticed that when RouDi crashes, the processes report the error "Transport endpoint is not connected" and do not recover, even after RouDi restarts. From other issues (such as issues/2179), I learned that RouDi sends SIGKILL to terminate processes and clean up socket resources. However, since we run everything in Docker, this does not work across containers. Additionally, during keepalive checks in the cross-container deployment, I see warnings such as [Warning]: Received Keepalive from unknown process, which may also be caused by the cross-container setup.

If addressing this issue requires significant work, could you please suggest a temporary workaround? For example, could we use sendRequestToRouDi to check RouDi periodically and, if the check fails, terminate the process ourselves? However, there is little documentation on the sendRequestToRouDi interface, which might make this challenging.

elfenpiff commented 6 days ago

@yxx-jojoli When roudi starts, it creates lock files in /tmp/iox_*_roudi.lock so that it can determine whether another roudi is already running. I think roudi acquires a file lock, which is released when roudi dies. If I am not mistaken about the file lock, any other application can use it to detect whether roudi is running or has crashed.

So, as an intermediate solution, your application would just need to check whether this lock file is still locked and, if it is not, terminate itself.

yxx-jojoli commented 6 days ago

@elfenpiff Thank you very much for your assistance. I now determine whether roudi has crashed or restarted by checking the existence and creation time of the /tmp/roudi.lock file, and based on this I decide whether to shut down the current application. I look forward to stopping processes across Docker containers being added as a new feature of roudi. Thank you once again for your support.

elBoberido commented 6 days ago

@yxx-jojoli FYI, we are currently working on iceoryx2, which does not need a central daemon. This might be a solution for you in the future.

yxx-jojoli commented 6 days ago

@elBoberido Thank you for the information! We appreciate that iceoryx2 does not require a central daemon, which could be a potential solution for our future needs. We will continue to follow the progress of iceoryx2. Thank you for sharing!