eth-cscs / sarus

OCI-compatible engine to deploy Linux containers on HPC environments.
https://sarus.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Stopping Sarus containers #34

Closed mcopik closed 8 months ago

mcopik commented 9 months ago

Hi!

We are using Sarus containers to host serverless functions on HPC machines. Our case therefore differs from a typical MPI job: the application inside the container does not run to completion. Instead, it keeps running until the serverless resource manager kills it.

When using Docker, we can stop containers easily by sending signals, either through the CLI or by sending an HTTP request to the daemon. This is important because we want to send SIGINT or SIGTERM to allow a graceful shutdown. Sarus offers no such option. When we start a new container using fork + exec, the new process runs as root, and we are not allowed to send any signals to it. On Piz Daint, we noticed that this process spawns another root-owned process, and only that second process starts the user-owned process executing our containerized application. Only once that user-owned process exists can we send signals and properly terminate our containers. We developed a workaround that walks the children of the main container process until it finds our application. However, this workaround does not seem very robust, and it depends on the internal implementation of Sarus.
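For illustration, a child-process search of this kind could look roughly like the Python sketch below. It is not the code used in our project: the function names (`read_proc_table`, `find_user_descendants`, `stop_container`) are hypothetical, and it assumes Linux with a readable /proc and the root-owned launcher layout observed on Piz Daint. It also has an inherent race, since the user-owned process may not exist yet when /proc is scanned.

```python
import os
import signal
from collections import deque


def read_proc_table(proc_root="/proc"):
    """Return {pid: (ppid, real_uid)} for every process visible in /proc (Linux only)."""
    table = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                fields = dict(line.split(":", 1) for line in fh if ":" in line)
            # "Uid:" lists real, effective, saved and filesystem UIDs; take the real one.
            table[int(entry)] = (int(fields["PPid"]), int(fields["Uid"].split()[0]))
        except (OSError, KeyError, ValueError):
            continue  # the process exited while we were reading it
    return table


def find_user_descendants(root_pid, uid, table):
    """Breadth-first search below root_pid for the topmost processes owned by uid."""
    children = {}
    for pid, (ppid, _) in table.items():
        children.setdefault(ppid, []).append(pid)
    found = []
    queue = deque(children.get(root_pid, []))
    while queue:
        pid = queue.popleft()
        if table[pid][1] == uid:
            found.append(pid)  # stop descending: this is the process we may signal
        else:
            queue.extend(children.get(pid, []))  # still root-owned, keep looking
    return found


def stop_container(root_pid, sig=signal.SIGTERM):
    """Send sig to the first user-owned processes below the root-owned launcher."""
    targets = find_user_descendants(root_pid, os.getuid(), read_proc_table())
    for pid in targets:
        os.kill(pid, sig)
    return targets
```

Here `stop_container(pid)` would be called with the PID returned by fork. An empty result means no user-owned process was found yet, so the caller has to retry, which is part of why this approach is fragile.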

Is there a more reliable method of stopping running containers that does not depend on Sarus internals?

Madeeks commented 8 months ago

Hi @mcopik, Sarus currently does not provide a way to stop containers or control their execution after they are started (an equivalent of docker stop, to give a concrete example). We have discussed a few options among the developers, but unfortunately we do not expect to be able to work on this feature in the foreseeable future.