AirenSoft / OvenMediaEngine

OvenMediaEngine (OME) is a Sub-Second Latency Live Streaming Server with Large-Scale and High-Definition. #WebRTC #LLHLS
https://airensoft.com/ome.html
GNU Affero General Public License v3.0
2.61k stars 1.06k forks source link

OME docker container doesn't respond to SIGTERM #1531

Closed d-uzlov closed 3 weeks ago

d-uzlov commented 9 months ago

Describe the bug

SIGTERM is the standard way for a graceful program termination. The official OME container doesn't seem to respond to it.

It does seem to correctly shut down on SIGABRT (well, at least there is a log entry, so I guess it must be graceful shutdown and not a default handler) but docker and k8s both use SIGTERM as the default shutdown signal.

To Reproduce

$ docker run --name ome -d docker.io/airensoft/ovenmediaengine:0.16.4
2c39a0d3f324ca1f124c5ef4997b96fd21daa821d58179f8b43f773abb1aa733
danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ docker kill --signal SIGTERM ome
ome
$ docker ps
CONTAINER ID   IMAGE                              COMMAND                  CREATED          STATUS          PORTS
                              NAMES
2c39a0d3f324   airensoft/ovenmediaengine:0.16.4   "/opt/ovenmediaengin…"   59 seconds ago   Up 59 seconds   80/tcp, 1935/tcp, 3333-3334/tcp, 8080/tcp, 8090/tcp, 4000-4005/udp, 9000/tcp, 10000-10010/udp   ome
$ time docker stop ome
ome

real    0m10.306s
user    0m0.017s
sys     0m0.028s

As you can see, docker kill --signal SIGTERM ome did nothing, and docker stop ome hanged until 10s timeout.

Expected behavior

Container performs graceful shutdown on SIGTERM.

Server:

Additional context

I don't have a server with non-docker OME available, so I can't check if this is a docker-specific issue. If it is, then my guess is that it may be related to linux disabling default signal handlers for PID 1.

Also, as a sanity check, here is a similar test with nginx:

$ docker run --name nginx -d nginx
4e4856903f1ad2cb964a0c55c2f1c3d8b3b5f38381a084488ade5512befcd2c6
danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ docker kill --signal SIGTERM nginx
nginx
danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ docker ps --all
CONTAINER ID   IMAGE     COMMAND                  CREATED          STATUS                     PORTS     NAMES
9cef57b37d5f   nginx     "/docker-entrypoint.…"   12 seconds ago   Exited (0) 6 seconds ago             nginx
$ docker rm nginx
nginx
$ docker run --name nginx -d nginx
c11b540f96b1bcaeca21414b75bb4b30fc1af011e3d713ff76642ec37f940543
danil@danil-main:/mnt/c/Users/danil/Documents/k8s-public-copy
$ time docker stop nginx
nginx

real    0m0.481s
user    0m0.020s
sys     0m0.020s
basisbit commented 9 months ago

For those who use docker-compose, put this into your docker-compose.yml as a workaround: stop_signal: KILL (for an example, see https://github.com/basisbit/KitchenCDN/blob/main/docker-compose.yml )

naanlizard commented 9 months ago

Is this why a docker-compose down takes so long with OME? I figured it was cleaning up, but maybe it's timing out and is forcibly killed?

basisbit commented 9 months ago

@naanlizard yes, it is. By default Docker will wait 10 seconds and then kill the service.

dimiden commented 9 months ago

This was something we overlooked. I've updated the Dockerfile to send SIGKILL instead of SIGTERM. The Docker image built next will stop immediately. 👍

https://github.com/AirenSoft/OvenMediaEngine/commit/f53d4bf9925dffc3c7a186736d09023770d10a2d

d-uzlov commented 9 months ago

This was something we overlooked. I've updated the Dockerfile to send SIGKILL instead of SIGTERM. The Docker image built next will stop immediately. 👍

SIGKILL doesn't allow for any graceful shutdown, though. I imagine it can be very bad for stream recording, for example.

Does OME support graceful shutdown? If yes, then I think a better solution would be to properly support SIGTERM.

naanlizard commented 9 months ago

I am also rather concerned about recordings and would like to know more

dimiden commented 9 months ago

@d-uzlov Yes, you're correct. As you mentioned, it's preferable to support SIGTERM over SIGKILL. However, I'm currently unable to support SIGTERM immediately, so I've opted to handle it with SIGKILL for now.

To elaborate further, during the initial development of the OME, I had implemented graceful shutdown upon receiving the SIGINT for testing purposes. However, due to some issues, I'm currently unable to support SIGTERM. As I don't have the time to address the issues related to graceful shutdown and add support for SIGTERM immediately, I've decided to handle it with SIGKILL for the time being.

If I have more time to focus on OME, I'll support SIGTERM :)

stale[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

naanlizard commented 5 months ago

Sorry to hop in a random thread here, but will there be a release soon? We are waiting to deploy that crash fix and it sounds like there are a few other fixes (like this one) that would be nice

stale[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

naanlizard commented 2 months ago

I'm looking to automate releases in my production environment. Part of that is making sure that when we kill the container, recordings are safely processed.

When I deploy manually, I usually run tcpkill to stop incoming streams and then wait 30s or so, which is entirely based on vibes and not because I've tested it.

I could use tcpkill in my automatic deployments, but that is complex. Is there an API call that I can make, to start shutdown for the server properly? Or perhaps remove every stream and prevent reconnection until the recordings are processed?

The container will then be removed and replaced, so as long as the new one can come up properly, we can be a little destructive here. Maybe delete the app or something?

dimiden commented 2 months ago

@naanlizard For now, the only option is to either stop the stream or delete the stream via the API and then terminate OME. However, in the case of the second option, if a new stream is published right after the deletion, unexpected side effects may occur, so the first option is preferable.

dimiden commented 2 months ago

I have just added SIGTERM support (cb9657ce42f88304337ab5c177bb42889f127db3).. Now, OME will be terminated via the docker stop command.

@naanlizard When OME receives the SIGTERM signal, all streams that are being recorded will also be cleaned up. Therefore, when you terminate OME, please make sure to trigger the SIGTERM signal!

naanlizard commented 2 months ago

I have just added SIGTERM support (cb9657c).. Now, OME will be terminated via the docker stop command.

@naanlizard When OME receives the SIGTERM signal, all streams that are being recorded will also be cleaned up. Therefore, when you terminate OME, please make sure to trigger the SIGTERM signal!

Wonderful! If I just do 'docker stop omecontainer' with the next release version, will that be ok?

dimiden commented 2 months ago

@naanlizard Yes, that's correct!

naanlizard commented 2 months ago

@naanlizard Yes, that's correct!

Can't wait! When do we expect the next release?

getroot commented 1 month ago

@naanlizard this feature has landed in version 0.17.1