AirenSoft / OvenMediaEngine

OvenMediaEngine (OME) is a Sub-Second Latency Live Streaming Server with Large-Scale and High-Definition. #WebRTC #LLHLS
https://airensoft.com/ome.html
GNU Affero General Public License v3.0

OvenMediaEngine not starting when built using Dockerfile.cuda.local to enable hardware acceleration #1508

Closed: chandrashekar-nallamilli closed this issue 8 months ago

chandrashekar-nallamilli commented 9 months ago

Describe the bug

I am using SRT ingest with WebRTC playback on OvenMediaEngine in Docker, and I am trying to enable hardware acceleration.

To Reproduce

Steps to reproduce the behavior:

  1. Set Server.xml as follows:

<?xml version="1.0" encoding="UTF-8" ?>
OvenMediaEngine origin * false stun.ovenmediaengine.com:13478 true false false 2 ${env:OME_API_PORT:8081} 1 ${env:OME_SRT_PROV_PORT:9999} 2 ${env:OME_WEBRTC_SIGNALLING_PORT:3333} ${env:OME_WEBRTC_SIGNALLING_TLS_PORT:3334} 1 ${env:OME_HOST_IP:*}:${env:OME_WEBRTC_CANDIDATE_PORT:10000-10004/udp} ${env:OME_HOST_IP:*}:${env:OME_WEBRTC_TCP_RELAY_PORT:3478} true 1 ${env:OME_WEBRTC_SIGNALLING_PORT:3333} ${env:OME_WEBRTC_SIGNALLING_TLS_PORT:3334} 1 ${env:OME_WEBRTC_CANDIDATE_IP:*}:${env:OME_WEBRTC_CANDIDATE_PORT:10000-10004/udp} ${env:OME_WEBRTC_CANDIDATE_IP:*}:${env:OME_WEBRTC_TCP_RELAY_PORT:3478} true 1 Demo * app live true nv true nv bypass_stream ${OriginStreamName} 30000 * 1 8 30000 false false false

  2. Encoder: vMix
  3. See error

Expected behavior

The GPU should be used for hardware acceleration, but I can't start the container.

Logs

/opt/ovenmediaengine/bin/OvenMediaEngine: error while loading shared libraries: libnppig.so.11: cannot open shared object file: No such file or directory
/opt/ovenmediaengine/bin/OvenMediaEngine: error while loading shared libraries: libnppig.so.11: cannot open shared object file: No such file or directory
/opt/ovenmediaengine/bin/OvenMediaEngine: error while loading shared libraries: libnppig.so.11: cannot open shared object file: No such file or directory
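The dynamic loader stops at the first library it cannot find, so the log only names `libnppig.so.11`. Here is a minimal sketch that lists every unresolved dependency in one pass. It assumes the image is Ubuntu-based, so `ldd` is present, and that the NVIDIA container toolkit is installed, so `--gpus all` works. The script and function names are hypothetical:

```bash
# Hypothetical helper script (e.g. check-libs.sh); not part of OME.
OME_BIN=/opt/ovenmediaengine/bin/OvenMediaEngine

# missing_libs: read `ldd` output on stdin and print only the library
# names that ldd marks as "=> not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# check_image IMAGE: run ldd on the OME binary inside IMAGE (overriding the
# image's entrypoint), with GPUs attached so that driver libraries injected
# from the host are included in the check.
check_image() {
    docker run --rm --gpus all --entrypoint ldd "$1" "$OME_BIN" | missing_libs
}

# Only call Docker when an image name is passed, e.g.
#   sh check-libs.sh docker.io/$docker_username/$docker_repo:ome-official-v0.16.4-cuda12
if [ "$#" -gt 0 ]; then
    check_image "$1"
fi
```

If several NPP libraries are missing, they all show up here instead of one per restart.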


d-uzlov commented 8 months ago

I also see this error.

My system specs:

I built the container like this:

docker build https://github.com/AirenSoft/OvenMediaEngine/raw/v0.16.4/Dockerfile.cuda \
    --build-arg OME_VERSION=v0.16.4 \
    -t docker.io/$docker_username/$docker_repo:ome-official-v0.16.4-cuda11

I also see that my system doesn't have libnppig.so.11 anywhere, but there is /usr/local/cuda-12.3/targets/x86_64-linux/lib/libnppig.so.12. So I tried building the container with CUDA 12 (I modified the Dockerfile to use CUDA 12.2.0 instead of 11.4.3):
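d-uzlov found only `libnppig.so.12` under `/usr/local/cuda-12.3`. The sketch below is one way to check which `libnppig` versions exist on a machine. The function name is hypothetical, and the directories you pass it are up to you, since CUDA install prefixes vary:

```bash
# find_nppig DIR...: print every libnppig shared object found under the
# given directories, silently skipping directories that do not exist.
find_nppig() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir" -name 'libnppig.so*' 2>/dev/null
        fi
    done
}

# Only search when directories are passed, e.g.
#   sh find-nppig.sh /usr/local /usr/lib
if [ "$#" -gt 0 ]; then
    find_nppig "$@"
fi
```

`ldconfig -p | grep libnppig` is an alternative that queries the dynamic linker cache instead of walking the file system.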

docker build - < ./video/ome/nvidia/Dockerfile.cuda \
    --build-arg OME_VERSION=v0.16.4 \
    -t docker.io/$docker_username/$docker_repo:ome-official-v0.16.4-cuda12

Just in case, I made sure to run the container with NVIDIA_DRIVER_CAPABILITIES=all.

Unfortunately, the error is the same, except the file name is a bit different:

/opt/ovenmediaengine/bin/OvenMediaEngine: error while loading shared libraries: libnppig.so.12: cannot open shared object file: No such file or directory

Version 0.16.3 didn't have this issue.

Keukhan commented 8 months ago

Thank you for reporting. I'm sorry for my late reply.

Recently, I changed the code to use the libnpp library in OME.

If you have time, could you test it by changing the CUDA build script as shown below and let me know the results?

Dockerfile.cuda

ENV     NVIDIA_DRIVER_CAPABILITIES=compute,utility,video,compat32

=>

ENV     NVIDIA_DRIVER_CAPABILITIES=compute,utility,video,compat32,graphics

The line above controls which NVIDIA libraries from the host are bound into the container when the Docker image runs. Adding the graphics capability is expected to bind the libnppig.so file.
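Changing the `ENV` line means rebuilding the image. As a hedged alternative for quick tests, you can set the variable when the container starts, because `docker run -e` takes precedence over an image's `ENV`. The sketch below uses hypothetical helper names. `has_cap` is a small check you can run inside the container against `$NVIDIA_DRIVER_CAPABILITIES`. It treats `all` as requesting every capability.

```bash
# has_cap CAP LIST: succeed if CAP appears in the comma-separated LIST,
# or if LIST contains "all" (which requests every capability).
has_cap() {
    case ",$2," in
        *,all,*|*",$1,"*) return 0 ;;
    esac
    return 1
}

# run_with_caps IMAGE CAPS: start IMAGE with NVIDIA_DRIVER_CAPABILITIES=CAPS,
# overriding whatever the Dockerfile's ENV line set.
run_with_caps() {
    docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES="$2" "$1"
}

# Only start a container when both arguments are given, e.g.
#   sh run-caps.sh my-ome-image compute,utility,video,compat32,graphics
if [ "$#" -ge 2 ]; then
    run_with_caps "$1" "$2"
fi
```

In a Kubernetes deployment, the same override belongs in the container's `env` list.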

Then I'll wait for good news. Thanks for your help.

Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.10.0/user-guide.html

d-uzlov commented 8 months ago

> could you test it by changing the CUDA build script as shown below and let me know the results?

Unfortunately, it doesn't seem to change anything.

root@ovenmediaengine-7958d6df66-8s22j:/opt/ovenmediaengine/bin# env | grep NVIDIA_DRIVER_CAPABILITIES
NVIDIA_DRIVER_CAPABILITIES=compute,utility,video,compat32,graphics
root@ovenmediaengine-7958d6df66-8s22j:/opt/ovenmediaengine/bin# /opt/ovenmediaengine/bin/OvenMediaEngine -c origin_conf
/opt/ovenmediaengine/bin/OvenMediaEngine: error while loading shared libraries: libnppig.so.12: cannot open shared object file: No such file or directory
d-uzlov commented 8 months ago

I tried to check what NVIDIA_DRIVER_CAPABILITIES actually does to a container. It doesn't seem to change the list of available libraries.

A few examples:

```bash
root@ovenmediaengine-7cdf5d7c9b-cm275:/opt/ovenmediaengine/bin# env | grep NVIDIA_DRIVER_CAPABILITIES
NVIDIA_DRIVER_CAPABILITIES=
root@ovenmediaengine-7cdf5d7c9b-cm275:/opt/ovenmediaengine/bin# ls -la /usr/local/cuda/targets/x86_64-linux/lib/ /usr/local/cuda/compat/
/usr/local/cuda/compat/:
total 145576
drwxr-xr-x 2 root root 4096 Nov 10 04:57 .
drwxr-xr-x 4 root root 4096 Nov 10 04:57 ..
lrwxrwxrwx 1 root root 12 Oct 19 23:47 libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 21 Oct 19 23:47 libcuda.so.1 -> libcuda.so.535.129.03
-rw-r--r-- 1 root root 29372496 Oct 19 19:06 libcuda.so.535.129.03
lrwxrwxrwx 1 root root 29 Oct 19 23:47 libcudadebugger.so.1 -> libcudadebugger.so.535.129.03
-rw-r--r-- 1 root root 10188744 Oct 19 18:31 libcudadebugger.so.535.129.03
lrwxrwxrwx 1 root root 19 Oct 19 23:47 libnvidia-nvvm.so -> libnvidia-nvvm.so.4
lrwxrwxrwx 1 root root 28 Oct 19 23:47 libnvidia-nvvm.so.4 -> libnvidia-nvvm.so.535.129.03
-rw-r--r-- 1 root root 86140736 Oct 19 19:49 libnvidia-nvvm.so.535.129.03
lrwxrwxrwx 1 root root 38 Oct 19 23:47 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.535.129.03
-rw-r--r-- 1 root root 23348992 Oct 19 19:12 libnvidia-ptxjitcompiler.so.535.129.03

/usr/local/cuda/targets/x86_64-linux/lib/:
total 680
drwxr-xr-x 2 root root 4096 Nov 10 04:57 .
drwxr-xr-x 3 root root 4096 Nov 10 04:57 ..
lrwxrwxrwx 1 root root 20 May 1 2023 libcudart.so.12 -> libcudart.so.12.2.53
-rw-r--r-- 1 root root 687456 May 1 2023 libcudart.so.12.2.53
```

```bash
root@ovenmediaengine-665476f9cf-5cx4w:/opt/ovenmediaengine/bin# env | grep NVIDIA_DRIVER_CAPABILITIES
NVIDIA_DRIVER_CAPABILITIES=video
root@ovenmediaengine-665476f9cf-5cx4w:/opt/ovenmediaengine/bin# ls -la /usr/local/cuda/targets/x86_64-linux/lib/ /usr/local/cuda/compat/
/usr/local/cuda/compat/:
total 145576
drwxr-xr-x 2 root root 4096 Nov 10 04:57 .
drwxr-xr-x 4 root root 4096 Nov 10 04:57 ..
lrwxrwxrwx 1 root root 12 Oct 19 23:47 libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 21 Oct 19 23:47 libcuda.so.1 -> libcuda.so.535.129.03
-rw-r--r-- 1 root root 29372496 Oct 19 19:06 libcuda.so.535.129.03
lrwxrwxrwx 1 root root 29 Oct 19 23:47 libcudadebugger.so.1 -> libcudadebugger.so.535.129.03
-rw-r--r-- 1 root root 10188744 Oct 19 18:31 libcudadebugger.so.535.129.03
lrwxrwxrwx 1 root root 19 Oct 19 23:47 libnvidia-nvvm.so -> libnvidia-nvvm.so.4
lrwxrwxrwx 1 root root 28 Oct 19 23:47 libnvidia-nvvm.so.4 -> libnvidia-nvvm.so.535.129.03
-rw-r--r-- 1 root root 86140736 Oct 19 19:49 libnvidia-nvvm.so.535.129.03
lrwxrwxrwx 1 root root 38 Oct 19 23:47 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.535.129.03
-rw-r--r-- 1 root root 23348992 Oct 19 19:12 libnvidia-ptxjitcompiler.so.535.129.03

/usr/local/cuda/targets/x86_64-linux/lib/:
total 680
drwxr-xr-x 2 root root 4096 Nov 10 04:57 .
drwxr-xr-x 3 root root 4096 Nov 10 04:57 ..
lrwxrwxrwx 1 root root 20 May 1 2023 libcudart.so.12 -> libcudart.so.12.2.53
-rw-r--r-- 1 root root 687456 May 1 2023 libcudart.so.12.2.53
```

```bash
root@ovenmediaengine-5cbf748884-l8tgl:/opt/ovenmediaengine/bin# env | grep NVIDIA_DRIVER_CAPABILITIES
NVIDIA_DRIVER_CAPABILITIES=compute,utility,video,compat32,graphics
root@ovenmediaengine-5cbf748884-l8tgl:/opt/ovenmediaengine/bin# ls -la /usr/local/cuda/targets/x86_64-linux/lib/ /usr/local/cuda/compat/
/usr/local/cuda/compat/:
total 145576
drwxr-xr-x 2 root root 4096 Nov 10 04:57 .
drwxr-xr-x 4 root root 4096 Nov 10 04:57 ..
lrwxrwxrwx 1 root root 12 Oct 19 23:47 libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 21 Oct 19 23:47 libcuda.so.1 -> libcuda.so.535.129.03
-rw-r--r-- 1 root root 29372496 Oct 19 19:06 libcuda.so.535.129.03
lrwxrwxrwx 1 root root 29 Oct 19 23:47 libcudadebugger.so.1 -> libcudadebugger.so.535.129.03
-rw-r--r-- 1 root root 10188744 Oct 19 18:31 libcudadebugger.so.535.129.03
lrwxrwxrwx 1 root root 19 Oct 19 23:47 libnvidia-nvvm.so -> libnvidia-nvvm.so.4
lrwxrwxrwx 1 root root 28 Oct 19 23:47 libnvidia-nvvm.so.4 -> libnvidia-nvvm.so.535.129.03
-rw-r--r-- 1 root root 86140736 Oct 19 19:49 libnvidia-nvvm.so.535.129.03
lrwxrwxrwx 1 root root 38 Oct 19 23:47 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.535.129.03
-rw-r--r-- 1 root root 23348992 Oct 19 19:12 libnvidia-ptxjitcompiler.so.535.129.03

/usr/local/cuda/targets/x86_64-linux/lib/:
total 680
drwxr-xr-x 2 root root 4096 Nov 10 04:57 .
drwxr-xr-x 3 root root 4096 Nov 10 04:57 ..
lrwxrwxrwx 1 root root 20 May 1 2023 libcudart.so.12 -> libcudart.so.12.2.53
-rw-r--r-- 1 root root 687456 May 1 2023 libcudart.so.12.2.53
```

```bash
root@ovenmediaengine-7f8fb6d678-bdzdd:/opt/ovenmediaengine/bin# env | grep NVIDIA_DRIVER_CAPABILITIES
NVIDIA_DRIVER_CAPABILITIES=utility
root@ovenmediaengine-7f8fb6d678-bdzdd:/opt/ovenmediaengine/bin# ls -la /usr/local/cuda/targets/x86_64-linux/lib/ /usr/local/cuda/compat/
/usr/local/cuda/compat/:
total 145576
drwxr-xr-x 2 root root 4096 Nov 10 04:57 .
drwxr-xr-x 4 root root 4096 Nov 10 04:57 ..
lrwxrwxrwx 1 root root 12 Oct 19 23:47 libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root 21 Oct 19 23:47 libcuda.so.1 -> libcuda.so.535.129.03
-rw-r--r-- 1 root root 29372496 Oct 19 19:06 libcuda.so.535.129.03
lrwxrwxrwx 1 root root 29 Oct 19 23:47 libcudadebugger.so.1 -> libcudadebugger.so.535.129.03
-rw-r--r-- 1 root root 10188744 Oct 19 18:31 libcudadebugger.so.535.129.03
lrwxrwxrwx 1 root root 19 Oct 19 23:47 libnvidia-nvvm.so -> libnvidia-nvvm.so.4
lrwxrwxrwx 1 root root 28 Oct 19 23:47 libnvidia-nvvm.so.4 -> libnvidia-nvvm.so.535.129.03
-rw-r--r-- 1 root root 86140736 Oct 19 19:49 libnvidia-nvvm.so.535.129.03
lrwxrwxrwx 1 root root 38 Oct 19 23:47 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.535.129.03
-rw-r--r-- 1 root root 23348992 Oct 19 19:12 libnvidia-ptxjitcompiler.so.535.129.03

/usr/local/cuda/targets/x86_64-linux/lib/:
total 680
drwxr-xr-x 2 root root 4096 Nov 10 04:57 .
drwxr-xr-x 3 root root 4096 Nov 10 04:57 ..
lrwxrwxrwx 1 root root 20 May 1 2023 libcudart.so.12 -> libcudart.so.12.2.53
-rw-r--r-- 1 root root 687456 May 1 2023 libcudart.so.12.2.53
```

Running `docker.io/alpine:3.17.3` with the NVIDIA container runtime:

```bash
/ # env | grep NVIDIA_DRIVER_CAPABILITIES
NVIDIA_DRIVER_CAPABILITIES=all
/ # ls -la /usr/local/cuda/targets/x86_64-linux/lib/ /usr/local/cuda/compat/
ls: /usr/local/cuda/targets/x86_64-linux/lib/: No such file or directory
ls: /usr/local/cuda/compat/: No such file or directory
/ # ls -la /usr/local/
total 24
drwxr-xr-x 5 root root 4096 Mar 29 2023 .
drwxr-xr-x 1 root root 4096 Feb 12 20:15 ..
drwxr-xr-x 2 root root 4096 Mar 29 2023 bin
drwxr-xr-x 2 root root 4096 Mar 29 2023 lib
drwxr-xr-x 2 root root 4096 Mar 29 2023 share
/ # nvidia-
nvidia-cuda-mps-control  nvidia-cuda-mps-server  nvidia-debugdump  nvidia-persistenced  nvidia-smi
/ # nvidia-smi
sh: nvidia-smi: not found
```

The list of CUDA libraries stays the same regardless of NVIDIA_DRIVER_CAPABILITIES. nvidia-smi is not available when the variable doesn't contain utility. Maybe there are other changes, but I don't see them in the file system. Also, when I ran a plain alpine container, no libraries were injected at all. I guess they are only present in the OME container because it uses nvidia/cuda:12.2.0-base-ubuntu20.04 as its base image? Maybe the NVIDIA container runtime replaces existing stub libraries with the real ones but doesn't add new ones?
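The experiment above can be automated. The sketch below starts the image once per capability value, lists the two CUDA library directories, and reports whether each listing differs from the first one. The function names and the capability values to try are illustrative assumptions. The comparison takes the lister as a parameter, so you can check it without Docker.

```bash
# list_cuda_libs IMAGE CAPS: sorted names in the two CUDA library directories
# of IMAGE, started with NVIDIA_DRIVER_CAPABILITIES=CAPS (entrypoint bypassed).
list_cuda_libs() {
    docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES="$2" \
        --entrypoint sh "$1" -c \
        'ls -1 /usr/local/cuda/compat /usr/local/cuda/targets/x86_64-linux/lib' | sort
}

# compare_caps LISTER IMAGE CAPS...: run LISTER IMAGE CAPS for each CAPS value
# and report whether its output matches the output for the first CAPS value.
compare_caps() {
    lister=$1
    image=$2
    shift 2
    baseline=$("$lister" "$image" "$1")
    for caps in "$@"; do
        if [ "$("$lister" "$image" "$caps")" = "$baseline" ]; then
            echo "same: '$caps'"
        else
            echo "DIFFERENT: '$caps'"
        fi
    done
}

# Only call Docker when an image name is passed.
if [ "$#" -gt 0 ]; then
    compare_caps list_cuda_libs "$1" "" video utility compute,utility,video,compat32,graphics
fi
```

With the listings reported in this thread, every line would read `same`.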

I couldn't find any details on this via Google. Maybe something is wrong with my container runtime setup? I can't tell. I set up the container runtime using the instructions from the NVIDIA website, and I re-checked that everything is still correct. OME 0.16.3 runs just fine with hardware acceleration (except that thumbnail generation is not working, but that's unrelated to this issue). Also, nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2, which NVIDIA suggests for testing whether CUDA is available in a container, runs without any issues.

Keukhan commented 8 months ago

@d-uzlov Thank you for testing in detail. I will fix the problem of the libnppig.so file not being bound into the Docker container. I think it will take a few days; thank you for your patience. I will let you know when the problem is resolved.

Keukhan commented 8 months ago

@d-uzlov

Sorry for the delay; my work schedule has been busy. I've finally solved it. The problem of the libnppig.so file not being found has been corrected as follows.

https://github.com/AirenSoft/OvenMediaEngine/compare/f38d3c1a180e...4b297cc97fbc

You can change Docker's base image from `base` to `runtime` in the Dockerfile.cuda and Dockerfile.cuda.local scripts. Please test it and let me know the results.
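Some background, stated as our own understanding rather than taken from this thread. The NVIDIA container runtime mounts host *driver* libraries such as `libcuda.so`. NPP libraries like `libnppig` ship with the CUDA toolkit instead. The `nvidia/cuda` `base` images contain little beyond `libcudart`, which matches the listings earlier in this thread. The `runtime` images add the toolkit's runtime libraries, including NPP.

Below is a minimal sketch of the tag switch. It assumes the Dockerfile's `FROM` line spells out a tag such as `nvidia/cuda:12.2.0-base-ubuntu20.04`. A tag assembled from build `ARG`s would need a different edit. The helper name is hypothetical, and the authoritative change is the one in the compare link above.

```bash
# to_runtime_base: read a Dockerfile on stdin and rewrite the first
# nvidia/cuda "<version>-base-<os>" tag on each line to "<version>-runtime-<os>".
to_runtime_base() {
    sed 's|\(nvidia/cuda:[^ ]*\)-base-|\1-runtime-|'
}

# Only rewrite when a file is passed, e.g.
#   sh to-runtime.sh Dockerfile.cuda > Dockerfile.cuda.runtime
if [ "$#" -gt 0 ]; then
    to_runtime_base < "$1"
fi
```

Pass the rewritten file to `docker build` the same way as before.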

I hope you have a nice day.

chandrashekar-nallamilli commented 8 months ago

Thank you @Keukhan for looking into the issue and @d-uzlov for testing it. I can confirm it is working after making the suggested changes on our side. Have a nice day!

d-uzlov commented 8 months ago

The problem is solved when using the CUDA runtime images. It's a bit unfortunate that the image size grows from ~500 MB to ~2500 MB (uncompressed). I guess it's not too big of a deal, but maybe it could be optimized in the future. Thank you for solving this.
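To reproduce the size comparison, here is a sketch that assumes both images are available locally. The image tags are hypothetical placeholders. `docker image inspect --format '{{.Size}}'` prints an image's size in bytes, and the helper converts that to decimal megabytes (10^6 bytes).

```bash
# bytes_to_mb N: integer decimal megabytes for a byte count.
bytes_to_mb() {
    echo $(( $1 / 1000000 ))
}

# image_mb IMAGE: size of a local image in MB.
image_mb() {
    bytes_to_mb "$(docker image inspect --format '{{.Size}}' "$1")"
}

# Only query Docker when two image names are passed, e.g.
#   sh image-sizes.sh my-ome:cuda12-base my-ome:cuda12-runtime
if [ "$#" -eq 2 ]; then
    echo "$1: $(image_mb "$1") MB"
    echo "$2: $(image_mb "$2") MB"
fi
```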