AirenSoft / OvenMediaEngine

OvenMediaEngine (OME) is a Sub-Second Latency Live Streaming Server with Large-Scale and High-Definition. #WebRTC #LLHLS
https://airensoft.com/ome.html
GNU Affero General Public License v3.0
2.57k stars 1.06k forks

Cuda error #1719

Open GeghamSimonyan opened 1 week ago

GeghamSimonyan commented 1 week ago

Describe the bug: CUDA_ERROR_NO_DEVICE

To Reproduce: Create 2 streams with ABR and 4 quality layers. Restarting the container solves the problem.

Expected behavior: Sometimes, when one stream is already being transcoded, publishing a second stream gives this error.

Logs

[2024-10-12 10:02:35.596] I [InboundWorker:50] MediaRouter | mediarouter_application.cpp:481  | [#default#app/ffmpeg(116)] Stream has been prepared 
[Stream Info]
id(116), msid(0), output(ffmpeg), SourceType(Rtmp), RepresentationType(Source), Created Time (Sat Oct 12 10:02:35 2024) UUID(a58d0238-4fe3-4d36-a571-d44653420c37/default/#default#app/ffmpeg/i)

        Video Track #0: Public Name(Video_0) Variant Name(Video) Bitrate(3.50Mb) Codec(1,H264,none:0) BSF(AVCC) Resolution(1920x1080) Framerate(30.00) KeyInterval(0/frame) SkipFrames(-1) BFrames(0) timebase(1/1000)
        Audio Track #1: Public Name(Audio_1) Variant Name(Audio) Bitrate(125.00Kb) Codec(6,AAC,none) BSF(AAC_RAW) Samplerate(44.1K) Format(s16, 16) Channel(stereo, 2) timebase(1/1000)
        Data  Track #2: Public Name(Data_2) Variant Name(Data) Codec(0,Unknown,none) BSF(ID3v2) timebase(1/1000)
[2024-10-12 10:02:35.598] I [InboundWorker:50] Transcoder | transcoder_stream.cpp:184  | [#default#app/ffmpeg(116)] Using local output profiles by webhook
[2024-10-12 10:02:35.600] I [InboundWorker:50] Transcoder | transcoder_stream.cpp:492  | [#default#app/ffmpeg(116)] Output stream has been created. [#default#app/ffmpeg(2658575643)]
[2024-10-12 10:02:35.600] I [InboundWorker:50] MediaRouter | mediarouter_application.cpp:342  | [#default#app/ffmpeg(2658575643)] Trying to create a stream
[2024-10-12 10:02:35.600] I [InboundWorker:50] Monitor | application_metrics.cpp:58   | Create StreamMetrics(ffmpeg/a58d0238-4fe3-4d36-a571-d44653420c37/default/#default#app/ffmpeg/o) for monitoring
[2024-10-12 10:02:35.600] I [InboundWorker:50] MediaRouter | mediarouter_application.cpp:442  | [#default#app/ffmpeg(2658575643)] Stream has been created
[2024-10-12 10:02:35.600] I [InboundWorker:50] Publisher | application.cpp:68   | Stream(ffmpeg/2658575643) created on AppWorker (WebRTC / 0)
[2024-10-12 10:02:35.600] I [InboundWorker:50] Publisher | application.cpp:68   | Stream(ffmpeg/2658575643) created on AppWorker (LLHLS / 0)
[2024-10-12 10:02:35.600] I [InboundWorker:50] Publisher | application.cpp:68   | Stream(ffmpeg/2658575643) created on AppWorker (OVT / 0)
[2024-10-12 10:02:35.602] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000] ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps8) failed
[2024-10-12 10:02:35.602] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000]  -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
[2024-10-12 10:02:35.602] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000] 
[2024-10-12 10:02:35.602] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000] ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps10) failed
[2024-10-12 10:02:35.603] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000]  -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
[2024-10-12 10:02:35.603] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000] 
[2024-10-12 10:02:35.603] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000] ctx->cvdl->cuvidGetDecoderCaps(&ctx->caps12) failed
[2024-10-12 10:02:35.603] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000]  -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
[2024-10-12 10:02:35.603] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000] 
[2024-10-12 10:02:35.603] E [InboundWorker:50] Transcoder | decoder_avc_nv.cpp:91   | Could not open codec: h264
[2024-10-12 10:02:35.603] E [InboundWorker:50] Transcoder | transcoder_stream.cpp:820  | [#default#app/ffmpeg(116)] Decoder allocation failed.  InputTrack(0) > Decoder(0)
[2024-10-12 10:02:35.603] I [InboundWorker:50] Transcoder | transcoder_decoder.cpp:242  | The decoder has been created successfully. track(#1) codec(AAC), module(default:0)
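
For triage, an excerpt like the one above can be summarized mechanically. The following is a minimal sketch in Python and is not part of OvenMediaEngine: the function name `summarize_cuda_errors` and both regular expressions are assumptions inferred only from the line layout visible in this log (`[timestamp] LEVEL [thread] Module | file:line | message`), so they may not match other OME log formats.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt in this issue:
# [timestamp] LEVEL [thread] Module | file:line | message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<module>\S+)\s+\|\s+(?P<src>\S+)\s+\|\s?(?P<msg>.*)$"
)
CUDA_ERR_RE = re.compile(r"CUDA_ERROR_[A-Z_]+")


def summarize_cuda_errors(lines):
    """Count CUDA error codes on error-level lines and collect Transcoder errors."""
    cuda_errors = Counter()
    transcoder_errors = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "E":
            continue  # skip info lines and anything that does not fit the layout
        msg = m.group("msg").strip()
        for code in CUDA_ERR_RE.findall(msg):
            cuda_errors[code] += 1
        if m.group("module") == "Transcoder" and msg:
            transcoder_errors.append(msg)
    return cuda_errors, transcoder_errors


# A few lines copied from the log above.
SAMPLE = [
    "[2024-10-12 10:02:35.602] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000]  -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected",
    "[2024-10-12 10:02:35.603] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000]  -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected",
    "[2024-10-12 10:02:35.603] E [InboundWorker:50] FFmpeg | third_parties.cpp:111  | [AVCodecContext: 0x7f14e3ea5000]  -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected",
    "[2024-10-12 10:02:35.603] E [InboundWorker:50] Transcoder | decoder_avc_nv.cpp:91   | Could not open codec: h264",
    "[2024-10-12 10:02:35.603] E [InboundWorker:50] Transcoder | transcoder_stream.cpp:820  | [#default#app/ffmpeg(116)] Decoder allocation failed.  InputTrack(0) > Decoder(0)",
    "[2024-10-12 10:02:35.603] I [InboundWorker:50] Transcoder | transcoder_decoder.cpp:242  | The decoder has been created successfully. track(#1) codec(AAC), module(default:0)",
]

if __name__ == "__main__":
    codes, failures = summarize_cuda_errors(SAMPLE)
    print(dict(codes))
    for failure in failures:
        print(failure)
```

In this excerpt every CUDA error comes from the hardware H.264 decoder setup (`cuvidGetDecoderCaps`), while the AAC decoder on track #1 is created normally.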

Server: Linux Ubuntu-2204-jammy-amd64-base 5.15.0-122-generic, Docker version 24.0.7

Additional context: nvidia-smi shows that OvenMediaEngine is using the GPU:

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    647250      C   ...ovenmediaengine/bin/OvenMediaEngine       1055MiB |
+-----------------------------------------------------------------------------------------+

NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6

GPU: Ada 4000

bchah commented 1 week ago

Consumer NVIDIA cards have hard limits on the number of encode sessions, but Quadro cards are generally "unrestricted". However, with 4 active transcodes per stream, and depending on the source stream, is it possible that the NVENC chip is hitting capacity and cannot create the additional 4 sessions? This is not a definitive answer, just my thoughts, as I have dealt with this capacity issue in other transcoding scenarios.
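
One way to test the capacity idea is to compare the number of encode sessions the setup should need with what the driver reports while the first stream is running. The sketch below is an illustration only. It assumes one NVENC session per quality layer, which this thread does not confirm, and it assumes the `encoder.stats.sessionCount` field of `nvidia-smi --query-gpu` is available on the installed driver. The helper names are hypothetical.

```python
import subprocess

# Assumption: this query field is supported by the installed nvidia-smi.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,encoder.stats.sessionCount",
    "--format=csv,noheader,nounits",
]


def expected_encode_sessions(streams, layers_per_stream):
    """Assumes one encode session per quality layer (not confirmed in the thread)."""
    return streams * layers_per_stream


def parse_session_counts(csv_text):
    """Parse rows such as '0, 4' into {gpu_index: session_count}."""
    counts = {}
    for row in csv_text.strip().splitlines():
        parts = [p.strip() for p in row.split(",")]
        if len(parts) != 2:
            continue
        try:
            counts[int(parts[0])] = int(parts[1])
        except ValueError:
            continue  # value not reported as a number, e.g. "[N/A]"
    return counts


def read_session_counts():
    """Run nvidia-smi on the host or in the container and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_session_counts(result.stdout)


if __name__ == "__main__":
    # 2 streams with 4 quality layers each, as described in the report.
    print("expected sessions:", expected_encode_sessions(2, 4))
    print("reported sessions:", read_session_counts())
```

If the reported count stays well below the expected total when the second stream fails, encoder capacity is a less likely explanation. The failing call in the log is also on the decoder side (`cuvidGetDecoderCaps`), which this query does not cover.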

GeghamSimonyan commented 1 week ago

Ada 4000 doesn't have a limit on enc/dec :((