AirenSoft / OvenMediaEngine

OvenMediaEngine (OME) is a Sub-Second Latency Live Streaming Server with Large-Scale and High-Definition. #WebRTC #LLHLS
https://OvenMediaEngine.com/ome
GNU Affero General Public License v3.0

High CPU usage for ~10 seconds during RTSP pulled stream establishment and GPU hardware acceleration turned on #940

Closed rebound-software closed 1 year ago

rebound-software commented 1 year ago

Describe the bug: CPU utilisation goes to 100% during the initial establishment of a streamed connection to an RTSP pulled camera source with hardware acceleration enabled. This seems to last for around 10 seconds before returning to expected levels (~15%). As a result, the camera stream is also noticeably slow to establish in OvenPlayer and sometimes fails to play at all, because OME times out due to execution time starvation caused by the excessive CPU load.

Using top -H -p as recommended in the performance tuning section, I can see the Dech264NV thread maxing out at 100% before dropping to about 7%, where it then remains while the stream plays. Using nvidia-smi, I can see that GPU usage never gets above 10% and seems to drop to 0% while the CPU is at 100% during the initial 10-second period.
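For anyone who wants to log the same per-thread figures over time instead of watching top, the following is a minimal sketch that reads the Linux /proc filesystem directly. It assumes a Linux host and that the OME process ID is already known (for example from `pidof OvenMediaEngine`). The function names are illustrative and are not part of OME. The thread name Dech264NV is the one quoted in this report.

```python
import os
import time


def parse_task_stat(stat_text):
    """Return (thread_name, cpu_ticks) from the text of /proc/<pid>/task/<tid>/stat."""
    # The thread name sits between the first "(" and the last ")" and may contain spaces.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    name = stat_text[open_idx + 1:close_idx]
    fields = stat_text[close_idx + 2:].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return name, utime + stime


def read_thread_ticks(pid):
    """Snapshot the accumulated CPU ticks of every thread of a process."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                ticks[int(tid)] = parse_task_stat(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited between listdir() and open()
    return ticks


def cpu_percent(before, after, interval_s, ticks_per_s):
    """Percentage of one core used by each thread between two snapshots."""
    usage = {}
    for tid, (name, t1) in after.items():
        if tid in before:
            t0 = before[tid][1]
            usage[tid] = (name, 100.0 * (t1 - t0) / (ticks_per_s * interval_s))
    return usage


def sample(pid, interval_s=1.0):
    """Take two snapshots interval_s apart and return per-thread CPU usage."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    before = read_thread_ticks(pid)
    time.sleep(interval_s)
    after = read_thread_ticks(pid)
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

Calling `sample(pid)` once per second while the stream is being established and filtering the result for the name Dech264NV should give roughly the same picture as top -H, in a form that can be written to a log and attached to a report.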

If I turn off GPU hardware acceleration, the 100% CPU usage does not occur and the stream establishes and plays in OvenPlayer quickly (in around 2 seconds, as expected). CPU usage is around 32% for Dech264.

As previously reported, earlier versions of OME don't exhibit the 100% CPU usage with hardware acceleration enabled.

To Reproduce: Enable GPU hardware acceleration and view a WebRTC stream via OvenPlayer for an RTSP pulled source (an IP camera in my case).

Expected behavior: CPU usage does not max out at 100% during the initial 10-second period.

Server (please complete the following information):

Player (please complete the following information):

Additional context: OvenPlayer is used to view the stream on a different device from the OvenMediaEngine server.

Output from nvidia-smi:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   55C    P2    21W / 100W |    836MiB /  3910MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1234      G   /usr/lib/xorg/Xorg                 18MiB |
|    0   N/A  N/A      1339      G   /usr/bin/gnome-shell               74MiB |
|    0   N/A  N/A      1712      G   /usr/lib/xorg/Xorg                259MiB |
|    0   N/A  N/A      1845      G   /usr/bin/gnome-shell               78MiB |
|    0   N/A  N/A      2851      G   ...794304095522014205,131072       34MiB |
|    0   N/A  N/A      7612      G   ...RendererForSitePerProcess        9MiB |
|    0   N/A  N/A      8410      C   /usr/bin/OvenMediaEngine          354MiB |
+-----------------------------------------------------------------------------+
```
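A single nvidia-smi snapshot like the one above cannot show the drop to 0% GPU utilisation during the first 10 seconds. A possible way to capture that is to poll nvidia-smi in its CSV query mode. The sketch below assumes nvidia-smi is on the PATH and that the driver reports numeric values for both fields (some GPUs report `[N/A]`, which this code does not handle). The helper names are illustrative.

```python
import subprocess

# CSV query mode: one line per GPU, e.g. "8, 836" (utilisation in %, memory in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Parse one CSV line into (gpu_util_percent, memory_used_mib)."""
    util, mem = (part.strip() for part in line.split(","))
    return int(util), int(mem)


def read_gpu_usage():
    """Run nvidia-smi once and return a list of (util, mem) tuples, one per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling `read_gpu_usage()` in a loop with a short sleep while the stream starts gives a time series that can be lined up against the per-thread CPU figures.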

getroot commented 1 year ago

Please upload the entire Server.xml file and the entire log file so I can reproduce this issue.

rebound-software commented 1 year ago

Server.xml.txt

System Activity with GPU hardware acceleration on: Screenshot CPU HW Transcoding

System Activity with GPU hardware acceleration off: Screenshot CPU SW Transcoding ovenmediaengine_transc_sw_cpu.log ovenmediaengine_transc_high_cpu.log

rebound-software commented 1 year ago

The RTSP camera is available at the same URL as before if you need to access it for testing.

krakow10 commented 1 year ago

Could this be the same root cause as #901? The FFmpeg version upgrade may be causing different startup behaviour.

rebound-software commented 1 year ago

@krakow10 Thanks for the tip-off. It is the move to FFmpeg v5.0.1 that has caused this issue.

After reverting to FFmpeg v4.4.1 in OME v0.14.14, the CPU usage is as expected.

I also tried FFmpeg v5.0.2 and v5.1.1, and both exhibit the same problem (v5.1.1 seems slightly better but still nowhere near v4.4.1).
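To put a number on "slightly better" when comparing FFmpeg versions, one option is to record the decoder thread's CPU usage once per second during stream start and measure how long it stays pinned. The sketch below is only an illustration of that measurement. The 90% threshold and the function name are assumptions, and the sample values in the comment are made up rather than measured.

```python
def longest_spike(samples, threshold=90.0):
    """Return the duration in seconds of the longest run at or above threshold.

    samples is a time-ordered list of (timestamp_seconds, cpu_percent) pairs.
    """
    longest = 0.0
    start = None
    for ts, cpu in samples:
        if cpu >= threshold:
            if start is None:
                start = ts  # a new run above the threshold begins here
            longest = max(longest, ts - start)
        else:
            start = None  # the run ended
    return longest


# Hypothetical usage with made-up readings, not data from this issue:
# longest_spike([(0, 100), (1, 100), (2, 100), (3, 15)])
```

Running the same capture against each FFmpeg build would give one comparable figure per version.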

rrauf commented 1 year ago

Is it something that will be addressed in a future release?

Keukhan commented 1 year ago

@mrw-s

I was busy with other projects to make money at the end of last year, so I am only now trying to solve this problem.

I tried to reproduce it with my own RTSP live stream. However, even using the same version of FFmpeg, the CPU utilization does not go to 100%. The problem seems to depend on the RTSP stream. Could you please provide an RTSP URL that reproduces the problem? I think it will be very helpful in analyzing the cause.

Thanks.

rebound-software commented 1 year ago

I will set this up again and let you know the details once I’ve done it.

Keukhan commented 1 year ago

@mrw-s

Okay. I look forward to your reply.

Thank you.

rebound-software commented 1 year ago

Sorry for the delay, I have also been busy on paid work!

The camera is now set up and live. Connection details have been sent to the support email address.

Keukhan commented 1 year ago

@mrw-s I'm sorry, I checked your email late. Fortunately, I can access the URL you sent. I will analyze the cause and reply to you. Thanks for providing a test environment.

Thank you.

Keukhan commented 1 year ago

@mrw-s

I tried to reproduce this with the same OS, the same OME version, the same Server.xml settings, and the same streaming URL, but I didn't see the CPU going to 100%. Thank you for providing a sample URL, but I couldn't determine the cause. My suspicion is that hardware decoder allocation takes about 4 seconds, so allocating 2 decoders takes about 8 seconds.

I think I will have to analyze the cause on the same hardware, so it will be difficult to fix right away. I will keep trying to solve it, but it may take a long time.

Please wait for good news.

Thanks

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Keukhan commented 1 year ago

@rebound-software

I have reanalyzed this issue after a long time. I tried pulling an RTSP URL and encoding it with the GPU, but I did not see the CPU stay at 100% usage for several seconds. For now, let's close this issue, and if anyone manages to reproduce it, please create a new issue. Thank you.

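[image] https://user-images.githubusercontent.com/21077363/248813186-f86ed011-1e4f-4e6f-8fcd-bd183379da53.png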

rebound-software commented 1 year ago

Is that with the current code, or the code at the time of the issue?

Keukhan commented 1 year ago

@rebound-software

The test used the latest code. :)