Kurento / bugtracker

[ARCHIVED] Contents migrated to monorepo: https://github.com/Kurento/kurento

h264 high CPU usage #433

Open githubbla opened 4 years ago

githubbla commented 4 years ago

High CPU usage with H.264 and no transcoding (no VP8 in the config), on a dedicated server with 2 vCPUs and 4 GB of RAM. The server receives 12 RTSP streams connected through a pipeline to 1 WebRTC client (only one stream is connected to the WebRTC client at a time, like a TV model with fast channel switching and short sessions of about 5 minutes). After 5 hours, CPU usage is high. I tried the same on a less powerful server, and there the issue appeared after only about 20 minutes. The streams are 640x480 H.264 with variable bitrate, normally around 300 kbit/s and sometimes rising to 1.5 Mbit/s.

[Screenshot: Screen Shot 2020-01-28 at 22 22 19]

In the log I see many of these:

2020-01-28T15:26:52,968210 3647 0x00007efe45de0700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 64395 found
2020-01-28T15:26:52,968385 3647 0x00007efe45de0700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 64396 found
2020-01-28T15:26:52,968508 3647 0x00007efe45de0700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 64397 found
2020-01-28T15:26:53,004841 3647 0x00007efe45de0700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 64398 found
2020-01-28T15:26:53,005708 3647 0x00007efe45de0700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 64399 found
2020-01-28T15:26:53,005924 3647 0x00007efe45de0700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 64400 found
2020-01-28T15:26:53,022072 3647 0x00007efe24af4700 warning kmsutils kmsutils.c:1428 kms_utils_depayloader_adjust_pts_out() Fix PTS not strictly increasing, last: 46:43:31.380979562, current: 20:26:26.112394007, fixed = last + 1: 46:43:31.381979562
2020-01-28T15:26:53,022160 3647 0x00007efe24af4700 warning kmsutils kmsutils.c:1428 kms_utils_depayloader_adjust_pts_out() Fix PTS not strictly increasing, last: 46:43:31.381979562, current: 20:26:26.144416229, fixed = last + 1: 46:43:31.382979562
2020

j1elo commented 4 years ago

Hi,

these messages:

warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 64395 found
warning kmsutils kmsutils.c:1428 kms_utils_depayloader_adjust_pts_out() Fix PTS not strictly increasing

are due to packet duplication happening in the network. This is relatively OK as long as there aren't thousands of them, so you can ignore those warnings.
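To judge whether the warnings really number in the thousands, one option is to count them per logging category. The following is only a minimal sketch: the class name KmsWarningCounter is made up for illustration, and it assumes the whitespace-separated layout visible in the log lines quoted in this thread (timestamp, PID, thread, level, category, ...).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Kurento.
final class KmsWarningCounter {

    /** Counts lines whose 4th field is "warning", grouped by the 5th field (the category). */
    static Map<String, Integer> countWarnings(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Assumed layout: timestamp pid thread level category file:line function() message
            String[] fields = line.trim().split("\\s+", 6);
            if (fields.length >= 5 && fields[3].equals("warning")) {
                counts.merge(fields[4], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of one KMS log file as the only argument.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countWarnings(lines).forEach((category, n) -> System.out.println(category + " " + n));
    }
}
```

Run against a log file, this prints one line per category (for example rtpjitterbuffer and kmsutils) with the number of warnings seen.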

Regarding the CPU usage: how do you create the PlayerEndpoint? There is a constructor option, useEncodedMedia, that disables decoding on the input media:

final PlayerEndpoint playerEndpoint = new PlayerEndpoint.Builder(pipeline, videourl).useEncodedMedia().build();

With this, Kurento disables all compatibility guarantees between the input and output codecs. This means you have to make sure on your own that the source codec is compatible with the consumer (in this case, the consumer is a web browser).

You can read more about useEncodedMedia in the client docs: PlayerEndpoint.Builder useEncodedMedia().

Also please read about Debug Logging and enable the suggested category for Transcoding.

Then, search in your KMS logs in /var/log/kurento-media-server/ for messages about transcoding being enabled. More details here: CPU usage grows too high.

j1elo commented 4 years ago

Also, does the Kurento process crash (and then some service supervisor restarts it)? Or it doesn't crash after those spikes in CPU?

githubbla commented 4 years ago

Hi @j1elo, thank you for the reply. There are many of these duplicate messages; they create around 1.2-1.5 GB of logs per 24 h with the default log setting:

export GST_DEBUG="3,Kurento:4,kms:4,sdp:4,webrtc:4,rtpendpoint:4,rtphandler:4,rtpsynchronizer:4,agnosticbin:4"

We do not use PlayerEndpoint; we connect an RtpEndpoint to a WebRtcEndpoint. We use a technique similar to the one in the Kurento sample: https://doc-kurento.readthedocs.io/en/6.13.0/tutorials/java/tutorial-rtp-receiver.html

How can we use useEncodedMedia in our case?

Also, does the Kurento process crash (and then some service supervisor restarts it)? Or it doesn't crash after those spikes in CPU?

The load goes up to 12, and some time after that it crashes.

I restarted the server and set up the connections again: RTSP (13 channels), with one RTSP stream viewed over WebRTC. Here is the full log with the CPU spike (I only replaced the server IP); the service crashed afterwards.

ku.log

j1elo commented 4 years ago

In that log file, I can see that there are:

Are you sure this is not the Linux kernel, detecting OOM ("out of memory") and killing KMS? Doing 53 recordings at the same time in the same machine seems like a bit too much.

Could you please sketch a quick diagram of your pipeline and explain how it changes over time? (e.g. use draw.io or any other diagramming software)

Also, does the problem happen if you reduce it to a single pipeline like this?

                /--> 1 WebRtcEndpoint
1 RtpEndpoint --|
                \--> 1 RecorderEndpoint

githubbla commented 4 years ago

Hi @j1elo, the log file covers around 1.4 h of running, until the process is killed.

We have 13 RtpEndpoints (rtp1, rtp2, rtp3, ...) active, all connected to one pipeline (5 devices, with 2-3 streams per device), plus web clients (w1, w2, w3) over WebRTC. When an rtp* source disconnects, we destroy its RtpEndpoint, and we create a new RtpEndpoint when it wants to connect to the pipeline again. The same goes for the web clients (w1, w2, ...): when one disconnects we destroy its WebRtcEndpoint, and we create a new one when the client wants to reconnect. So we have one pipeline with 13 RtpEndpoints and a few WebRtcEndpoints for the web clients. Sometimes we create RecorderEndpoints when we want to record from some device, which creates 2 or 3 RecorderEndpoints in the same pipeline (with a new file name for each RecorderEndpoint). A recording lasts about 20-60 s, and then we destroy the RecorderEndpoints for that specific recording.

This is a test environment with just 13 RTP streams and just 2 WebRTC clients. I think it is a very small load, so what can the issue be?

So the maximum concurrent usage is:

1 pipeline
13 RtpEndpoints (from 5 separate IPs), the video broadcasters, possibly over lines of imperfect quality
2 WebRtcEndpoints (WebRTC web clients), the TV screens
6 RecorderEndpoints (recording to local /tmp on the Kurento server)
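As a rough sanity check on how small this load is, the aggregate ingest bitrate can be estimated from the per-stream figures given earlier in the thread (about 300 kbit/s normally, up to 1.5 Mbit/s). This is only a back-of-the-envelope sketch; the class and method names are invented for illustration.

```java
// Hypothetical helper for a back-of-the-envelope estimate.
final class IngestEstimate {

    /** Aggregate bitrate in Mbit/s for a number of streams at a given per-stream rate in kbit/s. */
    static double aggregateMbit(int streams, double perStreamKbit) {
        return streams * perStreamKbit / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println("typical: " + aggregateMbit(13, 300) + " Mbit/s");
        System.out.println("peak:    " + aggregateMbit(13, 1500) + " Mbit/s");
    }
}
```

With 13 streams this gives roughly 3.9 Mbit/s in the typical case and 19.5 Mbit/s if every stream peaks at once.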

You are right, it is killed by the kernel because of OOM. I see virtual memory at around 6 GB at the time of the kill, for handling 13 RtpEndpoints at 300k-1500k each. There are many errors like this:

2020-01-29T14:30:40,423591 3286 0x00007ff6887e0700 warning rtpjitterbuffer rtpjitterbuffer.c:735 rtp_jitter_buffer_insert() rtp delta too big, reset skew

Possibly the rtpjitterbuffer is filled with garbage. How can we restrict the rtpjitterbuffer buffer to something like 20 ms?

Jan 29 14:31:13 cpu-kurento613 kernel: [22520.474110] kurento-media-s invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jan 29 14:31:13 cpu-kurento613 kernel: [22520.474113] kurento-media-s cpuset=/ mems_allowed=0
Jan 29 14:31:13 cpu-kurento613 kernel: [22520.474118] CPU: 1 PID: 3375 Comm: kurento-media-s Not tainted 4.15.0-74-generic #84-Ubuntu
. . .
Jan 29 14:31:13 cpu-kurento613 kernel: [22520.474330] Out of memory: Kill process 3286 (kurento-media-s) score 936 or sacrifice child
Jan 29 14:31:13 cpu-kurento613 kernel: [22520.476637] Killed process 3286 (kurento-media-s) total-vm:5812204kB, anon-rss:3775672kB, file-rss:0kB, shmem-rss:0kB
Jan 29 14:31:13 cpu-kurento613 kernel: [22520.650325] oom_reaper: reaped process 3286 (kurento-media-s), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
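To read the memory figures out of the "Killed process" line, a small parser can help. This is a sketch only: OomLineParser is a made-up name, the regular expression is fitted to the kernel line quoted above, and it treats the kernel's "kB" as units of 1024 bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, fitted to the OOM-killer line quoted in this thread.
final class OomLineParser {

    // Matches fields such as "total-vm:5812204kB" or "anon-rss:3775672kB".
    private static final Pattern FIELD = Pattern.compile("([a-z-]+):(\\d+)kB");

    /** Returns field name -> value in kB, in the order the fields appear. */
    static Map<String, Long> parseKb(String line) {
        Map<String, Long> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return fields;
    }

    /** Converts kB (assumed to be 1024-byte units) to GiB. */
    static double kbToGib(long kb) {
        return kb / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        String line = "Killed process 3286 (kurento-media-s) total-vm:5812204kB, "
                + "anon-rss:3775672kB, file-rss:0kB, shmem-rss:0kB";
        parseKb(line).forEach((name, kb) ->
                System.out.printf("%s = %.2f GiB%n", name, kbToGib(kb)));
    }
}
```

For the line above this works out to about 5.5 GiB of virtual memory and about 3.6 GiB resident, which on a 4 GB machine is consistent with the kernel killing the process.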

I did a new test on a server with 8 GB of RAM. I ran the same scenario (1 pipeline, 13 RTSP endpoints, 2 WebRTC clients) and it was OK. Around 20 minutes later (when both WebRTC clients were disconnected) the load started to rise, up to 12. After about 10 minutes it started writing aggressively to the log, thousands of messages like these; in 25 minutes it wrote 250K lines:

2020-01-30T00:34:31,046176 960 0x00007f2fa9fd3700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 61817 found
2020-01-30T00:34:31,079419 960 0x00007f2fa9fd3700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 61820 found
2020-01-30T00:34:31,112419 960 0x00007f2fa9fd3700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 61821 found

Is this the original issue, or does it happen because Kurento hung the CPUs?

After around 30 minutes the load went back to normal, and Kurento filled the log with thousands (some 35K in the last 7 minutes) of messages like these:

2020-01-30T00:53:14,161445 960 0x00007f2f89f93700 warning kmsutils kmsutils.c:1428 kms_utils_depayloader_adjust_pts_out() Fix PTS not strictly increasing, last: 27:17:43.758926306, current: 1:12:43.804268195, fixed = last + 1: 27:17:43.759926306
2020-01-30T00:53:14,193512 960 0x00007f2f89f93700 warning kmsutils kmsutils.c:1428 kms_utils_depayloader_adjust_pts_out() Fix PTS not strictly increasing, last: 27:17:43.759926306, current: 1:12:43.840201528, fixed = last + 1: 27:17:43.760926306
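Going only by what these log messages say, the fix-up seems to behave like the sketch below: when an incoming PTS is not greater than the last one emitted, it is replaced by the last one plus a small step. This is not the actual kmsutils.c code. The class PtsFixer is invented for illustration, and the 1 ms step is an assumption inferred from the "last" and "fixed" values in the log.

```java
// Illustrative sketch inferred from the log text; not Kurento's implementation.
final class PtsFixer {

    // Assumption: "last + 1" means +1 ms, judging by the values printed in the log.
    private static final long STEP_NS = 1_000_000L;

    private long lastNs = -1; // -1 means no buffer has been seen yet

    /** Returns a PTS that is strictly greater than the previously returned one. */
    long adjust(long currentNs) {
        if (lastNs >= 0 && currentNs <= lastNs) {
            currentNs = lastNs + STEP_NS; // the "fixed = last + 1" case
        }
        lastNs = currentNs;
        return currentNs;
    }

    /** Converts an h:mm:ss.nnnnnnnnn timestamp, as printed in the log, to nanoseconds. */
    static long toNs(long hours, long minutes, long seconds, long nanos) {
        return ((hours * 60 + minutes) * 60 + seconds) * 1_000_000_000L + nanos;
    }

    public static void main(String[] args) {
        PtsFixer fixer = new PtsFixer();
        fixer.adjust(toNs(27, 17, 43, 758926306L));             // last
        long fixed = fixer.adjust(toNs(1, 12, 43, 804268195L)); // current is far behind
        System.out.println(fixed == toNs(27, 17, 43, 759926306L));
    }
}
```

Under this reading, once a source's timestamps fall hours behind the last emitted value, every following buffer would need the fix, which may be why the warning repeats until the endpoint is destroyed and recreated.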

At the end of the session I destroyed and recreated all the RTSP endpoints, and it immediately stopped filling the logs with these repeated messages.

kulog.zip

githubbla commented 4 years ago

@j1elo, can you please advise how to work around this memory overflow?

These are the last 2 lines in the log:

2020-01-31T01:11:46,206418 16669 0x00007f4db6a94700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 32222 found
2020-01-31T01:11:46,290936 16669 0x00007f4db6a94700 warning rtpjitterbuffer rtpjitterbuffer.c:785 rtp_jitter_buffer_insert() duplicate packet 32223 found

[Screenshot: Screen Shot 2020-01-31 at 10 52 18]

RobotnickIsrael commented 4 years ago

I'm using an RtpEndpoint and redirecting UDP packets from port X to the port Kurento listens on. As a result, I get tons of:

warning kmsutils kmsutils.c:1428 kms_utils_depayloader_adjust_pts_out() Fix PTS not strictly increasing

The picture isn't smooth. Do you happen to know how this can be fixed?

j1elo commented 4 years ago

After all, this issue may have been caused by this: https://github.com/Kurento/kms-core/pull/22

Please test again with the new Kurento 6.14.0: https://doc-kurento.readthedocs.io/en/latest/user/installation.html

and let us know if the problem persists.

ywcai commented 4 years ago

I think I have run into the same problem in my issue #492.

josephmiller2000 commented 3 years ago

Well, I'm getting this new error, and Kurento crashed after running for 2 whole months. I have a high-end server with a 12-core CPU, so it could manage a fairly heavy load. But eventually that was not enough and it crashed.

"Fix PTS not strictly increasing"

[Screenshot: Snap_Shot_02370]