Kurento / bugtracker

[ARCHIVED] Contents migrated to monorepo: https://github.com/Kurento/kurento
46 stars 10 forks source link

Crash, high CPU and memory consumption during long term test #512

Open MaxIasat opened 4 years ago

MaxIasat commented 4 years ago

Prerequisites

Issue description

Application: My application is similar to “one-to-many” tutorial, but instead of the presenter i have used PlayerEndpoint to stream the video files (mkv files). As soon as one client is connected to the application server the streaming will be started. Other clients are connected to the running stream. Some kind of business logic selects in the infinite loop the video media files and triggers the creation of the new PlayerEndpoint and then the playing the new video file. Before it, the old one PlayerEndpoint has been released. The existing WebRtcEndpoints will be connected to the new PlayerEndpoint. All the clients show the same video file.

So, i have one pipeline, one player endpoint and many WebRtcEndpoints connected (in my case i have only max 5 clients connected). The files are provided with file:// attribute to the PlayerEndpoint.

Pipeline --> PlayerEndpoint --> WebRtcEndpoint 1
                        |   --> WebRtcEndpoint 2 …

The video files have duration of 30 seconds. Every 30 s the business logic triggers the new video file. I have tested kurento 6.14.0 and 6.14.0-asan versions installed in docker on Debian 8 machine with 4 cores CPU and 4 GB RAM. Mkv video files have 720p resolution and coded as H.264 with 25fps. I have disabled the using of VP8 codec in kurento. So, only H.264 codec can be chosen. In this case the video files are not transcoded. I log this information also in my application by subscribing on MediaTranscodingStateChange event. The audio is disabled. The video files don’t have any audio tracks.

General behaviour: The general behaviour is as expected. As soon one client connects to the server the streaming is started and each 30 seconds, i see new video file. Other connected clients see immediately the running stream. The stream runs synchronously. From functionality point of view, it works great.

Long term behaviour kurento 6.14.0 In the above-mentioned configuration, the system runs stable couple of hours (ca. 8h). At the beginning I noticed cpu usage for kurento-media-server pending about 90% (with top tool). After some time I observed some kind of CPU spikes (about 200%). Finally, after ca. 8h the cpu usage was huge and the system didn’t respond. Only reboot could help. I noticed also the memory increasing over the time (with top tool).

Long term behaviour kurento 6.14.0-asan So, for further analysing (maybe memory leaks) I used the asan version of the kurento as docker with the same configuration described above. I was very surprising, but this asan version were running more than 3 days on my system. But the kurento media server crashed! Here are the last log lines:

{"log":"63:45:14.991258108     9 0x61d0090e0630 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006c38460\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.902977789Z"}
{"log":"63:45:25.128369440     9 0x61d0090e04f0 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006a43560\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.902996642Z"}
{"log":"63:45:25.128247109     9 0x61d0090e0540 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006d06ae0\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.903015638Z"}
{"log":"63:45:40.736984796     9 0x61d008b848a0 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006c381e0\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.903034851Z"}
{"log":"63:45:40.736981220     9 0x61d0090e0400 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006b70560\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.903053751Z"}
{"log":"63:45:45.413900652     9 0x619009775890 WARN       KurentoWorkerPool WorkerPool.cpp:122:checkThreads: Worker thread pool is lagging! (CPU exhausted?)\n","stream":"stderr","time":"2020-07-30T22:24:26.903083716Z"}
{"log":"63:45:47.777772174     9 0x61d008f360f0 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x6210075a2470\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.903102773Z"}
{"log":"63:45:47.777753526     9 0x61d0090e04f0 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006ed67a0\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.90313725Z"}
{"log":"/entrypoint.sh: line 84:     9 Killed                  LD_PRELOAD=\"$PWD/libasan.so\" ASAN_OPTIONS='detect_odr_violation=0 detect_leaks=1 detect_invalid_pointer_pairs=2 strict_string_checks=1 detect_stack_use_after_return=0 check_initialization_order=1 strict_init_order=1' LD_LIBRARY_PATH=\"$PWD\" ./kurento-media-server --modules-path=\"$PWD\" --gst-plugin-path=\"$PWD\"\n","stream":"stderr","time":"2020-07-30T22:24:27.647452597Z"}

This behaviour I could reproduce twice. The container with the crashed kurento inside I can provide as a tar file. Can you please help with analysing of this issue? Thank you and best regards

Context

I try to implement a video player which stream different videos every 30 seconds.

How to reproduce?

Do long term test with PlayerEndpoint and many clients by streaming different files in the infinite loop. See description above.

Expected & current behavior

Stable system without crashes.

(Optional) Possible solution

I suppose maybe some kind of race condition blocks the released version of kurento. In the asan version i think the timing is a little bit different and this may be the cause why kurento runs longer.

INFO about Kurento Media Server

INFO about your Application Server

INFO about end-user clients

INFO about your environment

No STUN or TURN server are configured. The system is running in the local network with dhcp.

I have used the following docker run command:

docker run -d --restart always --name $KMS_CONT -e GST_DEBUG="3,Kurento*:4,kmselement:5,playerendpoint:5,appsrc:4,agnosticbin*:5,uridecodebin:6,rtspsrc:5,souphttpsrc:5,*CAPS*:3" -v $VOLUME_HOST:$VOLUME_KMS -v $VOLUME_FB_HOST:$VOLUME_FB_KMS -p $PORT_HOST:$PORT_KMS $KURENTO_IMG

Commands output

cat /etc/lsb-release

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"

kurento-media-server --version

0:00:01.036966573  4236 0x565505b80920 INFO    KurentoModuleManager ModuleManager.cpp:90:loadModule: Load file: /usr/lib/x86_64-linux-gnu/kurento/modules/libkmsfiltersmodule.so, module name: /usr/lib/x86_64-linux-gnu/kurento/modules/libkmsfiltersmodule.so
0:00:01.037154677  4236 0x565505b80920 INFO    KurentoModuleManager ModuleManager.cpp:137:loadModule: Loaded module: filters, version: 6.14.0, date: Jun 16 2020 16:56:52
0:00:01.076261978  4236 0x565505b80920 INFO    KurentoModuleManager ModuleManager.cpp:90:loadModule: Load file: /usr/lib/x86_64-linux-gnu/kurento/modules/libkmselementsmodule.so, module name: /usr/lib/x86_64-linux-gnu/kurento/modules/libkmselementsmodule.so
0:00:01.076365685  4236 0x565505b80920 INFO    KurentoModuleManager ModuleManager.cpp:137:loadModule: Loaded module: elements, version: 6.14.0, date: Jun 16 2020 16:47:08
0:00:01.091071000  4236 0x565505b80920 INFO    KurentoModuleManager ModuleManager.cpp:90:loadModule: Load file: /usr/lib/x86_64-linux-gnu/kurento/modules/libkmscoremodule.so, module name: /usr/lib/x86_64-linux-gnu/kurento/modules/libkmscoremodule.so
0:00:01.091181294  4236 0x565505b80920 INFO    KurentoModuleManager ModuleManager.cpp:137:loadModule: Loaded module: core, version: 6.14.0, date: Jun 16 2020 16:37:41
Kurento Media Server version: 6.14.0
Found modules:
        'core' version 6.14.0
        'elements' version 6.14.0
        'filters' version 6.14.0

dpkg -l | grep -Pi 'kurento|kms-|gst.*1.5|nice'

ii  gstreamer1.5-alsa:amd64              1.8.1-1kurento2.16.04                    amd64                                                              GStreamer plugin for ALSA
ii  gstreamer1.5-libav:amd64             1.8.1-1kurento1.16.04                    amd64                                                              libav plugin for GStreamer
ii  gstreamer1.5-libav-dbg:amd64         1.8.1-1kurento1.16.04                    amd64                                                              libav plugin for GStreamer (debug symbols)
ii  gstreamer1.5-nice:amd64              0.1.17-0kurento1.16.04                   amd64                                                              ICE library (GStreamer 1.5 plugin)
ii  gstreamer1.5-nice-dbgsym:amd64       0.1.17-0kurento1.16.04                   amd64                                                              debug symbols for package gstreamer1.5-nice
ii  gstreamer1.5-plugins-bad:amd64       1.8.1-1kurento4.16.04                    amd64                                                              GStreamer plugins from the "bad" set
ii  gstreamer1.5-plugins-bad-dbg:amd64   1.8.1-1kurento4.16.04                    amd64                                                              GStreamer plugins from the "bad" set (debug symbols)
ii  gstreamer1.5-plugins-base:amd64      1.8.1-1kurento2.16.04                    amd64                                                              GStreamer plugins from the "base" set
ii  gstreamer1.5-plugins-base-dbg:amd64  1.8.1-1kurento2.16.04                    amd64                                                              GStreamer plugins from the "base" set
ii  gstreamer1.5-plugins-good:amd64      1.8.1-1kurento4.16.04                    amd64                                                              GStreamer plugins from the "good" set
ii  gstreamer1.5-plugins-good-dbg:amd64  1.8.1-1kurento4.16.04                    amd64                                                              GStreamer plugins from the "good" set
ii  gstreamer1.5-plugins-ugly:amd64      1.8.1-1kurento1.16.04                    amd64                                                              GStreamer plugins from the "ugly" set
ii  gstreamer1.5-plugins-ugly-dbg:amd64  1.8.1-1kurento1.16.04                    amd64                                                              GStreamer plugins from the "ugly" set (debug symbols)
ii  gstreamer1.5-pulseaudio:amd64        1.8.1-1kurento4.16.04                    amd64                                                              GStreamer plugin for PulseAudio
ii  gstreamer1.5-x:amd64                 1.8.1-1kurento2.16.04                    amd64                                                              GStreamer plugins for X11 and Pango
ii  kms-core                             6.14.0-0kurento1.16.04                   amd64                                                              Kurento Core module
ii  kms-core-dbg                         6.14.0-0kurento1.16.04                   amd64                                                              Kurento Core module
ii  kms-elements                         6.14.0-0kurento1.16.04                   amd64                                                              Kurento Elements module
ii  kms-elements-dbg                     6.14.0-0kurento1.16.04                   amd64                                                              Kurento Elements module
ii  kms-filters                          6.14.0-0kurento1.16.04                   amd64                                                              Kurento Filters module
ii  kms-filters-dbg                      6.14.0-0kurento1.16.04                   amd64                                                              Kurento Filters module
ii  kms-jsonrpc                          6.14.0-0kurento1.16.04                   amd64                                                              Kurento JSON-RPC library
ii  kms-jsonrpc-dbg                      6.14.0-0kurento1.16.04                   amd64                                                              Kurento JSON-RPC library
ii  kmsjsoncpp                           1.6.3-1kurento1.16.04                    amd64                                                              Kurento jsoncpp library
ii  kmsjsoncpp-dbg                       1.6.3-1kurento1.16.04                    amd64                                                              Kurento jsoncpp library
ii  kurento-dbg                          6.14.0-0kurento1.16.04                   amd64                                                              Meta-package that installs debug symbols
ii  kurento-media-server                 6.14.0-0kurento1.16.04                   amd64                                                              Kurento Media Server
ii  kurento-media-server-dbg             6.14.0-0kurento1.16.04                   amd64                                                              Kurento Media Server
ii  libgstreamer-plugins-bad1.5-0:amd64  1.8.1-1kurento4.16.04                    amd64                                                              GStreamer development files for libraries from the "bad" set
ii  libgstreamer-plugins-base1.5-0:amd64 1.8.1-1kurento2.16.04                    amd64                                                              GStreamer libraries from the "base" set
ii  libgstreamer1.5-0:amd64              1.8.1-1kurento2.16.04                    amd64                                                              Core GStreamer libraries and elements
ii  libgstreamer1.5-0-dbg:amd64          1.8.1-1kurento2.16.04                    amd64                                                              Core GStreamer libraries and elements
ii  libnice10:amd64                      0.1.17-0kurento1.16.04                   amd64                                                              ICE library (shared library)
ii  libnice10-dbgsym:amd64               0.1.17-0kurento1.16.04                   amd64                                                              debug symbols for package libnice10
ii  libsrtp0:amd64                       1.6.0-0kurento1.16.04                    amd64                                                              Secure RTP (SRTP) and UST Reference Implementations - shared library
ii  libusrsctp                           0.9.2-1kurento1.16.04                    amd64                                                              sctp-refimpl library
ii  openh264                             1.4.0-1kurento1.16.04                    amd64                                                              OpenH264 library
ii  openh264-gst-plugins-bad-1.5:amd64   1.8.1-1kurento4.16.04                    amd64                                                              GStreamer plugins from openh264
ii  openwebrtc-gst-plugins               0.10.0-1kurento1.16.04                   amd64                                                              OpenWebRTC specific GStreamer plugins
ii  openwebrtc-gst-plugins-dbg           0.10.0-1kurento1.16.04                   amd64                                                              OpenWebRTC specific GStreamer plugins
j1elo commented 4 years ago

For 6.14.0 we worked on a performance issue that would cause CPU exhaustion when an Endpoint tried to disconnect from N Endpoints, for somewhat big N (depends on the machine, but somewhere around N > 20 we could observe the issues in our case).

With some optimizations the CPU usage decreased significantly, we talked a bit about this in the 6.14.0 blog post.

Your issue seems very similar to the one we solved. Maybe there are still some cases when CPU exhaustion occurs... Can you observe if the 200% CPU spikes are happening when one sender PlayerEndpoint is being disconnected from a big number of receiver WebRtcEndpoints?

Other idea is to avoid delete-and-create of new PlayerEndpoints each time the test video is repeated. What happens if, instead, you call setPosition(0) before the file ends playing? (e.g. if the video file length is 30 seconds, call player.setPosition(0) every 25 seconds).

PS. the -asan images are made to run Kurento with AddressSanitizer, to find memory corruption bugs. This doesn't look like a memory corruption issue, so I don't think we'll get anything useful from it. A better tool to use would be Callgrind, to generate profiling output and then inspect it with KCacheGrind, but we'll leave that for later if we don't find a solution.

MaxIasat commented 4 years ago

Hi Juan,

thank you for your reply.

I'll try to explain my application in more precise way.

  1. Number of clients: I observed this behaviour also with N=1, only one client.
  2. The high cpu spikes i could observe only by switching to new file. In this case the PlayerEndpoint must be released and new one must be created and connected to the existing WebRtcEndpoints. I can't say, if the cpu spikes occurs on disconnect or on connect.
  3. Regarding the using of only one PlayerEndpoint, so this is not an option. I use a list of files. On the productive system we have a dynamic play list. So, with the current kurento design interface i have to release the old one and to create the new PlayerEndpoint with the new file name.
  4. Additional observation: I've tried out to create the PlayerEndpoint with "file://" as well as "http://" prefix. Both are working fine. For http i used the nodejs express http server. In case of "http://" i've observed that the bug appears much faster. Also, i noticed in my log files that each file was accessing at least 3 times. I would expect only one time. In case of system struggeling i saw that the log had less than 3 times, maybe 2 or only 1 time. Maybe this information can help to catch this issue.

Thank you and best regards Maxim

MaxIasat commented 3 years ago

Hi All,

i would like to give an update of my observations:

  1. I have updated my system to KMS 7.0 with the newer GStreamer version. Installed method is docker.
  2. With the new KMS version i don't see the above described behavior!!! I suppose that the bug is related to the old GStreamer version.
  3. BUT: If i call PlayerEndpoint with the "http.//" parameter i have KMS crashes as well as Browser crashes. The system stays stable if i use the "file://" parameter. I suppose that the "http://" method doesn't work properly.

Conclusion:

j1elo commented 3 years ago

So this seems to come from GStreamer, which makes it a bit harder to debug. However I'll add this issue to the list of GStreamer 1.14 related topics, so we know that this needs to be looked at when working on Kurento 7.0.0 with GStreamer 1.14.

The HTTP plugin should work fine now that we're going to release the version 6.15.0, so I'd ecommend you to update when it gets published in the next weeks.

MaxIasat commented 3 years ago

Hi All,

i have installed the 6.15.0 on our system and let it run for many weeks. Unfortunately, this version crashed after ca. 6 days! I noticed the crashes many times (segmentation error).

I noticed also memory consumption by kms. I suppose that there is a memory leak somewhere, because in all cases no free memory was available. So, unfortunately the 6.15.0 is not usable for me.

In general, the situation with this version is much better. The previous version 6.14.0 was crashed after some couple of hours. So i think, the fixes are working, but somewhere memory is leaking and it causes the crash of the system.

The kms 7.0.0 experimental version is working more stable. It was running more than 18 days without any problems. So, my question is also, when is the final release of the kms 7.0.0 with the new gstreamer? I suppose, that many issues are coming from the old gstreamer branch.

Thank you and best regards

micaelgallego commented 3 years ago

I'm sorry but there is not ETA for 7.0.0 version.

We are an small team working on Kurento and OpenVidu and right now we are focused on providing scalability and fault tolerance to OpenVidu.

Regards

j1elo commented 3 years ago

@MaxIasat Although I said that this doesn't look like a memory corruption bug, just in case, could you confirm if there were any further logs after the -asan image crashed?

Long term behaviour kurento 6.14.0-asan So, for further analysing (maybe memory leaks) I used the asan version of the kurento as docker with the same configuration described above. I was very surprising, but this asan version were running more than 3 days on my system. But the kurento media server crashed! Here are the last log lines:

{"log":"63:45:14.991258108     9 0x61d0090e0630 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006c38460\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.902977789Z"}
{"log":"63:45:25.128369440     9 0x61d0090e04f0 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006a43560\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.902996642Z"}
{"log":"63:45:25.128247109     9 0x61d0090e0540 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006d06ae0\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.903015638Z"}
{"log":"63:45:40.736984796     9 0x61d008b848a0 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006c381e0\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.903034851Z"}
{"log":"63:45:40.736981220     9 0x61d0090e0400 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006b70560\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.903053751Z"}
{"log":"63:45:45.413900652     9 0x619009775890 WARN       KurentoWorkerPool WorkerPool.cpp:122:checkThreads: Worker thread pool is lagging! (CPU exhausted?)\n","stream":"stderr","time":"2020-07-30T22:24:26.903083716Z"}
{"log":"63:45:47.777772174     9 0x61d008f360f0 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x6210075a2470\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.903102773Z"}
{"log":"63:45:47.777753526     9 0x61d0090e04f0 WARN          dtlsconnection gstdtlsconnection.c:312:handle_timeout:\u003cGstDtlsConnection@0x621006ed67a0\u003e handling timeout failed\n","stream":"stderr","time":"2020-07-30T22:24:26.90313725Z"}
{"log":"/entrypoint.sh: line 84:     9 Killed                  LD_PRELOAD=\"$PWD/libasan.so\" ASAN_OPTIONS='detect_odr_violation=0 detect_leaks=1 detect_invalid_pointer_pairs=2 strict_string_checks=1 detect_stack_use_after_return=0 check_initialization_order=1 strict_init_order=1' LD_LIBRARY_PATH=\"$PWD\" ./kurento-media-server --modules-path=\"$PWD\" --gst-plugin-path=\"$PWD\"\n","stream":"stderr","time":"2020-07-30T22:24:27.647452597Z"}

This behaviour I could reproduce twice.

Normally when the asan image crashes, it should print some final logs with all the memory information and details about where the crash happened. I could review those, just in case there is something to be found.