Open darchuk opened 6 years ago
I have the same problem. Very critical!!
I don't know if it's related, but the garbage collector often runs just before KMS stalls.
2018-07-12 15:00:06,045955 24956 [0x00007fed74fd8700] debug KurentoMediaSet MediaSet.cpp:122 doGarbageCollection() Running garbage collector
Does the GC do something evil?
Same problem over here! We need to get this fixed!
We see this issue as well.
We are having the same issue with a client in production. After a couple of hours of streaming, Kurento doesn't accept connections anymore and we need to restart the app. We think it might be because of insufficient CPU/RAM, since almost 100% of the memory is in use after it crashes.
What kind of connections is KMS not attending? WebSocket connections using KurentoClient?
Correct. By the way, I can't even open a WebSocket connection to Kurento directly, bypassing KurentoClient.
In my case, Kurento seemed not to listen on the default WebSocket port at all.
Usually the Application Server uses KurentoClient to connect to KMS and keeps the WebSocket connection open. Is this your case? Do requests time out? Or do you connect and disconnect KurentoClient from time to time?
We are preparing KMS load testing to try to solve these issues.
Yes, we use KurentoClient to maintain WebSocket connections, and we keep KurentoClient connected as long as the application server is running. At some point Kurento starts to throw timeout exceptions (we use the default 30s timeout).
There are two cases in our production.
reconnect to server 1549 10000 dd6db11f-ab91-4fd5-b0d2-3bb80504905f
reconnect to server 1549 10000 a9a72208-52df-4e32-ac1f-8b96fac683e8
reconnect to server 1549 10000 b1ef16c9-72f3-4db5-b6f8-9ba5d3482cdf
reconnect to server 1549 10000 75c19c81-bc60-457d-99e2-5c122a349986
Note: in our environment, a Kurento ping message is sent to the server once per minute for monitoring.
https://doc-kurento.readthedocs.io/en/stable/features/kurento_protocol.html#ping
Because of case 2, I don't think load is that important.
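For anyone who wants to set up the same kind of ping monitoring, here is a minimal sketch in Python of the JSON-RPC messages involved, based on my reading of the ping section of the protocol docs linked above. The function names are made up for illustration, and the WebSocket transport is deliberately left out: send the request string with whatever client you already use, and apply your own timeout to the reply.

```python
import json


def build_ping_request(request_id, interval_ms=240000):
    """Build a Kurento-protocol ping as a JSON string (hypothetical helper).

    interval_ms tells the server how often the client intends to ping;
    240000 is only an example value, not a recommendation.
    """
    return json.dumps({
        "id": request_id,
        "method": "ping",
        "params": {"interval": interval_ms},
        "jsonrpc": "2.0",
    })


def is_pong(raw_response, request_id):
    """Return True only if raw_response is a well-formed pong for request_id."""
    try:
        message = json.loads(raw_response)
    except (TypeError, ValueError):
        return False
    if not isinstance(message, dict):
        return False
    result = message.get("result")
    return (
        message.get("id") == request_id
        and isinstance(result, dict)
        and result.get("value") == "pong"
    )
```

A monitor could treat a missing or invalid pong within its own timeout as "KMS is stalled", which matches the symptom described in this issue where the process is alive but does not answer.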
Any updates on this?
Please change this line to increase the number of threads from 1 to 10:
static WorkerPool workers (10);
then build and test whether this improves the issue (instructions here to build from sources).
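To illustrate why the size of a worker pool can matter here, the following is a toy sketch in Python. It is not the kms-core WorkerPool and makes no claim about how it works internally; it only shows the general effect: with a single worker, one handler that never returns keeps every later task waiting, while a larger pool can still serve them. The function name and the timeout value are assumptions for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def quick_task_completes(num_workers, wait_seconds=0.5):
    """Submit one blocked task and one quick task; report if the quick one ran.

    The blocked task stands in for an event handler that is stuck
    (for example, waiting on a lock or on a slow client).
    """
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pool.submit(release.wait)            # occupies one worker until released
        quick = pool.submit(lambda: "done")  # needs a free worker to run
        try:
            quick.result(timeout=wait_seconds)
            completed = True
        except TimeoutError:
            completed = False
        finally:
            release.set()                    # unblock so the pool can shut down
    return completed
```

With `quick_task_completes(1)` the quick task times out behind the stuck one; with `quick_task_completes(10)` it completes. More threads only hide the stall as long as fewer handlers are stuck than there are workers, so this is a test to narrow the problem down rather than a root-cause fix.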
You can now test this with an experimental installation of Kurento, using this apt-get repository in your /etc/apt/sources.list.d/kurento.list:
deb http://ubuntu.openvidu.io/eventhandler-10-threads xenial kms6
Or with this Docker image (https://hub.docker.com/r/kurento/kurento-media-server-exp):
kurento/kurento-media-server-exp:eventhandler-10-threads
Hi j1elo, thanks for the answer. Why should we prefer the experimental Kurento over the usual one? What if we just fork the Kurento module and change it ourselves?
It's just a more convenient method for some users that don't want or don't know how to build from sources; I had to do the images for some internal tests anyway, so I just told you about them here.
Of course, if you can just do as I said in my previous comment https://github.com/Kurento/bugtracker/issues/259#issuecomment-523077731 then just do it and let me know if the change improves anything
Using KMS 6.10 with Ubuntu 18.04 might solve this issue.
We recently changed versions (KMS from 6.6 to 6.10, Ubuntu from 16.04 to 18.04), and the issue has not happened since.
But I'm not sure. I'll watch the situation for a while.
Ok, let me know if it happens... the eventhandler-10-threads branch (https://github.com/Kurento/kms-core/commit/5e9cb3f46f731f90bd9ee3e14464497231d7b6c4) is still waiting for feedback to know if it's a real improvement or not, for this issue.
If you don't see the problem with KMS 6.10 and Ubuntu 18.04, then it means the change in eventhandler-10-threads is irrelevant. But then we wouldn't know if it got solved by 6.10 or by Ubuntu 18.04 :-)
In any case, let us know of what you find out, thanks!
It hasn't happened again since upgrading Ubuntu and KMS.
I have this problem too. Has anybody resolved this?
KMS Version:
kurento-media-server -v
Kurento Media Server version: 6.7.1
Found modules:
    'core' version 6.7.1
    'elements' version 6.7.1
    'filters' version 6.7.1
    'proctorplugin' version 0.0.1-dev
Other libraries versions: We are using https://github.com/Kurento/kms-opencv-plugin-sample and openCV 2.4.13
dpkg -l | egrep -i "kurento|gstreamer|nice"
Client libraries
Browsers tested
System description: Operating system where the client is running
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.4 LTS
Release:        16.04
Codename:       xenial
How is the system deployed? Are KMS and client in the same network? Are you using TURN/STUN?
The result is the same whether the client and KMS are in the same network or in different ones. We are using TURN.
Config file in /etc/kurento/kurento.conf.json
{
  "mediaServer" : {
    "resources": {
    //  //Resources usage limit for raising an exception when an object creation is attempted
    //  "exceptionLimit": "0.8",
    //  // Resources usage limit for restarting the server when no objects are alive
    //  "killLimit": "0.7",
        // Garbage collector period in seconds
        "garbageCollectorPeriod": 240
    },
    "net" : {
      "websocket": {
        "port": 8888,
        //"secure": {
        //  "port": 8433,
        //  "certificate": "defaultCertificate.pem",
        //  "password": ""
        //},
        //"registrar": {
        //  "address": "ws://localhost:9090",
        //  "localAddress": "localhost"
        //},
        "path": "kurento",
        "threads": 10
      }
    }
  }
}
We have this issue in production, where users are on different versions of Chrome. Kurento is running and everything works fine; some time later Kurento is still running, but it can't handle any connection.
What steps will reproduce the problem?
We finally found a way to reproduce it. We run automated tests: the 1st user streams, the 2nd one connects as a viewer, the stream runs for 10 minutes, then the process is stopped (release pipeline, etc.). After some hours of running these tests we hit the described problem: Kurento is not able to accept connections anymore.
Expected result: Kurento keeps accepting connections.
Actual result: Kurento process is alive, but ws connection can't be established.
Last logs Kurento has written:
2018-06-15 05:06:23,491615 756 [0x00007fd3fbfff700] debug KurentoMediaSet MediaSet.cpp:122 doGarbageCollection() Running garbage collector
2018-06-15 05:10:23,491924 756 [0x00007fd3fbfff700] debug KurentoMediaSet MediaSet.cpp:122 doGarbageCollection() Running garbage collector
2018-06-15 05:10:23,492051 756 [0x00007fd3fbfff700] warning KurentoMediaSet MediaSet.cpp:130 doGarbageCollection() Session timeout: 3d7c941c-ec4b-43ea-875f-370fc73672b0
2018-06-15 05:10:23,492354 756 [0x00007fd403fff700] debug KurentoMediaElementImpl MediaElementImpl.cpp:1013 disconnect() Disconnecting 15d6b0e4-0678-4a0c-924a-89b90ded59be_kurento.MediaPipeline/7016d3f4-754c-48c1-86d8-dcbc28f8395a_kurento.WebRtcEndpoint - 15d6b0e4-0678-4a0c-924a-89b90ded59be_kurento.MediaPipeline/29b3bbcd-a974-4fd5-a402-f9de1033a32b_proctorplugin.ProctorPlugin params AUDIO default default
2018-06-15 05:10:26,492446 756 [0x00007fd408ea4700] warning KurentoWorkerPool WorkerPool.cpp:155 operator()() Worker threads locked. Spawning a new one.
After that, any ws connection to Kurento is refused.
UP:
netstat -anpl| grep kurento| wc -l
464

root@ip-10-0-1-73:/home/ubuntu# free -m
              total        used        free      shared  buff/cache   available
Mem:          15038         996       11592          40        2449       13623
Swap:             0           0

lsof| grep kurento | wc -l
530025 (see file in attach)

cat /proc/1681/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             60053                60053                processes
Max open files            766673               766673               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       60053                60053                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

pstree -p 1681| wc -l
572
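For tracking these numbers over time, a small sketch in Python that reads the per-process counts straight from /proc may be easier to log than the commands above. It assumes a Linux /proc layout and enough permissions to read the target process; the function names are invented for this example. Note that an `lsof | grep` line count can be much larger than the real descriptor count, since lsof may list the same file once per thread and also includes non-descriptor entries such as memory-mapped files, so the figure from /proc/PID/fd is the one to compare against "Max open files".

```python
import os


def count_entries(path):
    """Return the number of entries in a directory, or None if unreadable."""
    try:
        return len(os.listdir(path))
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        return None


def process_snapshot(pid, proc_root="/proc"):
    """Collect open file descriptor and thread counts for one process.

    proc_root is a parameter only so the function can be pointed at a
    fake directory tree when testing.
    """
    base = os.path.join(proc_root, str(pid))
    return {
        "pid": pid,
        "open_fds": count_entries(os.path.join(base, "fd")),   # one entry per descriptor
        "threads": count_entries(os.path.join(base, "task")),  # one entry per thread
    }
```

Calling `process_snapshot(1681)` periodically (1681 being the KMS PID from the output above) and logging the result would show whether descriptors or threads grow steadily before the stall.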