AirenSoft / OvenMediaEngine

OvenMediaEngine (OME) is a Sub-Second Latency Live Streaming Server with Large-Scale and High-Definition. #WebRTC #LLHLS
https://airensoft.com/ome.html
GNU Affero General Public License v3.0

Edge crashes on client disconnect #981

Closed by oleglab 1 year ago

oleglab commented 1 year ago

Describe the bug

Edge crashes on new client connection:

Logs

edge_1 | [2022-12-22 06:26:42.237] I [SPICE-T3479:29] ICE | ice_port.cpp:479 | Turn client has connected : <ClientSocket: 0x7f57bf290410, #139, Connected, TCP, Nonblocking, ...:57388>
edge_1 | [2022-12-22 06:26:42.238] I [SPICE-T3479:29] ICE | ice_port.cpp:479 | Turn client has connected : <ClientSocket: 0x7f57e4cf8c10, #140, Connected, TCP, Nonblocking, ...:57389>
edge_1 | OvenMediaEngine: ../nptl/pthread_mutex_lock.c:81: pthread_mutex_lock: Assertion `mutex->data.__owner == 0' failed.
edge_1 | [2022-12-22 06:26:42.307] C [SPICE-T3479:29] OvenMediaEngine | signals.cpp:114 | OME received signal 6 (SIGABRT), interrupt.
edge_1 | pure virtual method called
edge_1 | terminate called without an active exception

Server (please complete the following information):

Additional context

crash_20221222.dump.zip

getroot commented 1 year ago

Thanks for reporting the crash bug. I will be looking into this soon.

getroot commented 1 year ago

This seems like a difficult problem to reproduce. How often does this problem reproduce in your environment? If you upload the entire ovenmediaengine.log file, it would be more helpful for analysis.

I have also patched one suspected cause. Could you please check whether the issue still reproduces with the airensoft/ovenmediaengine:dev image?

oleglab commented 1 year ago

@getroot Thank you very much for the quick reply! It happens quite often, sometimes a few times per hour. I think it's easier to reproduce when I add more workers for signaling, ICE, and TURN. I'll be happy to test the patch and report the result today. I do have the log, but it contains a lot of sensitive information, so please advise whether it's possible to upload it privately.

getroot commented 1 year ago

Then could you send your Server.xml and log files to support@airensoft.com? Thank you for your contribution.

oleglab commented 1 year ago

The dev build looks better, and I haven't been able to reproduce the crash so far. However, SPRtcSig stops accepting new connections at some point. We saw the same problem with airensoft/ovenmediaengine:0.14.17.

I've just sent 2 dumps with logs and configuration to support.

Thank you very much!

getroot commented 1 year ago

Thank you. I received your mail. I'll use your settings and logs to analyze the problem!

getroot commented 1 year ago

I haven't been able to reproduce this problem, but I suspect the crash is caused by misuse of the Redis client library, and I have fixed the related code. Can you test with the master version? Alternatively, if my suspicion is correct, the issue should not reproduce even on the old 0.14.17 version if you change the WebRTC Signalling WorkerCount from 4 to 1. I would be very grateful if you could test whichever way you prefer and let me know the results.
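The thread does not say what the misuse of the client library was. The suggestion to drop WorkerCount from 4 to 1 is consistent with a concurrency problem, where several workers touch one client object that is not safe to share. The Python sketch below only illustrates that general pattern and one common remedy (serializing access with a lock). It is not OME's code or its actual fix, and `FakeClient`, `SerializedClient`, and `run_workers` are hypothetical names made up for the illustration.

```python
import threading


class FakeClient:
    """Hypothetical stand-in for a client object that is not thread-safe."""

    def __init__(self) -> None:
        self.commands_sent = 0

    def send(self, command: str) -> None:
        # Unsynchronized read-modify-write on shared state.
        current = self.commands_sent
        self.commands_sent = current + 1


class SerializedClient:
    """Wraps a client so that only one thread uses it at a time."""

    def __init__(self, client: FakeClient) -> None:
        self._client = client
        self._lock = threading.Lock()

    def send(self, command: str) -> None:
        with self._lock:
            self._client.send(command)


def run_workers(client, worker_count: int, commands_per_worker: int) -> None:
    """Start worker_count threads that all send commands through one client."""

    def work() -> None:
        for i in range(commands_per_worker):
            client.send(f"PING {i}")

    threads = [threading.Thread(target=work) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With a single worker there is no concurrent access at all, which is why reducing the worker count can hide this class of bug without fixing it.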

oleglab commented 1 year ago

Thanks @getroot! I am going to test it today!

oleglab commented 1 year ago

We set up health checks for the signaling port, and the edges look much more stable now: just one restart during the last 27 hours.

CONTAINER ID   IMAGE            COMMAND                  CREATED        STATUS                  PORTS   NAMES
1c16a819ea9e   ovenmedia_edge   "/opt/ovenmediaengin…"   27 hours ago   Up 25 hours (healthy)           ovenmedia_edge_1
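The thread does not show how the health check was implemented. As a rough sketch, a check on the signaling port could simply try to open a TCP connection and report failure if that does not succeed in time. The function name `port_is_accepting`, the timeout, and the port used in the usage note are assumptions for illustration, not values taken from this deployment.

```python
import socket


def port_is_accepting(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A container health check could call, for example, `port_is_accepting("127.0.0.1", 3333)` and exit non-zero on `False`, with the port replaced by whatever the signaling port is in your Server.xml. Note that a plain TCP connect can still succeed when the process has stopped servicing requests, because the kernel may complete the handshake on its own, so a check that performs an actual signaling request would be stricter.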

oleglab commented 1 year ago

Closing as completed! Thanks!