AirenSoft / OvenMediaEngine

OvenMediaEngine (OME) is a Sub-Second Latency Live Streaming Server with Large-Scale and High-Definition. #WebRTC #LLHLS
https://airensoft.com/ome.html
GNU Affero General Public License v3.0
2.57k stars 1.06k forks source link

OME stops accepting new streams #1234

Closed naanlizard closed 1 year ago

naanlizard commented 1 year ago

At a little bit after 05:00:00 in the logs here https://gist.githubusercontent.com/naanlizard/6cfa3ceca4ffdd7721ce8e30bacc85b6/raw/1dfb37711ac001dcdf0fdc74ba02317601722d24/may15clean.log OME stops accepting new inputs (rtmp directly to the server), and stops serving any streams (presumably because they are disconnected).

image

image

I checked and there are no dump files in OME's docker

If this isn't enough info to find the bug, please tell me what you need for the next time this happens, and I will get it.

Originally posted by @naanlizard in https://github.com/AirenSoft/OvenMediaEngine/issues/1166#issuecomment-1547773782

naanlizard commented 1 year ago

I've contacted the streamer you mentioned to see if I can get a copy of their encoder settings, I will update if/when they respond.

getroot commented 1 year ago

I've contacted the streamer you mentioned to see if I can get a copy of their encoder settings, I will update if/when they respond.

That will be of great help. thank you

getroot commented 1 year ago

Upon checking the logs, it appears that the live/73966 stream is the cause, as it is attempting to create a segment that is a whopping 28,775 seconds long. Typically, this indicates that no keyframe has been received during that time. It could be intentional behavior from the encoder, or it might be a bug in the encoder itself. Or there could be a bug in the logic that determines segment creation in the OME.

Since I didn't anticipate this situation, OME doesn't have any code to handle it.

I'll check under exactly what conditions this problem reproduces and fix it.

[2023-05-15 05:04:22.192] I [SPRTMP-t1951:30] RTMPProvider | rtmp_provider.cpp:216  | The RTMP client has disconnected: [#default#live/73966], remote: <ClientSocket: 0x7f94a6399810, #239, Disconnected, TCP, Nonblocking, x.x.x.x:5274>
[2023-05-15 05:04:22.193] W [AW-LLHLS6:85] FMP4 Packager | fmp4_storage.cpp:307  | LLHLS stream(#default#live/73966_audio) is creating segments that are too large(28775789.000000 ms). It seems that the keyframe interval is longer than the configured segment size.
[2023-05-15 05:04:22.193] I [SPRTMP-t1951:30] Provider | stream.cpp:63   | #default#live/73966(1602) has been stopped playing stream
[2023-05-15 05:04:22.193] I [SPRTMP-t1951:30] MediaRouter | mediarouter_application.cpp:472  | [#default#live/73966(1602)] Trying to delete a stream
[2023-05-15 05:04:22.193] I [SPRTMP-t1951:30] Monitor | application_metrics.cpp:69   | Delete StreamMetrics(73966/06fba6ba-54a0-468e-87ee-ac48c94b0c09/default/#default#live/73966/i) for monitoring
[2023-05-15 05:04:22.193] W [AW-LLHLS10:89] FMP4 Packager | fmp4_storage.cpp:307  | LLHLS stream(#default#live/73966) is creating segments that are too large(28775789.000000 ms). It seems that the keyframe interval is longer than the configured segment size.
[2023-05-15 05:04:22.193] I [SPRTMP-t1951:30] Monitor | stream_metrics.cpp:31   | 
[Stream Info]
id(1602), msid(0), output(73966), SourceType(Rtmp), RepresentationType(Source), Created Time (Sun May 14 21:03:50 2023) UUID(06fba6ba-54a0-468e-87ee-ac48c94b0c09/default/#default#live/73966/i)

    Video Track #0: Public Name(Video_0) Variant Name(Video) Bitrate(4.00Mb) Codec(1,H264,Auto) BSF(H264_AVCC) Resolution(1280x720) Framerate(30.00fps) KeyInterval(61) BFrames(0) timebase(1/1000)
    Audio Track #1: Public Name(Audio_1) Variant Name(Audio) Bitrate(128.00Kb) Codec(6,AAC,Auto) BSF(AAC_RAW) Samplerate(44.1K) Format(fltp, 32) Channel(mono, 1) timebase(1/1000)
    Data  Track #2: Public Name(Data_2) Variant Name(Data) Codec(0,Unknown,Auto) BSF(ID3v2) timebase(1/1000)

    >> Statistics
    Last update time : Mon May 15 05:04:22 2023, Last sent time : Sun May 14 22:35:04 2023
    Bytes in : 9.49MB, Bytes out : 5.42MB, Concurrent connections : 0, Max connections : 1 (Sun May 14 21:03:57 2023)

        >>>> By publisher
        - Unknown : Bytes out(0B) Concurrent Connections (0)
        - WebRTC : Bytes out(0B) Concurrent Connections (0)
        - MPEGTSPush : Bytes out(0B) Concurrent Connections (0)
        - RTMPPush : Bytes out(0B) Concurrent Connections (0)
        - HLS : Bytes out(0B) Concurrent Connections (0)
        - DASH : Bytes out(0B) Concurrent Connections (0)
        - LLDASH : Bytes out(0B) Concurrent Connections (0)
        - LLHLS : Bytes out(5.42MB) Concurrent Connections (0)
        - OVT : Bytes out(0B) Concurrent Connections (0)
        - File : Bytes out(0B) Concurrent Connections (0)
        - Thumbnail : Bytes out(0B) Concurrent Connections (0)

[2023-05-15 05:04:22.194] I [AW-LLHLS10:89] FMP4 Packager | fmp4_storage.cpp:173  | Try to create directory for LLHLS DVR: /tmp/ome_dvr//#default#live/73966/1
[2023-05-15 05:04:22.194] I [AW-LLHLS6:85] FMP4 Packager | fmp4_storage.cpp:173  | Try to create directory for LLHLS DVR: /tmp/ome_dvr//#default#live/73966_audio/4
[2023-05-15 05:04:22.194] W [SPRTMP-t1951:30] FFmpeg | third_parties.cpp:115  | [AVCodecContext: 0x7f94ef43b400] 1 frames left in the queue on closing
[2023-05-15 05:04:22.194] W [Decaac:230386] Transcoder | transcoder_filter.cpp:104  | Timestamp has changed abnormally.  57097 -> 28829447
[2023-05-15 05:04:22.194] W [AW-LLHLS10:89] FMP4 Packager | fmp4_storage.cpp:307  | LLHLS stream(#default#live/73966) is creating segments that are too large(967.000000 ms). It seems that the keyframe interval is longer than the configured segment size.
[2023-05-15 05:04:22.194] W [AW-LLHLS6:85] FMP4 Packager | fmp4_storage.cpp:307  | LLHLS stream(#default#live/73966_audio) is creating segments that are too large(967.000000 ms). It seems that the keyframe interval is longer than the configured segment size.
naanlizard commented 1 year ago

image

Here is a screenshot of the stream in question, which goes online and then offline far before the issue occurs.

I believe the screenshots are 4 hours ahead of the log file e.g. in the log file, 12:00 is 16:00 in the screenshot(s)

naanlizard commented 1 year ago

Sorry, I was off in the wrong direction for times - in the log, timestamps are 4 hours ahead. So 12:00 in the log file is 8:00 in the screenshots. Or 5:00 in the log file is 1:00 in the screenshots (toward the right)

naanlizard commented 1 year ago

They're using Streamlabs on an iPad, photo_2023-05-15_17-17-18 photo_2023-05-15_17-13-41

I'm assuming they haven't changed many of the defaults.

getroot commented 1 year ago

Thanks for the detailed information. My guess so far is that that encoder hasn't sent any packets since the connection was open at some point. (Perhaps the app was in the background) And that encoder sent packets again around 05:00 and at the same time terminated the connection. At this time, OME was attempting to create a segment that was approximately 8 hours long, while simultaneously processing the deletion of the stream. OME seems to be malfunctioning under these conditions. I'll get the app and try to reproduce the problem.

naanlizard commented 1 year ago

I've asked the user and we'll see - I suspect they weren't doing anything around the time of the crash, but perhaps.

naanlizard commented 1 year ago

This morning it happened again, I saved 16 hours of log files this time which is probably excessive but perhaps will help. It should include both the problem stream starting and then the server crashing.

https://mega.nz/file/VFUCjZ6I#I4OeNO5Eu-QtNtJCA074rvzq7cAWZloEgHbHuKT5UrU gzipped for your sanity, it's about 2 gigs as a text file

image

naanlizard commented 1 year ago

Just realized an important detail

Ome might not see the stream, when testing it was only forwarded from our Nginx rtmp instance to ome for a short while, then disconnected for some reason. Ome continues to let you request the llhls playlist but it loops and is broken. I'm not sure if webrtc worked

Keukhan commented 1 year ago

@naanlizard

Thank you for reporting Analyzing your log, I found that a race condition occurred during the deleting the stream and recreating the audio resampling filter. The problem has been fixed as below, so please test it on the master branch.

https://github.com/AirenSoft/OvenMediaEngine/commit/e7d79c4aad28dbb2cc832e242844dec641ea6439

Thanks

naanlizard commented 1 year ago

Thank you, we have updated ome and will reopen or make a new ticket if we have any more problems