Glimesh / janus-ftl-plugin

A plugin for the Janus WebRTC gateway to enable relaying of audio/video streams utilizing Mixer's FTL (Faster-Than-Light) protocol.
https://hayden.fyi/posts/2020-08-03-Faster-Than-Light-protocol-engineering-notes.html
GNU Affero General Public License v3.0
44 stars 11 forks source link

Ingest hangs if bitrate limit exceeded #84

Closed clone1018 closed 3 years ago

clone1018 commented 3 years ago

I've only seen this happen once, but I wanted to get it on the radar:

# Channel going over it's limit
Feb 20 11:06:18 janus[963731]: [2021-02-20 11:06:18.492] [info] Channel 3 / Stream 356 is averaging 6669620bps, exceeding the limit of 6500000bps. Stopping the stream...
Feb 20 11:06:18 janus[963731]: [2021-02-20 11:06:18.492] [info] Stopping FTL channel 3 / stream 356...
Feb 20 14:45:20 janus[963731]: [rtp-sample] New video stream! (ssrc=825307441, index 0)

# Usual non-video client trash
Feb 20 17:13:05 janus[963731]: [ERR] [transports/janus_http.c:janus_http_handler:1431] Invalid url /
Feb 20 19:09:16 janus[963731]: [ERR] [transports/janus_http.c:janus_http_handler:1431] Invalid url /ws/v1/cluster/apps/new-application
Feb 21 02:31:56 janus[963731]: [ERR] [transports/janus_http.c:janus_http_handler:1431] Invalid url /
Feb 21 02:58:15 janus[963731]: [ERR] [transports/janus_http.c:janus_http_handler:1431] Invalid url /
Feb 21 10:45:24 janus[963731]: [ERR] [transports/janus_http.c:janus_http_handler:1431] Invalid url /

# Tried connecting with OBS a couple of times and when those failed, restarted Janus
Feb 21 13:15:37 systemd[1]: Stopping Janus WebRTC Server...

Did not take a core dump, will do next time!

clone1018 commented 3 years ago

Confirmed, happened again.

Coredump is at

/root/core.985733
haydenmc commented 3 years ago

Was able to find the root cause.

JanusFtl::serviceReportThreadBody(...) holds a lock on JanusFtl::streamDataMutex while evaluating if clients have exceeded the maximum bitrate setting.

A deadlock can occur if a client exceeding the bitrate cap is stopped while it is in the midst of relaying an RTP packet. Here's the outline:

  1. JanusFtl::serviceThreadReportBody holds lock on JanusFtl::streamDataMutex, calls FtlServer::StopStream(...)
  2. FtlServer::StopStream(...) -> FtlStream::Stop() -> FtlStream::mediaTransport::Stop()
  3. FtlStream::mediaTransport::Stop() closes the socket and waits for the connection thread to end.
  4. The connection thread is busy relaying an RTP packet in JanusFtl::ftlServerRtpPacket(...), and is attempting to acquire a shared lock on JanusFtl::streamDataMutex, but the lock is already being held by the JanusFtl::serviceThreadReportBody thread.