Glimesh / janus-ftl-plugin

A plugin for the Janus WebRTC gateway to enable relaying of audio/video streams utilizing Mixer's FTL (Faster-Than-Light) protocol.
https://hayden.fyi/posts/2020-08-03-Faster-Than-Light-protocol-engineering-notes.html
GNU Affero General Public License v3.0
44 stars 11 forks source link

Auth timeout streaming to a specific channel #124

Open clone1018 opened 3 years ago

clone1018 commented 3 years ago

Hi there, looks like we're experiencing an issue when trying to stream to the /Glimesh channel. We experienced the problem today when mitori tried to start the Glimesh University event, he attempted to start the stream in OBS and after a couple of seconds got an error message. I checked the logs and it looks like the auth sequence is timing out.

The last time we streamed to the channel successfully was last Monday. I also restarted Janus on the ingest to try and clear out any bad state. The issue persisted.

I can see the channel's LIVE tag pop-up after OBS attempts to connect, however it leaves the Glimesh.tv DB in a bad state because it does not send a endStream notice.

There is a potential this was introduced on the Glimesh.tv side, but we have not heard of any specific issues with other users yet.

[...] mitori trying to stream to /Glimesh [...]

[2021-04-19 21:12:01.908] [info] New FTL control connection is pending from <MITORI_IP>
[2021-04-19 21:12:02.146] [info] <MITORI_IP> authenticated as Channel 2752 successfully.
[2021-04-19 21:12:02.165] [warning] Received unrecognized attribute from client: ProtocolVersion: 0.9
[2021-04-19 21:12:06.921] [info] <MITORI_IP> didn't authenticate within 5000ms, closing
[2021-04-19 21:12:07.232] [warning] Glimesh service connection HTTP request failed with error 4
[2021-04-19 21:12:07.232] [warning] Attempt 1 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-04-19 21:12:15.288] [warning] Glimesh service connection HTTP request failed with error 4
[2021-04-19 21:12:15.288] [warning] Attempt 2 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-04-19 21:12:23.348] [warning] Glimesh service connection HTTP request failed with error 4
[2021-04-19 21:12:23.348] [warning] Attempt 3 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-04-19 21:12:31.400] [warning] Glimesh service connection HTTP request failed with error 4
[2021-04-19 21:12:31.400] [warning] Attempt 4 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...

[...] clone1018 trying to reproduce the issue to /Glimesh [...]

[2021-04-19 21:12:33.056] [info] New FTL control connection is pending from <CLONE_IP>
[2021-04-19 21:12:33.381] [info] <CLONE_IP> authenticated as Channel 2752 successfully.
[2021-04-19 21:12:33.481] [warning] Received unrecognized attribute from client: ProtocolVersion: 0.9
[2021-04-19 21:12:38.086] [info] <CLONE_IP> didn't authenticate within 5000ms, closing
[2021-04-19 21:12:39.466] [warning] Glimesh service connection HTTP request failed with error 4
[2021-04-19 21:12:39.466] [warning] Attempt 5 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-04-19 21:12:47.389] [info] Publishing channel 2752 / stream 78973 to Orchestrator...
[2021-04-19 21:12:47.389] [info] Registered new stream: Channel 2752 / Stream 78973.
[2021-04-19 21:12:47.389] [info] Channel 2752 / Stream 78973 control connection was removed before a media port could be assigned.
[2021-04-19 21:12:52.343] [info] Existing Stream 78973 exists for Channel 2752 - stopping...
[2021-04-19 21:12:52.343] [info] Unpublishing channel 2752 / stream 78973 from Orchestrator
[2021-04-19 21:12:52.343] [info] Stream ended. Channel 2752 / stream 78973
[2021-04-19 21:12:52.343] [warning] FtlServer::eventStopStream couldn't find stream to remove.
[2021-04-19 21:12:52.565] [info] Publishing channel 2752 / stream 78974 to Orchestrator...
[2021-04-19 21:12:52.565] [info] Registered new stream: Channel 2752 / Stream 78974.
[2021-04-19 21:12:52.565] [error] Stream ended from channel 2752 had unexpected stream id 78973, expected 78974
[2021-04-19 21:12:52.565] [info] Channel 2752 / Stream 78974 control connection was removed before a media port could be assigned.
[2021-04-19 21:12:52.565] [info] Unpublishing channel 2752 / stream 78974 from Orchestrator
[2021-04-19 21:12:52.565] [info] Stream ended. Channel 2752 / stream 78974
[2021-04-19 21:12:52.778] [info] Start Stream Relay request from Orchestrator: Channel 2752, Stream 78973, Target do-nyc3-edge6.kjfk.live.glimesh.tv
[2021-04-19 21:12:52.778] [error] Orchestrator requested a relay for channel that is not streaming.Target hostname: do-nyc3-edge6.kjfk.live.glimesh.tv, Channel ID: 2752

[...] clone1018 trying again [...]

[2021-04-19 21:13:32.675] [info] New FTL control connection is pending from <CLONE_IP>
[2021-04-19 21:13:32.940] [info] <CLONE_IP> authenticated as Channel 2752 successfully.
[2021-04-19 21:13:33.044] [warning] Received unrecognized attribute from client: ProtocolVersion: 0.9
[2021-04-19 21:13:37.693] [info] <CLONE_IP> didn't authenticate within 5000ms, closing
[2021-04-19 21:13:38.213] [info] Publishing channel 2752 / stream 78977 to Orchestrator...
[2021-04-19 21:13:38.214] [info] Registered new stream: Channel 2752 / Stream 78977.
[2021-04-19 21:13:38.214] [info] Channel 2752 / Stream 78977 control connection was removed before a media port could be assigned.
[2021-04-19 21:13:38.214] [info] Unpublishing channel 2752 / stream 78977 from Orchestrator
[2021-04-19 21:13:38.214] [info] Stream ended. Channel 2752 / stream 78977
[2021-04-19 21:13:38.436] [info] Start Stream Relay request from Orchestrator: Channel 2752, Stream 78977, Target do-nyc3-edge6.kjfk.live.glimesh.tv
[2021-04-19 21:13:38.436] [error] Orchestrator requested a relay for channel that is not streaming.Target hostname: do-nyc3-edge6.kjfk.live.glimesh.tv, Channel ID: 2752
haydenmc commented 3 years ago

Per offline discussion, the current theory is that the startStream query is taking longer than 5000ms to return, causing the connection to time out before the FTL handshake is completed.