Glimesh / janus-ftl-plugin

A plugin for the Janus WebRTC gateway to enable relaying of audio/video streams utilizing Mixer's FTL (Faster-Than-Light) protocol.
https://hayden.fyi/posts/2020-08-03-Faster-Than-Light-protocol-engineering-notes.html
GNU Affero General Public License v3.0
44 stars 11 forks source link

Starting stream for non-existent stream id crashes ingest service #91

Closed danstiner closed 3 years ago

danstiner commented 3 years ago

I setup a full ingest/edge/orchestrator/Glimesh.tv VM farm locally and found something interesting.

When I try to start streaming from OBS for a channel id > 1 my ingest service crashes, resulting in the following output. With channel id 1 (or any other channel that does exist), I do not get a crash.

This may be a ship blocker, if someone just mistypes their channel id and connects a few times it'll bring down all the ingest services.

OBS image

Janus / FTL

ubuntu@ftl-ingest-1:~$ ./run.sh
Janus commit: not-a-git-repo
Compiled on:  Fri Feb 26 23:47:18 PST 2021

Logger plugins folder: /opt/janus/lib/janus/loggers
[WARN]  Couldn't access logger plugins folder...
---------------------------------------------------
  Starting Meetecho Janus (WebRTC Server) v0.10.10
---------------------------------------------------

Checking command line arguments...
Debug/log level is 4
Debug/log timestamps are disabled
Debug/log colors are enabled
Adding 'vmnet' to the ICE ignore list...
Using 172.25.17.240 as local IP...
Token based authentication disabled
Initializing recorder code
Initializing ICE stuff (Full mode, ICE-TCP candidates disabled, half-trickle, IPv6 support disabled)
TURN REST API backend: (disabled)
[WARN] Janus is deployed on a private address (172.25.17.240) but you didn't specify any STUN server! Expect trouble if this is supposed to work over the internet and not just in a LAN...
Crypto: OpenSSL >= 1.1.0
No cert/key specified, autogenerating some...
Fingerprint of our certificate: 42:FB:D0:C5:79:55:AC:11:22:7D:8D:32:1C:C5:16:7D:CC:8D:33:64:6B:09:18:4B:63:A4:EE:A0:54:DC:EE:45
[WARN] Data Channels support not compiled
Event handlers support disabled
Plugins folder: /opt/janus/lib/janus/plugins
Loading plugin 'libjanus_ftl.so'...
Joining Janus requests handler thread
Sessions watchdog started
[2021-02-27 00:05:42.760] [info] Connecting to Orchestration service @ ftl-orchestrator.lan:8085...
[2021-02-27 00:05:42.763] [info] Using Glimesh Service Connection @ http://glimesh-web.lan:4000
[2021-02-27 00:05:42.792] [info] Received new access token, expires in ee8f831cfa0d45a7685ca65c0df6daa9be95691ad185f0a408d40a0c45b6bf32 - 1614462965 = 1614413142 seconds
[2021-02-27 00:05:42.793] [info] FtlServer ready for new connections.
[2021-02-27 00:05:42.793] [info] FTL plugin initialized!
Transport plugins folder: /opt/janus/lib/janus/transports
Loading transport plugin 'libjanus_http.so'...
HTTP webserver started (port 8088, /janus path listener)...
JANUS REST (HTTP/HTTPS) transport plugin initialized!
HTTP transport timer started
[2021-02-27 00:05:45.082] [info] New FTL control connection is pending from 192.168.86.20
[2021-02-27 00:05:45.124] [warning] Couldn't look up HMAC key for channel 1: Could not find a stream key for the given channel.
[2021-02-27 00:05:45.124] [info] Pending FTL control connection has closed.
[2021-02-27 00:06:21.052] [info] New FTL control connection is pending from 192.168.86.20
[2021-02-27 00:06:21.099] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:21.099] [warning] Attempt 1 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-02-27 00:06:24.145] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:24.145] [warning] Attempt 2 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-02-27 00:06:26.052] [info] 192.168.86.20 didn't authenticate within 5000ms, closing
[2021-02-27 00:06:27.189] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:27.190] [warning] Attempt 3 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-02-27 00:06:30.237] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:30.237] [warning] Attempt 4 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-02-27 00:06:33.282] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:33.282] [warning] Attempt 5 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-02-27 00:06:36.329] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:36.329] [warning] Attempt 6 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-02-27 00:06:39.375] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:39.375] [warning] Attempt 7 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-02-27 00:06:42.425] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:42.425] [warning] Attempt 8 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-02-27 00:06:45.471] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:45.471] [warning] Attempt 9 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-02-27 00:06:48.519] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:48.519] [warning] Attempt 10 / 10: Glimesh GraphQL query failed. Retrying in 3000 ms...
[2021-02-27 00:06:51.566] [warning] Glimesh service connection received status code 404 when processing GraphQL query.
[2021-02-27 00:06:51.566] [error] Aborting Glimesh GraphQL query after 10 failed attempts.
terminate called after throwing an instance of 'ServiceConnectionCommunicationFailedException'
  what():  Glimesh GraphQL query failed.
./run.sh: line 3: 18423 Aborted                 (core dumped) /opt/janus/bin/janus

glimesh elixr service

[info] POST /api
[debug] Processing with Absinthe.Plug.GraphiQL
  Parameters: %{"query" => "query { channel(id: \"10\") { hmacKey } }", "variables" => nil}
  Pipelines: [:graphql]
[debug] QUERY OK source="oauth_access_tokens" db=0.1ms idle=24.8ms
SELECT o0."id", o0."token", o0."refresh_token", o0."expires_in", o0."revoked_at", o0."scopes", o0."previous_refresh_token", o0."resource_owner_id", o0."application_id", o0."inserted_at", o0."updated_at" FROM "oauth_access_tokens" AS o0 WHERE (o0."token" = $1) ["ee8f831cfa0d45a7685ca65c0df6daa9be95691ad185f0a408d40a0c45b6bf32"]
[debug] QUERY OK source="oauth_applications" db=0.1ms idle=25.2ms
SELECT o0."id", o0."name", o0."uid", o0."secret", o0."redirect_uri", o0."scopes", o0."owner_id", o0."inserted_at", o0."updated_at", o0."id" FROM "oauth_applications" AS o0 WHERE (o0."id" = $1) [1]
[debug] QUERY OK source="users" db=0.1ms idle=25.5ms
SELECT u0."id", u0."username", u0."displayname", u0."email", u0."hashed_password", u0."confirmed_at", u0."can_stream", u0."team_role", u0."is_admin", u0."is_gct", u0."gct_level", u0."is_banned", u0."ban_reason", u0."avatar", u0."social_twitter", u0."social_youtube", u0."social_instagram", u0."social_discord", u0."social_guilded", u0."can_payments", u0."is_stripe_setup", u0."is_tax_verified", u0."tax_withholding_percent", u0."stripe_user_id", u0."stripe_customer_id", u0."stripe_payment_method", u0."youtube_intro_url", u0."profile_content_md", u0."profile_content_html", u0."tfa_token", u0."allow_glimesh_newsletter_emails", u0."allow_live_subscription_emails", u0."inserted_at", u0."updated_at", u0."id" FROM "users" AS u0 WHERE (u0."id" = $1) [1]
[debug] ABSINTHE schema=Glimesh.Schema variables=%{}
---
query { channel(id: "10") { hmacKey } }
---
[info] [Absinthe Operation] Access Type: user Access Identifier: danstiner Query: "query { channel(id: \"10\") { hmacKey } }"
[debug] QUERY OK source="channels" db=0.1ms idle=28.2ms
SELECT c0."id", c0."user_id", c0."category_id", c0."user_id", c0."stream_id", c0."title", c0."status", c0."language", c0."mature_content", c0."thumbnail", c0."hmac_key", c0."inaccessible", c0."backend", c0."disable_hyperlinks", c0."block_links", c0."chat_rules_md", c0."chat_rules_html", c0."poster", c0."chat_bg", c0."inserted_at", c0."updated_at" FROM "channels" AS c0 WHERE (c0."id" = $1) [10]
[info] Sent 404 in 18ms
[debug] ** (Ecto.NoResultsError) expected at least one result but got none in query:

from c0 in Glimesh.Streams.Channel,
  where: c0.id == ^"10"

    (ecto 3.4.6) lib/ecto/repo/queryable.ex:122: Ecto.Repo.Queryable.one!/3
    (glimesh 0.1.0) lib/glimesh/channel_lookups.ex:99: Glimesh.ChannelLookups.get_channel!/1
    (glimesh 0.1.0) lib/glimesh/graphql/resolvers/channel_resolver.ex:30: Glimesh.Resolvers.ChannelResolver.find_channel/2
    (absinthe 1.5.2) lib/absinthe/resolution.ex:206: Absinthe.Resolution.call/2
    (absinthe 1.5.2) lib/absinthe/phase/document/execution/resolution.ex:230: Absinthe.Phase.Document.Execution.Resolution.reduce_resolution/1
    (absinthe 1.5.2) lib/absinthe/phase/document/execution/resolution.ex:185: Absinthe.Phase.Document.Execution.Resolution.do_resolve_field/3
    (absinthe 1.5.2) lib/absinthe/phase/document/execution/resolution.ex:170: Absinthe.Phase.Document.Execution.Resolution.do_resolve_fields/6
    (absinthe 1.5.2) lib/absinthe/phase/document/execution/resolution.ex:88: Absinthe.Phase.Document.Execution.Resolution.walk_result/5
    (absinthe 1.5.2) lib/absinthe/phase/document/execution/resolution.ex:67: Absinthe.Phase.Document.Execution.Resolution.perform_resolution/3
    (absinthe 1.5.2) lib/absinthe/phase/document/execution/resolution.ex:24: Absinthe.Phase.Document.Execution.Resolution.resolve_current/3
    (absinthe 1.5.2) lib/absinthe/pipeline.ex:368: Absinthe.Pipeline.run_phase/3
    (absinthe_plug 1.5.0) lib/absinthe/plug.ex:445: Absinthe.Plug.run_query/4
    (absinthe_plug 1.5.0) lib/absinthe/plug.ex:258: Absinthe.Plug.call/2
    (phoenix 1.5.6) lib/phoenix/router/route.ex:41: Phoenix.Router.Route.call/2
    (phoenix 1.5.6) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2
    (glimesh 0.1.0) lib/glimesh_web/endpoint.ex:1: GlimeshWeb.Endpoint.plug_builder_call/2
    (glimesh 0.1.0) lib/glimesh_web/endpoint.ex:4: anonymous fn/3 in GlimeshWeb.Endpoint."call (overridable 3)"/2
    (appsignal 2.0.8) lib/appsignal/instrumentation.ex:9: Appsignal.Instrumentation.instrument/1
    (glimesh 0.1.0) lib/plug/debugger.ex:132: GlimeshWeb.Endpoint."call (overridable 4)"/2
    (glimesh 0.1.0) lib/glimesh_web/endpoint.ex:1: GlimeshWeb.Endpoint."call (overridable 5)"/2
danstiner commented 3 years ago

The "right" fix here may actually be in https://github.com/Glimesh/glimesh.tv, it is pretty standard for a GraphQL service to always return HTTP 200 and put the not found message in the result under the errors key.

But it might be worth working around this quick by just handling 404's.

clone1018 commented 3 years ago

I added error handling to all of the API endpoints, so we should no longer be sending any non-JSON message unless something crashes.

haydenmc commented 3 years ago

Good bug Dan, thanks!

clone1018 commented 3 years ago

Oops, I didn't mean for the GitHub Automation Gods to close this PR when I merged the other one! Reopen if necessary.