Kraigie / nostrum

Elixir Discord Library
https://kraigie.github.io/nostrum/
MIT License
597 stars 128 forks source link

Blows up in confusing way if Discord is down #417

Open wisq opened 2 years ago

wisq commented 2 years ago

So Discord went down for a bit, and I started getting this error while trying to start my bot:

17:25:40.097 [error] GenServer Ratelimiter terminating
** (Jason.DecodeError) unexpected byte at position 0: 0x3C ("<")
    (jason 1.3.0) lib/jason.ex:92: Jason.decode!/2
    (nostrum 0.5.1) lib/nostrum/api/ratelimiter.ex:196: Nostrum.Api.Ratelimiter.format_response/1
    (nostrum 0.5.1) lib/nostrum/api/ratelimiter.ex:62: Nostrum.Api.Ratelimiter.handle_call/3
    (stdlib 3.17) gen_server.erl:721: :gen_server.try_handle_call/4
    (stdlib 3.17) gen_server.erl:750: :gen_server.handle_msg/6
    (stdlib 3.17) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.330.0>): {:queue, %{body: "", headers: [{"content-type", "application/json"}], method: :get, params: [], route: "/gateway/bot"}, nil}
State: #PID<0.332.0>
Client #PID<0.330.0> is alive

    (stdlib 3.17) gen.erl:214: :gen.do_call/4
    (elixir 1.13.3) lib/gen_server.ex:1027: GenServer.call/3
    (nostrum 0.5.1) lib/nostrum/util.ex:98: Nostrum.Util.get_new_gateway_url/0
    (nostrum 0.5.1) lib/nostrum/shard/supervisor.ex:16: Nostrum.Shard.Supervisor.start_link/1
    (stdlib 3.17) supervisor.erl:414: :supervisor.do_start_child_i/3
    (stdlib 3.17) supervisor.erl:400: :supervisor.do_start_child/2
    (stdlib 3.17) supervisor.erl:384: anonymous fn/3 in :supervisor.start_children/2
    (stdlib 3.17) supervisor.erl:1250: :supervisor.children_map/4

(Note that the error was actually on line 195 in stock, the Jason.decode! line — it's just that I had added a File.write("/tmp/response", body) line to see what was happening.)

It's certainly entirely reasonable for Nostrum to refuse to start up while Discord is down, but the error message could be a lot better — this makes it look like there's some protocol-level error, when the error is just that I'm getting an HTML page (a stock 502 Bad Gateway page) instead of a JSON page.

I'd submit a PR, but I'm not sure how best to fix this. There's at least a couple options here:

Kraigie commented 2 years ago

Thanks for the report! I'm partial to the second option - moving away from Jason.decode!/2 and using the non-banged version. I'm not sure what the flow for this would look like though. If we encounter this error, what do we do about it?

wisq commented 2 years ago

Yeah, I'm not sure. It's probably entirely reasonable to still just raise an exception and die, just with a more helpful message (like Got a "502 Bad Gateway" response with a non-JSON payload) to make it clear what's happening.

Having the Ratelimiter die will also mean that anyone who is currently issuing an API call will also raise an exception (and probably die, and hopefully be resurrected by their supervisor tree). That also seems good to me — I imagine you don't really want an API outage to be reported as an {:error, %Nostrum.Error.ApiError{...}} tuple, because that's for actual API errors, with defined behaviour. We're very much in the realm of undefined behaviour here, which is what exceptions are for, I think.