cronokirby / alchemy

A discord library for Elixir
MIT License
151 stars 34 forks source link

Some error messages are difficult to understand what happened #49

Closed purei closed 6 years ago

purei commented 6 years ago

I wanted to mention some error messages I received that might be made more legible/handled differently.

These errors come from having a bot active on a laptop and sleeping/waking the laptop, I realized. I wonder if the 3rd one could be caught and handled. I'm not sure how to handle the first 2 byte strings errors.

The logger drops [info]s like this. A library upstream from yours?

15:27:34.907 [info]  [83, 83, 76, 58, 32, 83, 111, 99, 107, 101, 116, 32, 101, 114, 114, 111, 114, 58, 32, 'etimedout', 32, '\n']
16:21:09.943 [info]  [83, 83, 76, 58, 32, 83, 111, 99, 107, 101, 116, 32, 101, 114, 114, 111, 114, 58, 32, 'enetdown', 32, '\n']

Also, I've received this:

04:43:06.244 [error] ** State machine #PID<0.513.0> terminating
** Last message in was {:heartbeat, 41250}
** When State == :connected
**      Data  == {:context,
 {:websocket_req, :wss, 'gateway.discord.gg', 443, '/?v=6&encoding=json',
  :infinity, :undefined, 1,
  {:sslsocket, {:gen_tcp, #Port<0.16144>, :tls_connection, :undefined},
   #PID<0.514.0>},
  {:transport, :ssl, :ssl, :ssl_closed, :ssl_error,
   [mode: :binary, active: true, verify: :verify_none, packet: 0]},
  "<snip>", :undefined, 1, :undefined, :undefined,
  :undefined},
 {:transport, :ssl, :ssl, :ssl_closed, :ssl_error,
  [mode: :binary, active: true, verify: :verify_none, packet: 0]}, [],
 {:wss, 'gateway.discord.gg', 443, '/?v=6&encoding=json'},
 {Alchemy.Discord.Gateway,
  %Alchemy.Discord.Gateway.State{seq: 2,
   session_id: "<snip>", shard: [0, 1],
   token: "<snip>",
   trace: ["gateway-prd-main-pz03", "discord-sessions-prd-1-14"],
   user_id: "<snip>"}}, "", false, :undefined, 0}
** Reason for termination =
** {:error, :closed}

Been using this project a lot lately, thank you so much. I started using your project the day of your last commit. Seeing a currently active project was wonderful; I'm really enjoying working in Elixir.

cronokirby commented 6 years ago

I don't think I'm directly causing those first errors. Those might be coming from either httpoison, the websocket, or stuff related to voice. Needs further investigation. If you know how to reproduce those first errors, that'd be useful.

So if the 2nd one happens whenever you sleep your laptop, that's probably because that cuts the internet connection, thus closing the websocket, in a not clean way. This isn't a problem unto itself, as the socket is supervised and normally connects back up fine when the laptop wakes up. In general the error messages related to http requests as well as websocket termination aren't stellar. so those should be worked on.

So yeah, I'll try and give some kind of nice logging warning / info / debug when the websocket closes.

purei commented 6 years ago

yeah, the laptop closing/opening causes no lasting problem -- everything just works once the laptop opens, it's great.

as far as reproducible, there seems to be two common outcomes:

the one with the state machine stack dump:

08:58:57.342 [info]  [83, 83, 76, 58, 32, 83, 111, 99, 107, 101, 116, 32, 101, 114, 114, 111, 114, 58, 32, 'etimedout', 32, '\n']
08:58:57.343 [info]  Shard [0, 1] connecting to the gateway
08:58:57.344 [error] ** State machine #PID<0.621.0> terminating (long red stack dump, as posted above)
08:58:57.612 [debug] Shard [0, 1] received READY

and one without:

10:16:46.538 [info]  [83, 83, 76, 58, 32, 83, 111, 99, 107, 101, 116, 32, 101, 114, 114, 111, 114, 58, 32, 'etimedout', 32, '\n']
10:16:47.003 [debug] Shard [0, 1] connected with an invalid session id
10:16:47.003 [info]  Shard [0, 1] connecting to the gateway
10:16:47.660 [debug] Shard [0, 1] received READY

the 'enetdown' error is more rare, and seems to replace the 'etimedout', you get one or the other. i'm still trying to figure out which library might be producing those.

cronokirby commented 6 years ago

So I believe that the "State machine terminating" error might be out of my control, given that I actually catch termination and log it... https://github.com/cronokirby/alchemy/blob/master/lib/Discord/Gateway/gateway.ex#L75

This would then be an error related to the websocket lib not actually catching the termination, or that this kind of termination can't be caught in the first place; given that sleeping is a somewhat abrupt stop to the usual otp lifecycle, this wouldn't be too surprising...

Still stumped on the enetdown stuff...

purei commented 6 years ago

It's possible it's bad home internet? Just started with this local ISP; had the chance to try not Comcast. I'm contacting them.

I'm getting it while recently interacting with the server. The times that it cycles something like

16:13:09.773 [info]  [83, 83, 76, 58, 32, 83, 111, 99, 107, 101, 116, 32, 101, 114, 114, 111, 114, 58, 32, 'ehostunreach', 32, '\n']

16:13:10.123 [info]  Shard [0, 1] connecting to the gateway

are still uncorrelated w/ anything, which seems unlikely. i think the next thing is to try hosting this on gigalixir, and seeing if the server is the problem.

thanks for bearing with.

cronokirby commented 6 years ago

Might have something to do with ssl?

purei commented 6 years ago

Since I moved my serving from a laptop to something more server-y these errors have all but gone -- Think I've seen it once when the home internet had issues and probably acted like a laptop-closing-disconnect.

Gonna close this for now; thanks for helping me brainstorm what could be wrong with it