andersfylling / disgord

Go module for interacting with Discord's documented bot interface: Gateway, REST requests and voice
BSD 3-Clause "New" or "Revised" License
502 stars, 71 forks

Cannot Reconnect: connection already exist #390

Open: suhaibmalik opened this issue 3 years ago

suhaibmalik commented 3 years ago

I'm getting random reconnect failures. I can't reproduce this reliably, other than by waiting for it to happen eventually.

Logs:

DEBUG: [[ws-e,s:14506,shard:0] heartbeat ACK ok]
DEBUG: [[ws-e,s:14507,shard:0] sent heartbeat]
DEBUG: [[ws-e,s:14508,shard:0] heartbeat ACK ok]
DEBUG: [[ws-e,s:14509,shard:0] sent heartbeat]
INFO: [[ws-e,s:14510,shard:0] heartbeat ACK was not received, forcing reconnect]
DEBUG: [[ws-e,s:14511,shard:0] stopping pulse]
DEBUG: [[ws-e,s:14512,shard:0] is reconnecting]
DEBUG: [[ws-e,s:14513,shard:0] closing emitter]
DEBUG: [[ws-e,s:14514,shard:0] closing receiver]
INFO: [[ws-e,s:14515,shard:0] disconnected]
DEBUG: [[ws-e,s:14516,shard:0] trying to connect]
DEBUG: [[shardSync] shard 0 is waiting to identify]
DEBUG: [[ws-e,s:14517,shard:0] waiting to send identify/resume]
DEBUG: [[ws-e,s:14518,shard:0] starting receiver]
DEBUG: [[ws-e,s:14520,shard:0] starting emitter]
DEBUG: [[ws-e,s:14519,shard:0] Ready to receive operation codes...]
DEBUG: [[ws-e,s:14521,shard:0] closing receiver after read error]
ERROR: [[ws-e,s:14522,shard:0] discord timeout during connect (3 minutes). No idea what went wrong..]
DEBUG: [[shardSync] shard 0 waited and finished execution after 3m0.041366691s]
ERROR: [[ws-e,s:14523,shard:0] establishing connection failed:  websocket connected but was not able to send identify packet within 3 minutes]
INFO: [[ws-e,s:14524,shard:0] next connection attempt in  3s]
DEBUG: [[ws-e,s:14525,shard:0] trying to connect]
ERROR: [[ws-e,s:14526,shard:0] establishing connection failed:  cannot Connect while a connection already exist]
INFO: [[ws-e,s:14527,shard:0] next connection attempt in  7s]
DEBUG: [[ws-e,s:14528,shard:0] trying to connect]
ERROR: [[ws-e,s:14529,shard:0] establishing connection failed:  cannot Connect while a connection already exist]
INFO: [[ws-e,s:14530,shard:0] next connection attempt in  11s]
DEBUG: [[ws-e,s:14531,shard:0] trying to connect]
ERROR: [[ws-e,s:14532,shard:0] establishing connection failed:  cannot Connect while a connection already exist]
INFO: [[ws-e,s:14533,shard:0] next connection attempt in  15s]
DEBUG: [[ws-e,s:14534,shard:0] trying to connect]
ERROR: [[ws-e,s:14535,shard:0] establishing connection failed:  cannot Connect while a connection already exist]
INFO: [[ws-e,s:14536,shard:0] next connection attempt in  19s]
DEBUG: [[ws-e,s:14537,shard:0] trying to connect]
ERROR: [[ws-e,s:14538,shard:0] establishing connection failed:  cannot Connect while a connection already exist]
INFO: [[ws-e,s:14539,shard:0] next connection attempt in  23s]
DEBUG: [[ws-e,s:14540,shard:0] trying to connect]
ERROR: [[ws-e,s:14541,shard:0] establishing connection failed:  cannot Connect while a connection already exist]
INFO: [[ws-e,s:14542,shard:0] next connection attempt in  27s]
DEBUG: [[ws-e,s:14543,shard:0] trying to connect]
...
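The `heartbeat ACK was not received, forcing reconnect` line in the log follows a rule from Discord's gateway documentation: if no Heartbeat ACK (opcode 11) arrives between two heartbeats (opcode 1), the connection is treated as failed or "zombied", and the client should close it and reconnect/resume. A minimal sketch of that check, using a hypothetical `heartbeatTracker` type that is not disgord's internal implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// heartbeatTracker records whether the last heartbeat was acknowledged.
// Hypothetical type for illustration only; not disgord's internal code.
type heartbeatTracker struct {
	mu       sync.Mutex
	awaiting bool // true while a sent heartbeat has not been ACKed yet
}

// Sent is called right before a heartbeat (opcode 1) is written.
// It returns false if the previous heartbeat was never acknowledged,
// meaning the connection is likely zombied and should be closed and resumed.
func (h *heartbeatTracker) Sent() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.awaiting {
		return false
	}
	h.awaiting = true
	return true
}

// Acked is called when a Heartbeat ACK (opcode 11) arrives.
func (h *heartbeatTracker) Acked() {
	h.mu.Lock()
	h.awaiting = false
	h.mu.Unlock()
}

func main() {
	var h heartbeatTracker
	fmt.Println(h.Sent()) // true: first heartbeat
	h.Acked()
	fmt.Println(h.Sent()) // true: the previous heartbeat was ACKed
	fmt.Println(h.Sent()) // false: no ACK in between, so force a reconnect
}
```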

The workaround is to recreate the pod (container); the new process reconnects immediately.

Connection Code:

...

client := disgord.New(disgord.Config{
  ProjectName: "Corvis",
  BotToken:    token,
  Logger:      logger,
})
defer client.Gateway().StayConnectedUntilInterrupted()

client.Gateway().BotReady(func() {

...

Also, if the issue is sporadic and hard to fix in code, it would be better for the process to exit and let the platform handle recovery (e.g. exit the process after x failed retries).

andersfylling commented 3 years ago

You aren't alone with this issue; a Discord user reported the same thing.

As for improving the gateway, I've written a new gateway system from scratch. It isn't finished yet, but one of its features is giving you complete control over your shards and letting you handle exit codes if you want to; otherwise it simply runs in the background as usual. It is also much easier to write tests for.

https://github.com/andersfylling/discordgateway

Sadly, I'm not sure when I will merge this into disgord. It is still missing write capability and proper testing of heartbeats.

suhaibmalik commented 3 years ago

@andersfylling All good! Good to know that I'm not missing some low-hanging fix.

I can implement a force-exit for myself in the meantime.

suhaibmalik commented 3 years ago

I've been saving debug logs for over a month in the hope of finding a specific error to target with a force-exit. However, I haven't encountered the same disconnect since, so my assumption is that the root cause was on Discord's end. I'd still like to see better error handling, but if that's already tracked in another issue or branch (e.g. the gateway rewrite), I recommend closing this issue.

andersfylling commented 3 years ago

There is still the question of how shards should be implemented in disgord once the discordgateway project is "done". I think I want to let people inject a ShardManager so they can easily take complete control. But I'm unsure how to deal with errors: blocking, non-blocking, etc.

suhaibmalik commented 3 years ago

My opinion: in a world where production services are orchestrated (Kubernetes, docker-compose, etc.), it's safer to crash the app by default than to keep it running in a deceptively bad state. If I were in your shoes, I would make any unhandled error crash the app by default (i.e. the default panic handler calls log.Fatal). As long as users can override that behavior, you aren't closing the door on any particular usage.

FiHoEco commented 8 months ago

This has been happening since 2019 and is still not fixed in 2023.