JustArchi closed this issue 2 years ago.
Not the first kind of not-reconnecting issue... I just kill the whole application and restart it at the service level when it disconnects, to keep the library from ever failing to reconnect... :woman_shrugging:
It'd be really nice if those core issues could be solved for good. Doing restarts at the application level completely defeats the purpose of using a library in the first place.
Handling all the edge cases in the reconnect logic is hard - especially where Discord outages are concerned - so I doubt we'll ever get all of them. This is definitely something we'll look into though.
Regarding the above: neither I nor the other core contributors have ever been able to reproduce these connection deadlocks (with the exception of the Discord outage deadlock), which makes them very difficult to diagnose.
I've looked over the ConnectionManager code myself numerous times and everything checks out to me - that being said, more eyes never hurt.
Unless you can create a set of circumstances that makes this issue reproducible in our environments, or identify where in the library the deadlock is sourcing from, there isn't really much more we can do about this issue until one of us finally encounters it.
I understand that and appreciate your time spent on this, since I guess everybody here knows how hard it is to debug an issue that you can't possibly reproduce in your own environment. Sadly there is nothing much I can help with personally, as I've never looked into discord.net code until now, so I wouldn't even know where to start looking.
Unless you can create a set of circumstances that makes this issue reproducible in our environments
Well, sometimes stupid ideas are the best ones. Have you tried unplugging the network cable and plugging it back in a few times at various intervals, to check whether the library always reconnects properly? 😀
I tried killing my local connection three times, and all three times the client attempted to reconnect, recognized a WinHttpException, and started exponential backoff.
For reference, what OS and .NET Core version are you running?
Running the latest nightly on Debian Testing right now, but the issue also happened on 2.0 stable and all sorts of other versions I've been testing to make sure it's not my own fault.
.NET Command Line Tools (2.1.300-preview2-008293)
Product Information:
Version: 2.1.300-preview2-008293
Commit SHA-1 hash: 68922e2a51
Runtime Environment:
OS Name: debian
OS Version:
OS Platform: Linux
RID: debian-x64
Base Path: /opt/dotnet/sdk/2.1.300-preview2-008293/
Host (useful for support):
Version: 2.1.0-preview2-26227-01
Commit: 86d1f92013
.NET Core SDKs installed:
2.1.300-preview2-008293 [/opt/dotnet/sdk]
.NET Core runtimes installed:
Microsoft.AspNetCore.App 2.1.0-preview2-30171 [/opt/dotnet/shared]
Microsoft.NETCore.App 2.1.0-preview2-26227-01 [/opt/dotnet/shared]
Microsoft.AspNetCore.All 2.1.0-preview2-30171 [/opt/dotnet/shared]
I get this problem every few days as well. I've been on the 2.0 beta, most recently on 2.0.0-beta2-00890. This is the usual stack trace:
System.Exception: WebSocket connection was closed ---> System.Net.WebSockets.WebSocketException: The remote party closed the WebSocket connection without completing the close handshake. ---> System.Net.WebSockets.WebSocketException: The remote party closed the WebSocket connection without completing the close handshake.
at System.Net.WebSockets.ManagedWebSocket.<EnsureBufferContainsAsync>d__70.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Net.WebSockets.ManagedWebSocket.<ReceiveAsyncPrivate>d__61.MoveNext()
--- End of inner exception stack trace ---
at System.Net.WebSockets.ManagedWebSocket.<ReceiveAsyncPrivate>d__61.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Discord.Net.WebSockets.DefaultWebSocketClient.<RunAsync>d__33.MoveNext()
--- End of inner exception stack trace ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Discord.ConnectionManager.<>c__DisplayClass28_0.<<StartAsync>b__0>d.MoveNext()
I can reproduce.
System.Exception: WebSocket connection was closed ---> System.Net.WebSockets.WebSocketException: The remote party closed the WebSocket connection without completing the close handshake.
at System.Net.WebSockets.WebSocketBase.WebSocketOperation.<Process>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Net.WebSockets.WebSocketBase.<ReceiveAsyncCore>d__45.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Discord.Net.WebSockets.DefaultWebSocketClient.<RunAsync>d__33.MoveNext()
--- End of inner exception stack trace ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Discord.ConnectionManager.<>c__DisplayClass28_0.<<StartAsync>b__0>d.MoveNext()
This happens once in a while, but it reconnects after 1-2 seconds.
I've started getting this stack trace, and the bot does not reconnect when it happens.
Gateway: System.Exception: Server missed last heartbeat
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Discord.ConnectionManager.<>c__DisplayClass28_0.<<StartAsync>b__0>d.MoveNext()
Server missed last heartbeat
The last version where I didn't get this one a few times a day was 0.9x ;)
I can reproduce too:
01:56:09,735 [49] INFO - 01:56:09 Gateway Failed to resume previous session
01:56:09,735 [49] INFO - 01:56:09 Gateway Disconnecting
01:56:09,750 [87] INFO - 01:56:09 Gateway Disconnected
Nothing happened after Disconnected. I will try to get more information about the disconnect.
I'm also experiencing this problem. Disconnect reasons can vary. Sometimes I get
System.Exception: Server missed last heartbeat
And the bot just disconnects with no attempt to reconnect.
Sometimes a WebSocketException occurs, which causes the bot to disconnect, then reconnect. However, sometimes if the bot can't resume the previous session, it'll just disconnect. The log for that is similar to the comment above.
Disconnects are very inconsistent. Sometimes the bot will successfully recover, sometimes it won't. My bot only runs on one server at the moment. I've noticed that it will sometimes disconnect if someone is using many commands with a small time gap in between. I believe this has always been due to the heartbeat issue mentioned in the first paragraph.
I'm running the bot on an AWS EC2 t2.micro. The OS is a server version of Ubuntu. Here are some basic specs:
Kernel: x86_64 Linux 4.4.0-1054-aws
Uptime: 3d 6h 6m
Packages: 523
Shell: bash 4.3.48
CPU: Intel Xeon CPU E5-2676 v3 @ 2.4GHz
RAM: 305MiB / 990MiB
I'd be happy to provide any more information if needed.
The technical specification does not matter (neither does the kernel). If anything, what matters is the RID[1] - a string that identifies the platform, on which .NET Core may behave differently - and the .NET Core version. (The runtime version; the versions of the different components are... different.)
[1] RID: https://docs.microsoft.com/en-us/dotnet/core/rid-catalog
I ran the dotnet --info command, and it returned this:
.NET Command Line Tools (2.0.0)
Product Information:
Version: 2.0.0
Commit SHA-1 hash: cdcd1928c9
Runtime Environment:
OS Name: ubuntu
OS Version: 16.04
OS Platform: Linux
RID: ubuntu.16.04-x64
Base Path: /usr/share/dotnet/sdk/2.0.0/
Microsoft .NET Core Shared Framework Host
Version : 2.0.0
Build : e8b8861ac7faf042c87a5c2f9f2d04c98b69f28d
I am also using Discord.NET version 1.0.2
A couple of things to note here. Firstly, I need to update .NET Core. Not entirely sure why it wasn't updated in the first place, but I'll get on that. Secondly, there is a beta version of Discord.NET, which I will try out. I don't suspect doing both of these will help since other people have been reporting this issue with both updated. Maybe I'll get a nice surprise though.
I need to update .NET Core. Not entirely sure why it wasn't updated in the first place, but I'll get on that.
If you install it from Microsoft's website, it won't be updated.
I got the issue again. This time, I logged the Disconnect event:
07:37:38,029 [15] INFO - 07:37:38 Gateway Failed to resume previous session
07:37:38,029 [15] INFO - 07:37:38 Gateway Disconnecting
07:37:38,031 [76] INFO - 07:37:38 Gateway Disconnected
07:37:38,031 [76] ERROR - DiscordClient lost Connection: A task was canceled.
Exception:
Maybe I should do the same as RheaAyase and completely restart it when it disconnects.
Maybe I should do the same as RheaAyase and completely restart it when it disconnects.
It's still a hack, a workaround (an ugly one at that.) The bug remains.
It's still a hack, a workaround (an ugly one at that.) The bug remains.
I know it's not a "clean" solution, but it's better than the bot not running properly for half a day.
Well it's quite a bit of an issue for bigger bots - restart takes a while.
Well look what happened:
System.Exception: Server missed last heartbeat
One interesting thing that happened this time is that it logged the disconnecting message, but never logged the disconnected message. This happened after I used several commands in a short amount of time.
I've just started using Discord.Net in a project and have immediately noticed this issue. I have a bot that starts up and runs LoginAsync() and then StartAsync(). It connects and works fine. However, after some number of hours, I notice the bot has gone offline on Discord, and my console log displays Gateway: Disconnected. That's it. There are no other messages about a heartbeat, and it never attempts to reconnect. It always disconnects after a while, and it never reconnects.
When this happens, the bot stays offline on Discord and no longer responds to PMs. However, curiously, it is still able to send automated messages to a channel via SendMessageAsync() despite being offline.
My only solution to get the bot fully online again is to restart it completely. This is quite annoying to have to do every single day. Sometimes I have to do it multiple times per day.
@HunterGraubard run it as a service, and in the Disconnected event simply Exit() - the service will restart it again... an awful workaround that will do the trick...
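For reference, a minimal sketch of that workaround, assuming client is a DiscordSocketClient (with the usual System, System.Threading.Tasks and Discord.WebSocket using directives) and that the process runs under a supervisor, such as a systemd unit with Restart=always, that brings it back up after it exits:

client.Disconnected += (Exception ex) =>
{
    // Give up on in-process recovery and let the service manager restart the whole bot.
    // Note this fires on every disconnect, including transient ones the library
    // would normally recover from on its own.
    Environment.Exit(1);
    return Task.CompletedTask;
};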
There's a known deadlock somewhere in our code, but we're not sure where it is. I'm not sure if we'll be able to fix this bug before 3.0, unfortunately.
Make sure it's a well-documented bug... since it is an extremely significant instability issue.
If it's a deadlock and you get a process dump, won't that help to debug the issue?
Is this still an issue? As of the latest release, I'm able to disconnect the network and reconnect fine
21:17:22 Gateway Connecting
21:17:23 Gateway Connected
21:17:23 Gateway Ready
21:18:23 Gateway System.Exception: WebSocket connection was closed // ... stacktrace omitted for brevity
21:18:23 Gateway Disconnecting
21:18:23 Gateway Disconnected
21:18:24 Gateway Connecting
21:18:34 Gateway Resumed previous session
21:18:34 Gateway Connected
I will.... remove the code which forces a full bot restart, and give it a try.
I still have issues with deadlocking.
Whatever the websocket closure was - server missed heartbeat, closed normally, or anything else - it will just stay permanently disconnected.
What's strange is that my bot will say it's disconnecting from the gateway and disconnect from all the servers, but it stays online as if it were still connected and heartbeating. Of course it can't respond to commands, though, as it's "disconnected". I don't believe this is an issue with my code, as I don't get those "Blocking gateway" messages.
I still say there is a problem somewhere in the library's code. Sometimes it disconnects and reconnects fine, sometimes it doesn't. More often than not I'm restarting it manually, almost consistently every other day.
Hi. I've updated to 2.0.1 and got this again:
02.27-22:46:10 DiscordBot Gateway Latency = 266 ms
02.27-22:46:48 DiscordBot Gateway Received Dispatch (PRESENCE_UPDATE)
02.27-22:46:51 DiscordBot Gateway Received Dispatch (PRESENCE_UPDATE)
02.27-22:46:51 DiscordBot Gateway Sent Heartbeat
02.27-22:46:52 DiscordBot Gateway Received HeartbeatAck
02.27-22:46:52 DiscordBot Gateway Latency = 281 ms
02.27-22:47:05 DiscordBot Gateway Heartbeat Stopped
02.27-22:47:05 DiscordBot Gateway
at Discord.ConnectionManager.<>c__DisplayClass29_0.<
02.27-22:47:05 DiscordBot Gateway Disconnecting
02.27-22:47:05 DiscordBot Gateway Disconnecting ApiClient
02.27-22:47:05 DiscordBot Gateway Waiting for heartbeater
02.27-22:47:05 DiscordBot Gateway Waiting for guild downloader
02.27-22:47:05 DiscordBot Gateway Clearing large guild queue
02.27-22:47:05 DiscordBot Gateway Raising virtual GuildUnavailables
02.27-22:47:05 DiscordBot Gateway Disconnected from Bitsum Test
02.27-22:47:05 DiscordBot Gateway Disconnected from Bitsum
02.27-22:47:05 DiscordBot WebSocket connection was closed
02.27-22:47:05 DiscordBot at Discord.ConnectionManager.<>c__DisplayClass29_0.<
We have also had this problem for a few days now. The bot was running fine for a few months without any issues. Then the log file got too big (a few gigabytes), so we deleted it and started with a new one.
I don't know if this is the problem, but in other issues I read that logging can cause some problems.
This is what is written in our log file before the disconnect happened.
21/04/2019 05:10:43 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:44 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:45 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:46 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:46 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:46 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:48 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:48 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:48 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:49 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:49 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:49 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:50 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:51 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:51 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:51 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:52 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:53 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:53 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:54 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:54 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:55 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:55 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:55 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:56 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:56 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:10:58 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:11:00 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:11:01 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 05:11:01 : Heartbeat Stopped
21/04/2019 05:11:01 :
21/04/2019 05:11:01 : Disconnecting
21/04/2019 05:11:01 : Disconnecting ApiClient
21/04/2019 05:11:01 : Waiting for heartbeater
21/04/2019 05:11:01 : Waiting for guild downloader
21/04/2019 05:11:01 : Clearing large guild queue
21/04/2019 05:11:01 : Raising virtual GuildUnavailables
21/04/2019 05:11:01 : Disconnected from Server
21/04/2019 05:11:01 : Disconnected
21/04/2019 05:11:02 : Connecting
21/04/2019 05:11:02 : Connecting ApiClient
21/04/2019 05:11:02 : GET gateway: 59,15 ms
The reconnect never happened, unlike the disconnect at 2:49am. However, there were a lot of those PRESENCE_UPDATE log entries. In an update of our bot I'm now ignoring them.
This is the part of the log from when the disconnect happened at 2:49am:
21/04/2019 02:49:01 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:01 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:01 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:02 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:02 : Heartbeat Stopped
21/04/2019 02:49:02 :
21/04/2019 02:49:02 : Disconnecting
21/04/2019 02:49:02 : Disconnecting ApiClient
21/04/2019 02:49:02 : Waiting for heartbeater
21/04/2019 02:49:02 : Waiting for guild downloader
21/04/2019 02:49:02 : Clearing large guild queue
21/04/2019 02:49:02 : Raising virtual GuildUnavailables
21/04/2019 02:49:02 : Disconnected from Server
21/04/2019 02:49:02 : Disconnected
21/04/2019 02:49:03 : Connecting
21/04/2019 02:49:03 : Connecting ApiClient
21/04/2019 02:49:03 :
21/04/2019 02:49:03 : Disconnecting
21/04/2019 02:49:03 : Disconnecting ApiClient
21/04/2019 02:49:03 : Waiting for heartbeater
21/04/2019 02:49:03 : Waiting for guild downloader
21/04/2019 02:49:03 : Clearing large guild queue
21/04/2019 02:49:03 : Raising virtual GuildUnavailables
21/04/2019 02:49:03 : Disconnected
21/04/2019 02:49:05 : Connecting
21/04/2019 02:49:05 : Connecting ApiClient
21/04/2019 02:49:05 : GET gateway: 43,7 ms
21/04/2019 02:49:05 : Received Hello
21/04/2019 02:49:05 : Heartbeat Started
21/04/2019 02:49:05 : Resuming
21/04/2019 02:49:05 : Sent Heartbeat
21/04/2019 02:49:05 : Sent Resume
21/04/2019 02:49:06 : Received HeartbeatAck
21/04/2019 02:49:06 : Latency = 109 ms
21/04/2019 02:49:06 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:06 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:06 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:06 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:06 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:06 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:06 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:06 : Received Dispatch (RESUMED)
21/04/2019 02:49:06 : Connected to Server
21/04/2019 02:49:06 : Resumed previous session
21/04/2019 02:49:06 : Sending Status
21/04/2019 02:49:06 : Sent StatusUpdate
21/04/2019 02:49:06 : Connected
21/04/2019 02:49:06 : Raising Event
21/04/2019 02:49:06 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:06 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:07 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:07 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:07 : Received Dispatch (PRESENCE_UPDATE)
21/04/2019 02:49:08 : Received Dispatch (PRESENCE_UPDATE)
I hope this helps somehow.
2019-04-09 15:57:35.381 +00:00 [INF] 15:57:35 Gateway Disconnecting
2019-04-09 15:57:35.403 +00:00 [INF] 15:57:35 Gateway Disconnected
2019-04-09 15:57:36.403 +00:00 [INF] 15:57:36 Gateway Connecting
2019-04-09 16:01:51.099 +00:00 [INF] 16:01:51 Gateway System.Net.WebSockets.WebSocketException (2): Unable to connect to the remote server ---> System.Net.Http.HttpRequestException: Connection timed out ---> System.Net.Sockets.SocketException: Connection timed out
at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
at System.Threading.Tasks.ValueTask`1.get_Result()
at System.Net.Http.HttpConnectionPool.CreateConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Threading.Tasks.ValueTask`1.get_Result()
at System.Net.Http.HttpConnectionPool.WaitForCreatedConnectionAsync(ValueTask`1 creationTask)
at System.Threading.Tasks.ValueTask`1.get_Result()
at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.WebSockets.WebSocketHandle.ConnectAsyncCore(Uri uri, CancellationToken cancellationToken, ClientWebSocketOptions options)
at System.Net.WebSockets.WebSocketHandle.ConnectAsyncCore(Uri uri, CancellationToken cancellationToken, ClientWebSocketOptions options)
at System.Net.WebSockets.ClientWebSocket.ConnectAsyncCore(Uri uri, CancellationToken cancellationToken)
at Discord.Net.WebSockets.DefaultWebSocketClient.ConnectInternalAsync(String host)
at Discord.Net.WebSockets.DefaultWebSocketClient.ConnectAsync(String host)
at Discord.API.DiscordSocketApiClient.ConnectInternalAsync()
at Discord.API.DiscordSocketApiClient.ConnectInternalAsync()
at Discord.API.DiscordSocketApiClient.ConnectAsync()
at Discord.WebSocket.DiscordSocketClient.OnConnectingAsync()
at Discord.WebSocket.DiscordSocketClient.OnConnectingAsync()
at Discord.ConnectionManager.ConnectAsync(CancellationTokenSource reconnectCancelToken)
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
2019-04-09 16:01:51.102 +00:00 [INF] 16:01:51 Gateway Disconnecting
2019-04-09 16:01:51.103 +00:00 [INF] 16:01:51 Gateway Disconnected
2019-04-09 16:01:52.865 +00:00 [INF] 16:01:52 Gateway Connecting
2019-04-09 16:04:05.282 +00:00 [INF] 16:04:05 Gateway System.TimeoutException: The operation has timed out.
at Discord.ConnectionManager.WaitAsync()
at Discord.WebSocket.DiscordSocketClient.OnConnectingAsync()
at Discord.WebSocket.DiscordSocketClient.OnConnectingAsync()
at Discord.ConnectionManager.ConnectAsync(CancellationTokenSource reconnectCancelToken)
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
2019-04-09 16:04:05.284 +00:00 [INF] 16:04:05 Gateway Disconnecting
2019-04-09 16:04:05.284 +00:00 [INF] 16:04:05 Gateway Disconnected
2019-04-09 16:04:08.845 +00:00 [INF] 16:04:08 Gateway Connecting
2019-04-09 16:05:48.846 +00:00 [INF] 16:05:48 Gateway Disconnecting
2019-04-09 16:05:48.847 +00:00 [INF] 16:05:48 Gateway Disconnected
After the last disconnect it never reconnected. It has done this multiple times now.
My client works fine for about a day and then I get this one:
04.23-18:20:57 DiscordBot Gateway Received Dispatch (PRESENCE_UPDATE)
04.23-18:21:00 DiscordBot Gateway Received Dispatch (PRESENCE_UPDATE)
04.23-18:21:16 DiscordBot Gateway Sent Heartbeat
04.23-18:21:16 DiscordBot Gateway Received HeartbeatAck
04.23-18:21:16 DiscordBot Gateway Latency = 234 ms
04.23-18:21:30 DiscordBot Gateway Received Dispatch (PRESENCE_UPDATE)
04.23-18:21:51 DiscordBot Gateway Received Dispatch (PRESENCE_UPDATE)
04.23-18:21:57 DiscordBot Gateway Sent Heartbeat
04.23-18:21:57 DiscordBot Gateway Received HeartbeatAck
04.23-18:21:57 DiscordBot Gateway Latency = 250 ms
04.23-18:22:38 DiscordBot Gateway Sent Heartbeat
04.23-18:22:39 DiscordBot Gateway Heartbeat Stopped
04.23-18:22:39 DiscordBot Gateway
at Discord.ConnectionManager.<>c__DisplayClass29_0.<
04.23-18:22:39 DiscordBot Gateway Disconnecting
04.23-18:22:39 DiscordBot Gateway Disconnecting ApiClient
04.23-18:22:39 DiscordBot Gateway Waiting for heartbeater
04.23-18:22:39 DiscordBot Gateway Waiting for guild downloader
04.23-18:22:39 DiscordBot Gateway Clearing large guild queue
04.23-18:22:39 DiscordBot Gateway Raising virtual GuildUnavailables
04.23-18:22:39 DiscordBot Gateway Disconnected from Bitsum Test
04.23-18:22:39 DiscordBot Gateway Disconnected from Bitsum
04.23-18:22:39 DiscordBot WebSocket connection was closed
04.23-18:22:39 DiscordBot at Discord.ConnectionManager.<>c__DisplayClass29_0.<
@4egod It seems to me like your case is exactly what your last message shows: you're blocking the gateway, which doesn't relate to the problems here, where nothing is blocking the gateway.
@NovusTheory OK, thanks. Can you give me some advice? What should I do to figure it out?
My current theory, which may also fix #1052, is that the ConnectionManager is not cancelling the cancellation token that the heartbeat task is using when reconnecting. I may be wrong, but I'm seeing the same issue on my bot.
[21:59:20 WRN] [null]
System.Exception: Server missed last heartbeat
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
[21:59:20 INF] Disconnecting
[21:59:20 INF] Disconnected
[21:59:21 INF] Connecting
[21:59:31 WRN] Failed to resume previous session
[21:59:31 INF] Connected
[21:59:33 INF] Ready
[22:02:15 WRN] [null]
System.Exception: Server missed last heartbeat
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
[22:02:15 INF] Disconnecting
[22:02:15 INF] Disconnected
[22:02:16 INF] Connecting
[22:02:17 INF] Connected
[22:02:17 INF] Resumed previous session
[22:09:09 WRN] [null]
System.Exception: Server missed last heartbeat
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
I may be missing something in the ConnectionManager where it does cancel the token being used. I've only spent a few minutes on this and can try to help dig deeper into the connection errors.
Handling all the edge cases in the reconnect logic is hard - especially where Discord outages are concerned - so I doubt we'll ever get all of them. This is definitely something we'll look into though.
I don't disagree with that assertion, so assuming that you guys can't or won't be able to handle all possible edge-cases anytime soon, can we at least have an option to stop the API from reconnecting at all?
Let the API report Disconnected and let me manually reconnect it; that should be way easier for everyone involved than trying to hunt down all of the reconnection edge cases, and it would be stable as hell.
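In the meantime, a rough sketch of doing the manual reconnect yourself, assuming client is a DiscordSocketClient (with the usual Discord and Discord.WebSocket using directives) and that calling StartAsync again is acceptable in your setup - this just layers on top of whatever the library already does internally:

client.Disconnected += ex =>
{
    // Don't do the work on the event itself; give the library a grace period,
    // then try to bring the gateway back up ourselves if it's still down.
    _ = Task.Run(async () =>
    {
        await Task.Delay(TimeSpan.FromSeconds(10));
        if (client.ConnectionState == ConnectionState.Disconnected)
            await client.StartAsync();
    });
    return Task.CompletedTask;
};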
Indeed, at the moment the only way to reliably reconnect is to kill your whole application. The only other option is memory leaking the Discord client and creating a new one - because you can't safely dispose of it without deadlock ( #1448 )
I have a small bot that I don't mess with very often; that one I restart only once a month, memory-leaking it in the meantime. Effectively that means going from 20GB of RAM usage to about 45GB over that one month, on a bot that's in 3 Discord servers of fewer than 5000 users total. I can't imagine leaking the big one - that gets booted on every Disconnect.
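A minimal sketch of that "leak and recreate" approach, assuming a hypothetical RebuildClientAsync helper and the usual Discord / Discord.WebSocket using directives; the old instance is deliberately abandoned rather than disposed, with the memory cost described above:

private DiscordSocketClient _client;

// Hypothetical helper: build a brand new client instead of disposing the old one
// (disposing can deadlock, see #1448), re-attach handlers, and log back in.
private async Task RebuildClientAsync(string token)
{
    _client = new DiscordSocketClient();   // the previous instance is simply leaked
    _client.Log += OnLogAsync;             // re-attach your handlers (OnLogAsync is hypothetical)
    await _client.LoginAsync(TokenType.Bot, token);
    await _client.StartAsync();
}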
One potential idea I had to fix this would be to go the route of Wumpus.Net. Wumpus.Net drops the StartAsync/StopAsync system and instead uses a RunAsync method to run the client. This RunAsync method would run the client until an unhandled exception occurs, including reconnecting and resuming where possible, accepting a cancellation token. Then, the StopAsync method would cancel this RunAsync call, or you could pass in your own CancellationToken.
To retain backwards compatibility, StartAsync would likely be rewritten into:
async Task StartAsync()
{
    while (true)
    {
        try
        {
            await RunAsync(CancellationToken.None);
        }
        catch (OperationCanceledException)
        {
            break;
        }
        catch (Exception)
        {
            // potentially log here instead of throw
            throw;
        }
    }
}
Going this route gives us more flexibility, and also lets us drop that annoying ConnectionManager class which is frankly impossible to debug IMHO.
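For illustration, this is roughly how the proposed RunAsync might be consumed (a hypothetical API - it does not exist in Discord.Net today):

var cts = new CancellationTokenSource();
// Runs the client - connecting, reconnecting and resuming - until the token is
// cancelled or an unhandled exception escapes.
var runTask = client.RunAsync(cts.Token);

// ... later, to shut down:
cts.Cancel();
await runTask;   // depending on the design, this may surface an OperationCanceledException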
For me it happens a lot nowadays; a month ago it only did it every other day. I don't know if this has already been stated, but somehow when it does this, there are multiple instances listening to commands and all of them run the command. Can we alternatively catch the exception and reboot the program? Not sure if this is possible.
Well, the number of Disconnects is down to Discord's own infrastructure and you can't do anything about it. Other than cute graphs ~
(And yes you can subscribe to the Disconnected event inside of which you can Environment.Exit or something.)
About the Environment.Exit: is that something you plan to make a hotfix of? Otherwise I will try and look into it myself.
Yeah this should really be bumped up and prioritized with these stupid changes.
@RheaAyase Thanks for the heads up! I might still see if I can make something myself in the meanwhile tho.
Currently my only thought on this: in the example using DI for commands and a command service, replacing the await Task.Delay(Timeout.Infinite) line with something like the following might fix it for me. I currently hit this a few hours after starting my web API, where I cannot afford to reboot the whole process every time it fails, so I am trying this now:
var connTime = 0;
while (true)
{
    if (client.ConnectionState == ConnectionState.Disconnected)
    {
        connTime++;
        if (connTime.Equals(30))
        {
            // if only we could call the log event method ourselves without compile error.
            await client.Log.Invoke(new LogMessage(LogSeverity.Info, "Gateway", "Has not been connected for 30 seconds. Assuming deadlock in connector.")).ConfigureAwait(false);
            await client.StartAsync().ConfigureAwait(false);
        }
    }
    else if (client.ConnectionState == ConnectionState.Connected && !connTime.Equals(0))
    {
        connTime = 0;
    }
    await Task.Delay(1000).ConfigureAwait(false);
}
Also, another idea: perhaps I could have my bot use a branch of the code, find the logic behind reconnections, then use the d.net logger to debug the code (with line numbers pasted in, to find where the deadlock is), since I now write everything that gets logged to a file named Discord.log. Another issue here, though, is making the Discord.Net log level detailed enough in the event method so that I can see all the logs the library can produce in that file.
Alright, so I am adding a logger to ConnectionManager, logging all lines until the deadlock happens, so that I can identify where the deadlock really is (with the help of the _logger member of that class).
Also, on first look I personally think this needs to be changed, because I think normal "errors" that are deemed not "critical" could be causing this too - and possibly it's not a deadlock, but the cancellation token not being cancelled:
public void Error(Exception ex)
{
    _readyPromise.TrySetException(ex);
    _connectionPromise.TrySetException(ex);
    _connectionCancelToken?.Cancel();
+   _reconnectCancelToken?.Cancel();
}
public void CriticalError(Exception ex)
{
-   _reconnectCancelToken?.Cancel();
    Error(ex);
}
While this might not have anything to do with it, I think it might be causing the task to be identified as completed or something. If that is the real issue, this would be the simplest fix; if not, then the issue might be somewhere in the ApiClient class that is then used for opcodes and such in the regular SocketClient.
However, I think problems like this deserve their own log event, used purely for debugging purposes.
I also now notice what happens when it does this: it reconnects, but it fails to send Ready (because GuildAvailable blocks the gateway) and then disconnects with a timeout error. Let's see what the log shows:
Even though the bot is in only a single guild, it seems that a GuildAvailable handler is blocking the gateway task. This could be where a deadlock like that breaks the connector: it keeps trying to reconnect but then immediately disconnects when that handler causes the connector to time out. I found this by setting the log level in Discord.Net to Debug.
Thanks to the pull request for the rate limits, it seems that it now does try to recover and eventually gets back to a state where everything is usable again and works.
However, I think the GuildAvailable handlers need looking into further. If the blocking issue were fixed, maybe the TimeoutExceptions would disappear, and the whole problem of the bot becoming a "ghost" and showing as offline (often for 10 minutes to a few hours, after having been online for only about 2 hours) might be fixed, along with #1364, which looks to be a duplicate of this issue.
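For what it's worth, a sketch of keeping a heavy GuildAvailable handler off the gateway task; InitializeGuildAsync stands in for whatever per-guild work the handler actually does:

client.GuildAvailable += guild =>
{
    // Push the slow per-guild work to the thread pool so the gateway task is never blocked.
    _ = Task.Run(() => InitializeGuildAsync(guild));
    return Task.CompletedTask;
};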
After further testing - swallowing the timeout exception from https://github.com/discord-net/Discord.Net/blob/dev/src/Discord.Net.WebSocket/ConnectionManager.cs#L139 at https://github.com/discord-net/Discord.Net/blob/dev/src/Discord.Net.WebSocket/ConnectionManager.cs#L142 (instead of what L142 currently is on GitHub, of course) - everything is working properly now. I suspect that the timeout values are not calculated properly and end up too short. However, I think commenting out L139, or removing it, is also safe to do.
Since I did this I have not experienced the issue in over 2 hours. I will report back if I find anything different; if I do not report back, assume this change fixed this issue (and #1364).
Alright, so I have discovered that the last line it logs before it deadlocks in ConnectionManager.cs is this one: https://github.com/discord-net/Discord.Net/blob/dev/src/Discord.Net.WebSocket/ConnectionManager.cs#L190
In a way I do not see what would actually cause a deadlock like this; as such, I might have to add the same logging on every line of code that can be hit (even in if and else blocks) to see whether the deadlock is in DiscordSocketClient or DiscordSocketApiClient. It could be hiding in one of those classes without us even knowing it.
@SubZero0 Do you think I should add logging on every line in those two files and then PR it to the branch on your fork that your PR builds off of? Maybe having this kind of logging could help prevent future issues with the gateway code in the library.
@AraHaan Not sure if I understood, but all builds are made from the master branch in this repo; my fork is what I use to write my changes (or fixes or whatever) to then PR them here. The only custom branch there is the one I use for my own bot, and I haven't had the connection deadlock issue with it in years (and no, there are no changes to the ConnectionManager), so it shouldn't really help finding it.
The library properly attempted to reconnect after missing the last heartbeat, but it seems like it failed with a timeout after those ~45 seconds and decided to stop further attempts. According to https://github.com/RogueException/Discord.Net/commit/73ac9d7886aa48b9d809c56e51945056f3b67232, a similar issue should already be solved for 2.0.0-beta, but it seems that this issue is still present in one form or another. It should definitely keep trying until it succeeds, unless this is somehow intended (in which case, how should we handle it ourselves?).
Thank you in advance.