Closed kuru-rin closed 2 years ago
What docker base are you running the bot in?
cc @lhjt
They are running on a k8s cluster with .NET 6.
I am running sharded discord client with 8 shards per process.
What docker base are you running the bot in?
mcr.microsoft.com/dotnet/aspnet:6.0.0-alpine3.14
Are you running your services behind any sort of service mesh, such as Istio?
Nope. I have no sidecars at all in the pod.
Have you experienced any other connectivity issues with deployments on your cluster?
None that I have witnessed.
Is each process running in its own pod?
Yes. Within that process is the sharded client with 8 shards at the moment. However, I have had the same issue using non-sharded client with a single shard in a process per pod.
If so, do all of the processes encounter this issue at the same time?
They do not.
For example, the following image is from after 28h of running them.
The number of restarts is different for each pod, but they all seem to have been killed with a similar disconnection log as the one I posted.
When you experience this error, do all of the shards on the same process also encounter this error or is it just the one shard?
I assume it's just one shard, since I just see a single Disconnecting
log.
How can I find out from which shard a log message originates? I am currently just hooking in on the Log
event and passing it through to the ILogger
and its corresponding severity, but I see no shard or anything in the log message.
I think it may have something to do with the voice channels, since at the same time a FFmpeg process streaming to the AudioOutStream
also seems to die with: Error writing trailer of pipe:: Broken pipe
.
What are your process configurations for starting FFMPEG, in my experience with ffmpeg I get that error of a broken pipe but my bot doesn't close
This is quite a peculiar problem, but based on this information it doesn't seem like it would be related to your deployment environment.
Do you happen to have any other action logs produced by your bot just before the disconnect and program exit happens? If not, would you be able to change the logging level to debug and continue running your workload until you run into this issue again, and then attach the logs before the exception happens? This may help us in producing an example where this issue can be reliably reproduced.
What are your process configurations for starting FFMPEG, in my experience with ffmpeg I get that error of a broken pipe but my bot doesn't close
I suppose the broken pipe could just be an after-affect of the socket being aborted. My ffmpeg process comes down to this:
using var ffmpeg = Process.Start(new ProcessStartInfo
{
FileName = "ffmpeg",
Arguments = $"-hide_banner -i {inputFilePath} -ac 2 -f s16le -ar 48000 -",
UseShellExecute = false,
RedirectStandardInput = false,
RedirectStandardOutput = true
});
await using var output = ffmpeg.StandardOutput.BaseStream;
try
{
await output.CopyToAsync(audioStream, 3840, cancellationToken);
}
finally
{
await audioStream.FlushAsync(cancellationToken);
}
If this was related I would expect this problem to happen much more often, since this process runs many more times per day than the issue occurs.
This is quite a peculiar problem, but based on this information it doesn't seem like it would be related to your deployment environment.
Indeed, I'm really at a loss as to what it can be.
Do you happen to have any other action logs produced by your bot just before the disconnect and program exit happens? If not, would you be able to change the logging level to debug and continue running your workload until you run into this issue again, and then attach the logs before the exception happens? This may help us in producing an example where this issue can be reliably reproduced.
At the moment sadly I do not. I already added the first chance exception logging for more information, but it did not help much.
I will see if I can add even more logging and enable the debug logs and I will reply here again once I get some more data. It may take a few days though.
I found this other similar issue: https://github.com/Discord-Net-Labs/Discord.Net-Labs/issues/203 Could it have something to do with this?
Can you test if dragging your bot to a different vc crashes the pod?
Can you test if dragging your bot to a different vc crashes the pod?
Yeah, I tried that, but it does not make it crash.
I figured maybe it could be something similar, but perhaps an unhandled exception at the Discord.NET end? That would probably make the app exit.
@lhjt I was able to get some debug logs already. This is from the last second before one of the pods crashed:
```
[09:50:13 DBG] POST channels/xxx/messages: 304.38 ms
[09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE)
[09:50:13 DBG] Received Dispatch (GUILD_MEMBER_UPDATE)
[09:50:13 DBG] Malformed Frame
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (CHANNEL_UPDATE)
[09:50:13 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[09:50:13 DBG] Sent RequestGuildMembers
[09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (GUILD_MEMBERS_CHUNK)
[09:50:13 INF] Updated 25 guild users for guild xxx
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Sent Heartbeat
[09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] POST channels/xxx/messages: 402.92 ms
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Latency = 100 ms
[09:50:13 DBG] Received HeartbeatAck
[09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (GUILD_MEMBER_UPDATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[09:50:13 DBG] Received Dispatch (GUILD_MEMBER_UPDATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 DBG] Received Dispatch (MESSAGE_CREATE)
[09:50:13 INF] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[09:50:13 INF] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[09:50:13 INF] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[09:50:13 DBG] Keepalive Stopped
[09:50:13 DBG] Heartbeat Stopped
[09:50:13 WRN] [null]
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Discord.ConnectionManager.<>c__DisplayClass29_0.<
```
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Malformed Frame
[11:03:02 DBG] Received Dispatch (VOICE_STATE_UPDATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_UPDATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (GUILD_MEMBER_UPDATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_UPDATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (MESSAGE_CREATE)
[11:03:02 DBG] Received Dispatch (GUILD_MEMBER_UPDATE)
[11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE)
[11:03:02 INF] First chance exception occurred at application level: The operation was canceled.
System.OperationCanceledException: The operation was canceled.
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
[11:03:02 INF] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[11:03:02 INF] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[11:03:02 INF] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[11:03:02 DBG] Keepalive Stopped
[11:03:02 DBG] Heartbeat Stopped
[11:03:02 WRN] [null]
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Discord.ConnectionManager.<>c__DisplayClass29_0.<
I did remove some unrelated logs, otherwise it would be too massive.
Hope this helps. Thanks so much!
The only Discord.Net trace I see here is the Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext()
. The connection manager is used for audio sockets and the gateway socket so I'm unable to tell if this is an audio related issue or gateway issue.
Would you be able to directly add the project source to your bot? This should give you a full stack trace including the line numbers for dnet, this will help me figure out the root cause.
Just for reference, here is the StartAsync function
Would you be able to directly add the project source to your bot? This should give you a full stack trace including the line numbers for dnet, this will help me figure out the root cause.
Sure, here you go:
```
[08:23:05 DBG] Received HeartbeatAck
[08:23:05 DBG] Latency = 200 ms
[08:23:05 DBG] Received Dispatch (MESSAGE_REACTION_ADD)
[08:23:05 DBG] Received Dispatch (GUILD_MEMBER_UPDATE)
[08:23:05 DBG] Malformed Frame
[08:23:05 DBG] Received Dispatch (MESSAGE_CREATE)
[08:23:05 DBG] Received Dispatch (MESSAGE_UPDATE)
[08:23:05 DBG] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[08:23:05 DBG] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[08:23:05 DBG] Heartbeat Stopped
[08:23:05 DBG] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[08:23:05 DBG] Keepalive Stopped
[08:23:05 WRN] [null]
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Discord.ConnectionManager.<>c__DisplayClass29_0.<
Does this error occur running the same application outside of docker? Are you able to run one process with 8 shards outside of docker for a day and see if you get the same logs? I've heard of people having issues with k8s/docker and websockets so it might be something there.
Looking at the one stack we get from the logs that exception is handled properly. One piece of evidence to back this up is the continuation of code executing by the other logs below it.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in /src/discord-net/Discord.Net.WebSocket/ConnectionManager.cs:line 79
[08:23:05 INF] Disconnecting // here
[08:23:05 DBG] Disconnecting ApiClient // here
Maybe it has to do with the [08:23:05 DBG] Sent VoiceStateUpdate
update message. Maybe its trying to send the voice state even though the socket is closed? I'll have a look into this portion of the library.
Does this error occur running the same application outside of docker? Are you able to run one process with 8 shards outside of docker for a day and see if you get the same logs? I've heard of people having issues with k8s/docker and websockets so it might be something there.
I'm afraid it's not possible for me to run it elsewhere, since there is shared infrastructure within the cluster (such as RabbitMQ and Redis) and nothing is reachable from the outside.
I have been checking Discord.NET code a bit, and been trying to trace what is happening...
So it seems it is Disconnecting but before the OnDisconnected
event fires it dies.
I have a log on the Disconnected
event so I would expect it to show up in the log, but it does not.
private async Task DisconnectAsync(Exception ex, bool isReconnecting)
{
if (State == ConnectionState.Disconnected) return;
State = ConnectionState.Disconnecting;
await _logger.InfoAsync("Disconnecting").ConfigureAwait(false);
await _onDisconnecting(ex).ConfigureAwait(false);
await _disconnectedEvent.InvokeAsync(ex, isReconnecting).ConfigureAwait(false); // it never gets here ?
State = ConnectionState.Disconnected;
await _logger.InfoAsync("Disconnected").ConfigureAwait(false);
}
So I am assuming an exception is thrown within the DiscordSocketClient.OnDisconnectingAsync(...)
somewhere.
I will throw some try blocks in the Discord.NET code that I added to my project to see if I get some more output.
Feel free to experiment but I think it might be related to this pr. I'll make a change to how it works and get you to try it out.
I think I'm already seeing some results. Caught this exception, which was otherwise uncaught:
System.ObjectDisposedException: The semaphore has been disposed.
at System.Threading.SemaphoreSlim.Release(Int32 releaseCount)
at Discord.Audio.DiscordVoiceAPIClient.DisconnectAsync() in /src/discord-net/Discord.Net.WebSocket/DiscordVoiceApiClient.cs:line 208
at Discord.Audio.AudioClient.OnDisconnectingAsync(Exception ex) in /src/discord-net/Discord.Net.WebSocket/Audio/AudioClient.cs:line 127
at Discord.ConnectionManager.DisconnectAsync(Exception ex, Boolean isReconnecting) in /src/discord-net/Discord.Net.WebSocket/ConnectionManager.cs:line 189
at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in /src/discord-net/Discord.Net.WebSocket/ConnectionManager.cs:line 94
Does this give you any ideas?
Does catching that exception stop the pod from restarting?
Does catching that exception stop the pod from restarting?
Well, I have only had it running for 37m so far, but no restarts yet at all so it seems like it. I will let it run for a whole day at least to be 100% sure.
Edit: I was just a minute too soon with this message, as one restarted just now with the same issue as before.
```
[17:41:22 DBG] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[17:41:22 DBG] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[17:41:22 DBG] Keepalive Stopped
[17:41:22 DBG] First chance exception occurred at application level: A task was canceled.
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[17:41:22 DBG] Heartbeat Stopped
[17:41:22 WRN] [null]
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Discord.ConnectionManager.<>c__DisplayClass29_0.<
I see that Sent VoiceStateUpdate
you were talking about though, so perhaps there is more than one issue causing the restarts.
Does using .NET 5 change anything?
Does using .NET 5 change anything?
That's not something I can downgrade to quickly and easily, so it will be a last resort test I can do.
However if you're talking mainly about the BackgroundServiceExceptionBehavior
that has changed in .NET 6,
I have that set to Ignore
, so it should be the same as .NET 5 in my app.
For now I will continue to find places with unhandled exceptions in discord-net code and add try-catches and logging where I can. The code is rather hard to follow, so I hope I can keep my sanity.
So far I have traced it to a Segmentation fault (exit code 139).
I am assuming it is caused by either ffmpeg, libopus, or maybe libsodium? More likely libopus/libsodium since discord-net is using it directly, it would make sense that the app would crash. Since ffmpeg is run as an external process it should not crash the .NET process as far as I know.
Things I have done that did not help:
mcr.microsoft.com/dotnet/aspnet:6.0.1-bullseye-slim
libopus
is latest version 1.3.1
libsodium
is latest version 1.0.18
FFmpeg version seems recent enough too:
ffmpeg version 4.3.3-0+deb11u1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 10 (Debian 10.2.1-6)
configuration: --prefix=/usr --extra-version=0+deb11u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
I don't know anything about how this all works in discord-net, so it would be great if someone could give their insights in what the problem could be.
I have chosen to circumvent the issue by using Lavalink instead. My bot has now been running for many hours without any restarts, so it is safe to say the issue is gone.
It might be good to warn people that the Discord.NET audio client can kill your app with segmentation faults. Perhaps it's only when using it in Linux containers, but it's hard to say for sure.
Check The Docs
Verify Issue Source
Check your intents
Description
My application seems to exit completely following a
NULL
logged as warning and a disconnection from Discord.NET. (see the stacktrace)I'm not sure how it happens, since it happens very randomly (2 to 10 times per 24 hours with my production workload). This makes me believe it has something to do with Discord and the users doing stuff with my bot.
I am running sharded discord client with 8 shards per process. Running 6 processes for a total of 48 shards at the moment. However I had the same happening with 1 process per shard using the regular discord client. They are running on a k8s cluster with .NET 6.
Some things I have done to try and solve this with no success:
Task.Run(...)
so they do not block the gateway.I hope someone is able to help me trace the problem and work around it.
Version
Currently 3.0.0 and also had this problem in 3.5.0-labs
Working Version
Not exactly sure when it started
Logs
I have been logging first chance exceptions as well to hopefully find the issue, although these might be unrelated.
Perhaps the
The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived'
is where things go wrong, but I know too little of Discord.NET and the API to say anything about this.Log Excerpt
``` [16:04:12 WRN] [null] System.Threading.Tasks.TaskCanceledException: A task was canceled. at Discord.ConnectionManager.<>c__DisplayClass29_0.<Sample
I have not been able to reproduce the issue.