Discord-Net-Labs / Discord.Net-Labs

An experimental fork of Discord.Net that implements the newest discord features for testing and development to eventually get merged into Discord.Net
https://labs.discordnet.dev
MIT License
156 stars 42 forks source link

[Bug]: Disconnect happening randomly and exiting application, WebSocket in aborted state #402

Closed kuru-rin closed 2 years ago

kuru-rin commented 2 years ago

Check The Docs

Verify Issue Source

Check your intents

Description

My application seems to exit completely following a NULL logged as warning and a disconnection from Discord.NET. (see the stacktrace)

I'm not sure how it happens, since it happens very randomly (2 to 10 times per 24 hours with my production workload). This makes me believe it has something to do with Discord and the users doing stuff with my bot.

I am running sharded discord client with 8 shards per process. Running 6 processes for a total of 48 shards at the moment. However I had the same happening with 1 process per shard using the regular discord client. They are running on a k8s cluster with .NET 6.

Some things I have done to try and solve this with no success:

I hope someone is able to help me trace the problem and work around it.

Version

Currently 3.0.0 and also had this problem in 3.5.0-labs

Working Version

Not exactly sure when it started

Logs

I have been logging first chance exceptions as well to hopefully find the issue, although these might be unrelated.

Perhaps the The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' is where things go wrong, but I know too little of Discord.NET and the API to say anything about this.

Log Excerpt ``` [16:04:12 WRN] [null] System.Threading.Tasks.TaskCanceledException: A task was canceled. at Discord.ConnectionManager.<>c__DisplayClass29_0.<b__0>d.MoveNext() [16:04:12 INF] Disconnecting [16:04:12 INF] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) [16:04:12 INF] First chance exception occurred at application level: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates) [16:04:12 INF] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [16:04:12 INF] First chance exception occurred at application level: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates) at System.Net.WebSockets.ManagedWebSocket.CloseOutputAsyncCore(WebSocketCloseStatus closeStatus, String statusDescription, CancellationToken cancellationToken) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [16:04:12 INF] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) [16:04:12 INF] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [16:04:12 INF] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [16:04:12 INF] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken)[16:04:12 INF] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [16:04:12 INF] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [16:04:12 INF] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure) at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TResult](Memory`1 payloadBuffer, CancellationToken cancellationToken) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [16:04:12 INF] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure) at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TResult](Memory`1 payloadBuffer, CancellationToken cancellationToken) at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Threading.Tasks.ValueTask`1.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state) --- End of stack trace from previous location --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [16:04:13 INF] Disconnected [16:04:13 INF] First chance exception occurred at application level: Operation canceled System.Net.Sockets.SocketException (125): Operation canceled at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) [16:04:13 INF] First chance exception occurred at application level: Operation canceled System.Net.Sockets.SocketException (125): Operation canceled at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Threading.Tasks.ValueTask`1.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state) --- End of stack trace from previous location --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() ```

Sample

I have not been able to reproduce the issue.

quinchs commented 2 years ago

What docker base are you running the bot in?

quinchs commented 2 years ago

cc @lhjt

lhjt commented 2 years ago

They are running on a k8s cluster with .NET 6.

I am running sharded discord client with 8 shards per process.

kuru-rin commented 2 years ago

What docker base are you running the bot in?

mcr.microsoft.com/dotnet/aspnet:6.0.0-alpine3.14

Are you running your services behind any sort of service mesh, such as Istio?

Nope. I have no sidecars at all in the pod.

Have you experienced any other connectivity issues with deployments on your cluster?

None that I have witnessed.

Is each process running in its own pod?

Yes. Within that process is the sharded client with 8 shards at the moment. However, I have had the same issue using non-sharded client with a single shard in a process per pod.

If so, do all of the processes encounter this issue at the same time?

They do not.

For example, the following image is from after 28h of running them. image

The number of restarts is different for each pod, but they all seem to have been killed with a similar disconnection log as the one I posted.

When you experience this error, do all of the shards on the same process also encounter this error or is it just the one shard?

I assume it's just one shard, since I just see a single Disconnecting log.

How can I find out from which shard a log message originates? I am currently just hooking in on the Log event and passing it through to the ILogger and its corresponding severity, but I see no shard or anything in the log message.


I think it may have something to do with the voice channels, since at the same time a FFmpeg process streaming to the AudioOutStream also seems to die with: Error writing trailer of pipe:: Broken pipe.

quinchs commented 2 years ago

What are your process configurations for starting FFMPEG, in my experience with ffmpeg I get that error of a broken pipe but my bot doesn't close

lhjt commented 2 years ago

This is quite a peculiar problem, but based on this information it doesn't seem like it would be related to your deployment environment.

Do you happen to have any other action logs produced by your bot just before the disconnect and program exit happens? If not, would you be able to change the logging level to debug and continue running your workload until you run into this issue again, and then attach the logs before the exception happens? This may help us in producing an example where this issue can be reliably reproduced.

kuru-rin commented 2 years ago

What are your process configurations for starting FFMPEG, in my experience with ffmpeg I get that error of a broken pipe but my bot doesn't close

I suppose the broken pipe could just be an after-affect of the socket being aborted. My ffmpeg process comes down to this:

using var ffmpeg = Process.Start(new ProcessStartInfo
{
    FileName = "ffmpeg",
    Arguments = $"-hide_banner -i {inputFilePath} -ac 2 -f s16le -ar 48000 -",
    UseShellExecute = false,
    RedirectStandardInput = false,
    RedirectStandardOutput = true
});

await using var output = ffmpeg.StandardOutput.BaseStream;

try
{
    await output.CopyToAsync(audioStream, 3840, cancellationToken);
}
finally
{
    await audioStream.FlushAsync(cancellationToken);
}

If this was related I would expect this problem to happen much more often, since this process runs many more times per day than the issue occurs.

This is quite a peculiar problem, but based on this information it doesn't seem like it would be related to your deployment environment.

Indeed, I'm really at a loss as to what it can be.

Do you happen to have any other action logs produced by your bot just before the disconnect and program exit happens? If not, would you be able to change the logging level to debug and continue running your workload until you run into this issue again, and then attach the logs before the exception happens? This may help us in producing an example where this issue can be reliably reproduced.

At the moment sadly I do not. I already added the first chance exception logging for more information, but it did not help much.

I will see if I can add even more logging and enable the debug logs and I will reply here again once I get some more data. It may take a few days though.


I found this other similar issue: https://github.com/Discord-Net-Labs/Discord.Net-Labs/issues/203 Could it have something to do with this?

quinchs commented 2 years ago

Can you test if dragging your bot to a different vc crashes the pod?

kuru-rin commented 2 years ago

Can you test if dragging your bot to a different vc crashes the pod?

Yeah, I tried that, but it does not make it crash.

I figured maybe it could be something similar, but perhaps an unhandled exception at the Discord.NET end? That would probably make the app exit.

kuru-rin commented 2 years ago

@lhjt I was able to get some debug logs already. This is from the last second before one of the pods crashed:

First pod crash log

``` [09:50:13 DBG] POST channels/xxx/messages: 304.38 ms [09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE) [09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE) [09:50:13 DBG] Received Dispatch (GUILD_MEMBER_UPDATE) [09:50:13 DBG] Malformed Frame [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (CHANNEL_UPDATE) [09:50:13 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [09:50:13 DBG] Sent RequestGuildMembers [09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE) [09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (GUILD_MEMBERS_CHUNK) [09:50:13 INF] Updated 25 guild users for guild xxx [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Sent Heartbeat [09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] POST channels/xxx/messages: 402.92 ms [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Latency = 100 ms [09:50:13 DBG] Received HeartbeatAck [09:50:13 DBG] Received Dispatch (MESSAGE_UPDATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (GUILD_MEMBER_UPDATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [09:50:13 DBG] Received Dispatch (GUILD_MEMBER_UPDATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 DBG] Received Dispatch (MESSAGE_CREATE) [09:50:13 INF] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [09:50:13 INF] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [09:50:13 INF] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [09:50:13 DBG] Keepalive Stopped [09:50:13 DBG] Heartbeat Stopped [09:50:13 WRN] [null] System.Threading.Tasks.TaskCanceledException: A task was canceled. at Discord.ConnectionManager.<>c__DisplayClass29_0.<b__0>d.MoveNext() [09:50:13 INF] Disconnecting [09:50:13 DBG] Disconnecting ApiClient [09:50:13 INF] First chance exception occurred at application level: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates) [09:50:13 INF] First chance exception occurred at application level: Operation canceled System.Net.Sockets.SocketException (125): Operation canceled at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) [09:50:13 INF] First chance exception occurred at application level: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates) at System.Net.WebSockets.ManagedWebSocket.CloseOutputAsyncCore(WebSocketCloseStatus closeStatus, String statusDescription, CancellationToken cancellationToken) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [09:50:13 INF] First chance exception occurred at application level: Operation canceled System.Net.Sockets.SocketException (125): Operation canceled at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Threading.Tasks.ValueTask`1.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state) --- End of stack trace from previous location --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [09:50:13 DBG] Sent VoiceStateUpdate av_interleaved_write_frame(): Broken pipe 64.5kbits/s speed=33.1x Error writing trailer of pipe:: Broken pipe size= 1533kB time=00:03:05.13 bitrate= 67.8kbits/s speed= 21x video:0kB audio:1537kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Conversion failed! ```

Second pod crash log

``` [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Malformed Frame [11:03:02 DBG] Received Dispatch (VOICE_STATE_UPDATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_UPDATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (GUILD_MEMBER_UPDATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_UPDATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (MESSAGE_CREATE) [11:03:02 DBG] Received Dispatch (GUILD_MEMBER_UPDATE) [11:03:02 DBG] Received Dispatch (CHANNEL_UPDATE) [11:03:02 INF] First chance exception occurred at application level: The operation was canceled. System.OperationCanceledException: The operation was canceled. at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [11:03:02 INF] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [11:03:02 INF] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [11:03:02 INF] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [11:03:02 DBG] Keepalive Stopped [11:03:02 DBG] Heartbeat Stopped [11:03:02 WRN] [null] System.Threading.Tasks.TaskCanceledException: A task was canceled. at Discord.ConnectionManager.<>c__DisplayClass29_0.<b__0>d.MoveNext() [11:03:02 INF] Disconnecting [11:03:02 DBG] Disconnecting ApiClient [11:03:02 INF] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) [11:03:02 INF] First chance exception occurred at application level: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates) ```

I did remove some unrelated logs, otherwise it would be too massive.

Hope this helps. Thanks so much!

quinchs commented 2 years ago

The only Discord.Net trace I see here is the Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext(). The connection manager is used for audio sockets and the gateway socket so I'm unable to tell if this is an audio related issue or gateway issue.

Would you be able to directly add the project source to your bot? This should give you a full stack trace including the line numbers for dnet, this will help me figure out the root cause.

Just for reference, here is the StartAsync function

kuru-rin commented 2 years ago

Would you be able to directly add the project source to your bot? This should give you a full stack trace including the line numbers for dnet, this will help me figure out the root cause.

Sure, here you go:

Log with discord-net 3.1.0 source added

``` [08:23:05 DBG] Received HeartbeatAck [08:23:05 DBG] Latency = 200 ms [08:23:05 DBG] Received Dispatch (MESSAGE_REACTION_ADD) [08:23:05 DBG] Received Dispatch (GUILD_MEMBER_UPDATE) [08:23:05 DBG] Malformed Frame [08:23:05 DBG] Received Dispatch (MESSAGE_CREATE) [08:23:05 DBG] Received Dispatch (MESSAGE_UPDATE) [08:23:05 DBG] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [08:23:05 DBG] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [08:23:05 DBG] Heartbeat Stopped [08:23:05 DBG] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [08:23:05 DBG] Keepalive Stopped [08:23:05 WRN] [null] System.Threading.Tasks.TaskCanceledException: A task was canceled. at Discord.ConnectionManager.<>c__DisplayClass29_0.<b__0>d.MoveNext() in /src/discord-net/Discord.Net.WebSocket/ConnectionManager.cs:line 79 [08:23:05 INF] Disconnecting [08:23:05 DBG] Disconnecting ApiClient [08:23:05 DBG] First chance exception occurred at application level: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates) [08:23:05 DBG] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) [08:23:05 DBG] First chance exception occurred at application level: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates) at System.Net.WebSockets.ManagedWebSocket.CloseOutputAsyncCore(WebSocketCloseStatus closeStatus, String statusDescription, CancellationToken cancellationToken) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [08:23:05 DBG] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [08:23:05 DBG] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) [08:23:05 DBG] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [08:23:05 DBG] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [08:23:05 DBG] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) [08:23:05 DBG] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [08:23:05 DBG] Sent VoiceStateUpdate av_interleaved_write_frame(): Broken pipe1536.9kbits/s speed=1.11x Error writing trailer of pipe:: Broken pipe size= 6325kB time=00:00:33.73 bitrate=1536.0kbits/s speed=1.02x video:0kB audio:6329kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Conversion failed! ```

quinchs commented 2 years ago

Does this error occur running the same application outside of docker? Are you able to run one process with 8 shards outside of docker for a day and see if you get the same logs? I've heard of people having issues with k8s/docker and websockets so it might be something there.

Looking at the one stack we get from the logs that exception is handled properly. One piece of evidence to back this up is the continuation of code executing by the other logs below it.

System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in /src/discord-net/Discord.Net.WebSocket/ConnectionManager.cs:line 79
[08:23:05 INF] Disconnecting // here
[08:23:05 DBG] Disconnecting ApiClient // here

Maybe it has to do with the [08:23:05 DBG] Sent VoiceStateUpdate update message. Maybe its trying to send the voice state even though the socket is closed? I'll have a look into this portion of the library.

kuru-rin commented 2 years ago

Does this error occur running the same application outside of docker? Are you able to run one process with 8 shards outside of docker for a day and see if you get the same logs? I've heard of people having issues with k8s/docker and websockets so it might be something there.

I'm afraid it's not possible for me to run it elsewhere, since there is shared infrastructure within the cluster (such as RabbitMQ and Redis) and nothing is reachable from the outside.


I have been checking Discord.NET code a bit, and been trying to trace what is happening...

So it seems it is Disconnecting but before the OnDisconnected event fires it dies. I have a log on the Disconnected event so I would expect it to show up in the log, but it does not.

private async Task DisconnectAsync(Exception ex, bool isReconnecting)
{
    if (State == ConnectionState.Disconnected) return;
    State = ConnectionState.Disconnecting;
    await _logger.InfoAsync("Disconnecting").ConfigureAwait(false);

    await _onDisconnecting(ex).ConfigureAwait(false);

    await _disconnectedEvent.InvokeAsync(ex, isReconnecting).ConfigureAwait(false);  // it never gets here ?
    State = ConnectionState.Disconnected;
    await _logger.InfoAsync("Disconnected").ConfigureAwait(false);
}

So I am assuming an exception is thrown within the DiscordSocketClient.OnDisconnectingAsync(...) somewhere.

I will throw some try blocks in the Discord.NET code that I added to my project to see if I get some more output.

quinchs commented 2 years ago

Feel free to experiment but I think it might be related to this pr. I'll make a change to how it works and get you to try it out.

kuru-rin commented 2 years ago

I think I'm already seeing some results. Caught this exception, which was otherwise uncaught:

System.ObjectDisposedException: The semaphore has been disposed.
   at System.Threading.SemaphoreSlim.Release(Int32 releaseCount)
   at Discord.Audio.DiscordVoiceAPIClient.DisconnectAsync() in /src/discord-net/Discord.Net.WebSocket/DiscordVoiceApiClient.cs:line 208
   at Discord.Audio.AudioClient.OnDisconnectingAsync(Exception ex) in /src/discord-net/Discord.Net.WebSocket/Audio/AudioClient.cs:line 127
   at Discord.ConnectionManager.DisconnectAsync(Exception ex, Boolean isReconnecting) in /src/discord-net/Discord.Net.WebSocket/ConnectionManager.cs:line 189
   at Discord.ConnectionManager.<>c__DisplayClass29_0.<<StartAsync>b__0>d.MoveNext() in /src/discord-net/Discord.Net.WebSocket/ConnectionManager.cs:line 94

Does this give you any ideas?

quinchs commented 2 years ago

Does catching that exception stop the pod from restarting?

kuru-rin commented 2 years ago

Does catching that exception stop the pod from restarting?

Well, I have only had it running for 37m so far, but no restarts yet at all so it seems like it. I will let it run for a whole day at least to be 100% sure.

Edit: I was just a minute too soon with this message, as one restarted just now with the same issue as before.

Log

``` [17:41:22 DBG] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [17:41:22 DBG] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [17:41:22 DBG] Keepalive Stopped [17:41:22 DBG] First chance exception occurred at application level: A task was canceled. System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) [17:41:22 DBG] Heartbeat Stopped [17:41:22 WRN] [null] System.Threading.Tasks.TaskCanceledException: A task was canceled. at Discord.ConnectionManager.<>c__DisplayClass29_0.<b__0>d.MoveNext() in /src/discord-net/Discord.Net.WebSocket/ConnectionManager.cs:line 84 [17:41:22 INF] Disconnecting [17:41:22 DBG] Disconnecting ApiClient [17:41:22 DBG] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) [17:41:22 DBG] First chance exception occurred at application level: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates) [17:41:22 DBG] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [17:41:22 DBG] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) [17:41:22 DBG] First chance exception occurred at application level: The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' System.Net.WebSockets.WebSocketException (0x80004005): The WebSocket is in an invalid state ('Aborted') for this operation. Valid states are: 'Open, CloseReceived' at System.Net.WebSockets.WebSocketValidate.ThrowIfInvalidState(WebSocketState currentState, Boolean isDisposed, WebSocketState[] validStates) at System.Net.WebSockets.ManagedWebSocket.CloseOutputAsyncCore(WebSocketCloseStatus closeStatus, String statusDescription, CancellationToken cancellationToken) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [17:41:22 DBG] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [17:41:22 DBG] First chance exception occurred at application level: Unable to read data from the transport connection: Operation canceled. System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [17:41:22 DBG] First chance exception occurred at application level: Operation canceled System.Net.Sockets.SocketException (125): Operation canceled at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) [17:41:22 DBG] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) [17:41:22 DBG] First chance exception occurred at application level: Operation canceled System.Net.Sockets.SocketException (125): Operation canceled at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Threading.Tasks.ValueTask`1.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state) --- End of stack trace from previous location --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [17:41:22 DBG] Sent VoiceStateUpdate [17:41:22 DBG] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [17:41:22 DBG] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() [17:41:22 DBG] First chance exception occurred at application level: The operation was canceled. System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException (125): Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter) at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.ReadBufferedAsyncCore(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) at System.Net.WebSockets.ManagedWebSocket.EnsureBufferContainsAsync(Int32 minimumRequiredBytes, CancellationToken cancellationToken, Boolean throwOnPrematureClosure) at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) at System.Net.WebSockets.ManagedWebSocket.ReceiveAsyncPrivate[TResult](Memory`1 payloadBuffer, CancellationToken cancellationToken) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() av_interleaved_write_frame(): Broken pipe 171.9kbits/s speed=13.2x Error writing trailer of pipe:: Broken pipe size= 1687kB time=00:01:20.25 bitrate= 172.2kbits/s speed=10.6x video:0kB audio:1691kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Conversion failed! ```

I see that Sent VoiceStateUpdate you were talking about though, so perhaps there is more than one issue causing the restarts.

quinchs commented 2 years ago

Does using .NET 5 change anything?

kuru-rin commented 2 years ago

Does using .NET 5 change anything?

That's not something I can downgrade to quickly and easily, so it will be a last resort test I can do.

However if you're talking mainly about the BackgroundServiceExceptionBehavior that has changed in .NET 6, I have that set to Ignore, so it should be the same as .NET 5 in my app.

For now I will continue to find places with unhandled exceptions in discord-net code and add try-catches and logging where I can. The code is rather hard to follow, so I hope I can keep my sanity.

kuru-rin commented 2 years ago

So far I have traced it to a Segmentation fault (exit code 139).

I am assuming it is caused by either ffmpeg, libopus, or maybe libsodium? More likely libopus/libsodium since discord-net is using it directly, it would make sense that the app would crash. Since ffmpeg is run as an external process it should not crash the .NET process as far as I know.

Things I have done that did not help:

FFmpeg version seems recent enough too:

ffmpeg version 4.3.3-0+deb11u1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 10 (Debian 10.2.1-6)
configuration: --prefix=/usr --extra-version=0+deb11u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 51.100 / 56. 51.100
libavcodec     58. 91.100 / 58. 91.100
libavformat    58. 45.100 / 58. 45.100
libavdevice    58. 10.100 / 58. 10.100
libavfilter     7. 85.100 /  7. 85.100
libavresample   4.  0.  0 /  4.  0.  0
libswscale      5.  7.100 /  5.  7.100
libswresample   3.  7.100 /  3.  7.100
libpostproc    55.  7.100 / 55.  7.100

I don't know anything about how this all works in discord-net, so it would be great if someone could give their insights in what the problem could be.

kuru-rin commented 2 years ago

I have chosen to circumvent the issue by using Lavalink instead. My bot has now been running for many hours without any restarts, so it is safe to say the issue is gone.

It might be good to warn people that the Discord.NET audio client can kill your app with segmentation faults. Perhaps it's only when using it in Linux containers, but it's hard to say for sure.