discord-jda / JDA

Java wrapper for the popular chat & VOIP service: Discord https://discord.com
Apache License 2.0
4.35k stars 734 forks

Some shards' connections break and cannot reconnect until the program exits #1294

Closed. JellyBrick closed this issue 4 years ago.

JellyBrick commented 4 years ago

General Troubleshooting

Bug Report

Some shards' connections break and do not reconnect until the program exits. The following warnings appear:

...
[20:07:31.088] [WARN ] [WebSocketClient]: Hit the WebSocket RateLimit! This can be caused by too many presence or voice status updates (connect/disconnect/mute/deaf). Regular: 0 Voice: 19 Chunking: 0
[20:07:36.516] [WARN ] [WebSocketClient]: Missed 2 heartbeats! Trying to reconnect...
[20:08:44.271] [WARN ] [WebSocketClient]: Hit the WebSocket RateLimit! This can be caused by too many presence or voice status updates (connect/disconnect/mute/deaf). Regular: 0 Voice: 19 Chunking: 0
...

It appears to be related to this PR (#1282).

Expected Behavior

The connection should recover automatically shortly afterwards.

Code Example or Reproduction Steps

N/A

Code for JDABuilder or DefaultShardManagerBuilder Used

    DefaultShardManagerBuilder.createDefault(token)
                    .setAutoReconnect(true)
                    .setAudioSendFactory(
                            AsyncPacketProviderFactory.adapt(
                                    NativeAudioSendFactory()
                            )
                    )
                    .addEventListeners(eventListener)
                    .setUseShutdownNow(true)
                    .setBulkDeleteSplittingEnabled(true)
                    .build()

Exception or Error

N/A

MinnDevelopment commented 4 years ago

What is the shard status? Do you have any exceptions, warnings, or error logs? Can you provide a thread dump?

JellyBrick commented 4 years ago

  1. In one or more guilds, the bot goes offline (30 shards in total).
  2. These are the warning messages (spammed to the console at regular intervals):
[WARN ] [WebSocketClient]: Hit the WebSocket RateLimit! This can be caused by too many presence or voice status updates (connect/disconnect/mute/deaf). Regular: 0 Voice: 19 Chunking: 0
[WARN ] [WebSocketClient]: Missed 2 heartbeats! Trying to reconnect...

Andre601 commented 4 years ago

Do you have any privileged intents enabled or disabled? I ran into a similar issue, but with "Chunking"; the fix was to set the ChunkingFilter to NONE and to disable the intents for PRESENCE, VOICE_STATE (I assume you need that one?) and CLIENT_STATUS.

Not sure if disabling some of them would help.

MinnDevelopment commented 4 years ago

I need the following:

  1. JDA#getStatus for the shard that is stuck
  2. All WARN/ERROR log messages of that session (or at least leading up to the stuck state)
  3. A thread dump taken while it is stuck (jstack -l <pid>)
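If jstack cannot be run on the host (for example, when only a JRE is installed), a comparable dump can be produced from inside the bot process with the standard java.lang.management API. This is a minimal sketch, assuming you can wire it to some trigger of your own (an admin command, a scheduled task); the class and method names are hypothetical. It requests locked monitors and ownable synchronizers, which roughly corresponds to the extra lock information that the -l flag adds.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {

    /** Builds a dump of every live thread in this JVM, including owned locks. */
    public static String dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Request lock details only where the JVM supports them, to avoid
        // UnsupportedOperationException on JVMs that lack monitor/synchronizer tracking.
        ThreadInfo[] infos = bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(),
                bean.isSynchronizerUsageSupported());

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" id=").append(info.getThreadId())
               .append(" state=").append(info.getThreadState())
               .append('\n');
            if (info.getLockName() != null) {
                // The lock or condition this thread is blocked or waiting on
                out.append("    - waiting on ").append(info.getLockName()).append('\n');
            }
            // Print every frame ourselves; ThreadInfo.toString() cuts stacks at 8 frames
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            // Ownable synchronizers (e.g. ReentrantLock) currently held by this thread
            for (LockInfo sync : info.getLockedSynchronizers()) {
                out.append("    - locked synchronizer ").append(sync).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

The output contains every live thread, not just the gateway pool threads, which is what makes it useful for spotting a thread that has disappeared.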

JellyBrick commented 4 years ago

Privileged intents are disabled. As for your questions:

  1. JDA#getStatus returns CONNECTED for all shards.
  2. These are the only warnings:
    [WARN ] [daemon-pool-gateway-4-thread-2] [WebSocketClient]: Hit the WebSocket RateLimit! This can be caused by too many presence or voice status updates (connect/disconnect/mute/deaf). Regular: 0 Voice: 19 Chunking: 0
    [WARN ] [daemon-pool-gateway-4-thread-3] [WebSocketClient]: Missed 2 heartbeats! Trying to reconnect...
  3. [daemon-pool-gateway-4-thread-2]

priority:5 - threadId:daemon-pool-gateway-4-thread-2 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
- parking to wait for <0x0000000481611850> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.6/ScheduledThreadPoolExecutor.java:1177)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.6/ScheduledThreadPoolExecutor.java:899)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.6/ThreadPoolExecutor.java:1054)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1114)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)

[daemon-pool-gateway-4-thread-3]

priority:5 - threadId:daemon-pool-gateway-4-thread-3 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
- parking to wait for <0x0000000481611850> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@11.0.6/AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.6/ScheduledThreadPoolExecutor.java:1177)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@11.0.6/ScheduledThreadPoolExecutor.java:899)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.6/ThreadPoolExecutor.java:1054)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1114)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)

MinnDevelopment commented 4 years ago

That's only 2 daemon threads; this isn't a full thread dump. Please provide a complete thread dump. You can post it on https://gist.github.com and link it here.

MinnDevelopment commented 4 years ago

It looks like you somehow killed the WriteThread for shard 4.
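A thread that dies from an uncaught exception simply vanishes from the JVM, so one way to confirm a diagnosis like this is to list live threads by name from inside the process. This is a minimal sketch using only Thread.getAllStackTraces(); the class name is hypothetical, and the fragment "WriteThread" is a placeholder, not necessarily the name JDA gives that thread, so check a full thread dump for the actual naming.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ThreadPresenceCheck {

    /**
     * Returns a "name: state" entry for every live thread whose name contains
     * the given fragment, sorted by name. An empty list means no live thread matches.
     */
    public static List<String> findThreads(String nameFragment) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().contains(nameFragment))
                .map(t -> t.getName() + ": " + t.getState())
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder fragment (assumption): replace with the real thread name from a dump
        List<String> matches = findThreads("WriteThread");
        if (matches.isEmpty()) {
            System.out.println("No live thread matches; it may have died.");
        } else {
            matches.forEach(System.out::println);
        }
    }
}
```

Running such a check periodically and logging when the list becomes empty would surface this kind of failure earlier than waiting for missed heartbeats.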

MinnDevelopment commented 4 years ago

Update to 4.1.1_148 and check if the behavior returns.

JellyBrick commented 4 years ago

Thanks! It seems like the problem is solved!