discordjs / discord.js

A powerful JavaScript library for interacting with the Discord API
https://discord.js.org
Apache License 2.0
25.36k stars 3.97k forks source link

WebSocketManager disconnects 30-60 minutes at a time #8558

Closed adam-a closed 1 year ago

adam-a commented 2 years ago

Which package is this bug report for?

ws

Issue description

Running WebSocketManager for a bot on my own personal Discord server

Tested this locally and on a staging within an EC2 instance -- the same issue applies

Randomly throughout the day this will occur and could go on for upwards of 30-60 minutes.

How can I ensure no/minimal disconnection? (Code sample below -- very straightforward implementation)

Connecting to [wss://gateway.discord.gg?v=10&encoding=json](wss://gateway.discord.gg/?v=10&encoding=json)
info:    Debug
Waiting for event hello for 60000ms
info:    Heartbeat
info:    Debug
Starting to heartbeat every 41250ms
info:    Debug
Identifying
        shard id: 0
        shard count: 1
        intents: 34305
        compression: none
info:    Debug
Disconnected normally from code 1000
info:    Debug
Destroying shard
        Reason: Got disconnected by Discord
        Code: 1000
        Recover: Reconnect
info:    Debug
Connecting to [wss://gateway.discord.gg?v=10&encoding=json](wss://gateway.discord.gg/?v=10&encoding=json)
info:    Debug
Waiting for event hello for 60000ms
info:    Heartbeat
info:    Debug
Starting to heartbeat every 41250ms
info:    Debug
Identifying
        shard id: 0
        shard count: 1
        intents: 34305
        compression: none
info:    Debug
Disconnected normally from code 1000
info:    Debug
Destroying shard
        Reason: Got disconnected by Discord
        Code: 1000
        Recover: Reconnect
info:    Debug
Connecting to [wss://gateway.discord.gg?v=10&encoding=json](wss://gateway.discord.gg/?v=10&encoding=json)
info:    Debug
Waiting for event hello for 60000ms
info:    Heartbeat
info:    Debug
Starting to heartbeat every 41250ms
info:    Debug
Identifying
        shard id: 0
        shard count: 1
        intents: 34305
        compression: none
info:    Debug
Disconnected normally from code 1000
info:    Debug
Destroying shard
        Reason: Got disconnected by Discord
        Code: 1000
        Recover: Reconnect
info:    Debug
Connecting to [wss://gateway.discord.gg?v=10&encoding=json](wss://gateway.discord.gg/?v=10&encoding=json)
info:    Debug
Waiting for event hello for 60000ms
info:    Heartbeat
info:    Debug
Starting to heartbeat every 41250ms
info:    Debug
Identifying
        shard id: 0
        shard count: 1
        intents: 34305
        compression: none

Code sample

if (!authToken) throw Error('BOT AuthToken not set');

  const rest = new REST().setToken(authToken);

  const manager = new WebSocketManager({
    handshakeTimeout: 20000,
    token: authToken,
    intents,
    rest,
  });

  manager.setStrategy(new SimpleShardingStrategy(manager));

  manager.addListener(WebSocketShardEvents.Debug, async (payload: { message: string }) =>
    logger.info(`Debug\n${payload.message}`)
  );

  manager.addListener(WebSocketShardEvents.Ready, () => logger.info('Ready'));

  manager.addListener(WebSocketShardEvents.Hello, () => logger.info('Heartbeat'));

  manager.addListener(WebSocketShardEvents.Resumed, () => logger.info('Resumed'));

  manager.addListener(
    WebSocketShardEvents.Dispatch,
    (
      event: {
        data: GatewayDispatchPayload;
      } & {
        shardId: number;
      }
    ) => logEvent(event.data)
  );

  await manager.connect();

Package version

14.3.0

Node.js version

18.6.0

Operating system

No response

Priority this issue should have

High (immediate attention needed)

Which partials do you have configured?

Message, Reaction

Which gateway intents are you subscribing to?

Guilds, GuildMessages, GuildMessageReactions, MessageContent

I have tested this issue on a development release

No response

didinele commented 2 years ago

What version of @discordjs/ws are you on?

adam-a commented 2 years ago

What version of @discordjs/ws are you on?

Hi! Thanks for responding.

I'm using 0.3.0.

It's driving me nuts. Even overrode updateSessionInfo and retrieveSessionInfo at some point to keep track of session info since it seems like it is being utilized in https://github.com/didinele/discord.js/blob/874ee5cee006edad6d7fe138d5dbe83f96d3f0e5/packages/ws/src/ws/WebSocketShard.ts#L105 (check line 128)

But that did not work as well.

didinele commented 2 years ago

Sorta familiar with this issue. We've been trying to fix it for a while and across multiple PRs. You're not experiencing Discord constantly telling you to reconnect (at least to my understanding), but rather a reconnect loop bug.

My latest attempt at patching this up was #8479, but seems like that wasn't enough.

cc @SpaceEEC

adam-a commented 2 years ago

Sorta familiar with this issue. We've been trying to fix it for a while and across multiple PRs. You're not experiencing Discord constantly telling you to reconnect (at least to my understanding), but rather a reconnect loop bug.

My latest attempt at patching this up was #8479, but seems like that wasn't enough.

cc @SpaceEEC

Do you think it's an issue in the underlying WebSocket package?

I implemented my bot with the standard WebSocket package before finding discord.js, and I had the same issue.

Also, side note, today I noticed this:

Waiting for event hello for 60000ms
info:    Heartbeat
info:    Debug
Starting to heartbeat every 41250ms
info:    Debug
Resuming session
info:    Debug
Resuming session
/Users/some/Documents/Github/some/shared_services/node_modules/ws/lib/websocket.js:441
      throw new Error('WebSocket is not open: readyState 0 (CONNECTING)');
            ^
Error: WebSocket is not open: readyState 0 (CONNECTING)
    at WebSocket.send (/Users/some/Documents/Github/some/shared_services/node_modules/ws/lib/websocket.js:441:13)
    at WebSocketShard.send (/Users/some/Documents/Github/some/shared_services/node_modules/@discordjs/ws/dist/ws/WebSocketShard.cjs:168:21)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async WebSocketShard.connect (/Users/some/Documents/Github/some/shared_services/node_modules/@discordjs/ws/dist/ws/WebSocketShard.cjs:97:7)
    at async WebSocketShard.onMessage (/Users/some/Documents/Github/some/shared_services/node_modules/@discordjs/ws/dist/ws/WebSocketShard.cjs:322:11)
[nodemon] app crashed - waiting for file changes before starting...
didinele commented 2 years ago

Do you think it's an issue in the underlying WebSocket package?

No.

adam-a commented 2 years ago

Do you think it's an issue in the underlying WebSocket package?

No.

I'd like to clone discord.js and assist in this effort. What's a good starting point for me particularly for this recurring loop bug?

didinele commented 2 years ago

packages/ws/src/ws/WebSocketShard.ts - mainly the onClose handler and the destroy method.

Dossar commented 2 years ago

Sorta familiar with this issue. We've been trying to fix it for a while and across multiple PRs. You're not experiencing Discord constantly telling you to reconnect (at least to my understanding), but rather a reconnect loop bug.

My latest attempt at patching this up was #8479, but seems like that wasn't enough.

cc @SpaceEEC

Hello, it looks like this thread might be the same root cause for bots disconnecting after an hour or earlier: https://github.com/discordjs/discord.js/issues/8486#issuecomment-1271033908

To summarize the findings of what I saw in that thread,

The common data point seems to be that somewhere between v13.8.1 and v13.10.3, something introduced this issue of bots disconnecting from the client. And then that ended up getting into v14.x

legendhimself commented 2 years ago

might be the same root cause for bots disconnecting after an hour or earlier: #8486 (comment)

@Dossar this is an issue from the new WS package and it has nothing to do with those issues since even in the latest versions DJS still does not make use of the WS package for creating the WebSockets or connecting to the discord gateway.

didinele commented 2 years ago

Hi! Are you still able to reproduce this issue on 0.4.1?

adam-a commented 1 year ago

Hi! Are you still able to reproduce this issue on 0.4.1?

Hi there! I just updated it and will be watching it. Will lyk! Thank you for the fix.

adam-a commented 1 year ago

@didinele Unfortunately, the issue persists with 0.4.1 :/

Nov 3, 2022 @ 11:54:13.921 | Debug Identifying  shard id: 0     shard count: 1  intents: 34305  compression: none
-- | --

  | Nov 3, 2022 @ 11:54:13.920 | Heartbeat

  | Nov 3, 2022 @ 11:54:13.920 | Debug Starting to heartbeat every 41250ms

  | Nov 3, 2022 @ 11:54:13.757 | Debug Waiting for event hello for 60000ms

  | Nov 3, 2022 @ 11:54:13.756 | Debug Connecting to wss://gateway.discord.gg?v=10&encoding=json

  | Nov 3, 2022 @ 11:54:13.755 | Debug Destroying shard     Reason: Got disconnected by Discord     Code: 1000  Recover: Reconnect

  | Nov 3, 2022 @ 11:54:13.755 | Debug Disconnected normally from code 1000

  | Nov 3, 2022 @ 11:53:18.760 | Debug Starting to heartbeat every 41250ms

  | Nov 3, 2022 @ 11:53:18.760 | Debug Identifying  shard id: 0     shard count: 1  intents: 34305  compression: none

  | Nov 3, 2022 @ 11:53:18.760 | Heartbeat

  | Nov 3, 2022 @ 11:53:18.604 | Debug Waiting for event hello for 60000ms

  | Nov 3, 2022 @ 11:53:18.602 | Debug Disconnected normally from code 1000

  | Nov 3, 2022 @ 11:53:18.602 | Debug Destroying shard     Reason: Got disconnected by Discord     Code: 1000  Recover: Reconnect

  | Nov 3, 2022 @ 11:53:18.602 | Debug Connecting to wss://gateway.discord.gg?v=10&encoding=json

  | Nov 3, 2022 @ 11:53:09.169 | HTTP Request

  | Nov 3, 2022 @ 11:52:23.398 | Debug Starting to heartbeat every 41250ms

  | Nov 3, 2022 @ 11:52:23.398 | Debug Identifying  shard id: 0     shard count: 1  intents: 34305  compression: none

  | Nov 3, 2022 @ 11:52:23.397 | Heartbeat

  | Nov 3, 2022 @ 11:52:23.247 | Debug Waiting for event hello for 60000ms

  | Nov 3, 2022 @ 11:52:23.246 | Debug Connecting to wss://gateway.discord.gg?v=10&encoding=json

  | Nov 3, 2022 @ 11:52:23.245 | Debug Disconnected normally from code 1000

  | Nov 3, 2022 @ 11:52:23.245 | Debug Destroying shard     Reason: Got disconnected by Discord     Code: 1000  Recover: Reconnect

  | Nov 3, 2022 @ 11:52:01.024 | HTTP Request
didinele commented 1 year ago

After another month of on-and-off debugging and a lot of frustration with being generally unable to reproduce this issue in a consistent matter, I've finally had a breakthrough and tracked it down to be how we handle the send ratelimit - the issue boils down to some reversed logic that was missed during review, essentially meaning that after ~120 heartbeats we think we're about to get ratelimited and block the entire send queue for... a lot of time (could be something like ~20 minutes in some cases).

WS connections auto-magically close with a 1000 close code if enough time passes with no flowing payloads. The queue stays stuck after we reconnect, so our IDENTIFY never goes through, once again, causing the connection to eventually close due to inactivity (causing the "reconnect loop" that we see) - eventually the sleep call ends and we successfully reconnect and start the cycle over again.

I'll have a PR fixing this later today, appreciate the flag and the effort in providing me with logs!

JMTK commented 1 year ago

@didinele would this also alleviate issues related to #8486 / #4095 ?

didinele commented 1 year ago

@didinele would this also alleviate issues related to #8486 / #4095 ?

discord.js (mainlib) does not currently use @discordjs/ws, it has its own own WS implementation (the "old" one), so no, those changes do not help at all at this time, but they will potentially will in the future when the main library starts using the ws package.