discordjs / discord.js

A powerful JavaScript library for interacting with the Discord API
https://discord.js.org
Apache License 2.0

[Meta-Thread] Internal Sharding Issues #3201

Closed vladfrangu closed 4 years ago

vladfrangu commented 5 years ago

Have you gotten an issue with internal sharding, like silent disconnects (where your bot just.. silently dies)?

Well, that sucks, but we want to get all of these issues solved so they stop happening! You can help us out by providing the following information:

1. Current commit hash you are running:

2. Debug logs before, at the time the issue occurred, and after the issue (couple of lines before and after the issue occurred; if your token is present, please censor it out)

3. Optional packages installed (zlib-sync, bufferutil, utf-8-validate)
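
For the token censoring mentioned in the list above, a minimal sketch could look like the following. `redactSecret` is a hypothetical helper, not part of discord.js; it simply replaces every occurrence of a secret you already know (your own token) before the line is saved or shared.

```javascript
// Hypothetical helper: replaces every occurrence of a known secret in a log line.
function redactSecret(line, secret, placeholder = '[REDACTED]') {
    if (!secret) return line;
    return line.split(secret).join(placeholder);
}

// Example with a fake token value.
const fakeToken = 'abc.def.ghi';
console.log(redactSecret(`Provided token: ${fakeToken}`, fakeToken));
```

Running every line through a helper like this before writing it to disk means the saved log can be uploaded as-is.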

Please do not reply with "same", or anything else that is not useful. Logs are by far the most useful thing you can provide to us, so please do so!

How to get debug logs?

If you want to get the logs for us, you can simply attach a debug event listener to the client:

client.on('debug', (message) => {
    // Save this message somewhere
});
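
One way to "save this message somewhere" is to append each debug message to a file with a timestamp. This is only a sketch: `formatDebugLine` and `logDebugToFile` are hypothetical names, and the file path is an example.

```javascript
const fs = require('fs');

// Hypothetical helper: prefixes a debug message with an ISO timestamp.
function formatDebugLine(message, date = new Date()) {
    return `[${date.toISOString()}] ${message}\n`;
}

// Attaches a 'debug' listener that appends every message to the given file.
function logDebugToFile(client, filePath) {
    const stream = fs.createWriteStream(filePath, { flags: 'a' });
    client.on('debug', (message) => stream.write(formatDebugLine(message)));
    return stream;
}

// Usage (assuming `client` is your discord.js Client):
// logDebugToFile(client, './debug.log');
```

Timestamps make it much easier to line up a disconnect with what was logged just before and after it.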

Do note that the logs can get filled with a LOT of Heartbeat acknowledged or Sending heartbeat. You can filter those out as well, and only log "important" messages via something similar to:

client.on('debug', (message) => {
    if (/(Sending a heartbeat|Latency of)/i.test(message)) return null;
    // Log the message
});

If you use voice in your bot, you might want to filter the voice logs. You can do something similar to

client.on('debug', (message) => {
    if (/voice/i.test(message)) return null;
    // Log the message
});
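
The two filters above can also be combined into a single check. The sketch below reuses the same two patterns; `NOISY_PATTERNS` and `isNoisy` are hypothetical names.

```javascript
// Patterns taken from the two filters above; extend the list as needed.
const NOISY_PATTERNS = [/(Sending a heartbeat|Latency of)/i, /voice/i];

// Returns true when a debug message matches any of the noisy patterns.
function isNoisy(message) {
    return NOISY_PATTERNS.some((pattern) => pattern.test(message));
}

// Usage (assuming `client` is your discord.js Client):
// client.on('debug', (message) => {
//     if (isNoisy(message)) return;
//     console.log(message);
// });
```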

Other information

It is also normal that the WebSocket connection might close at random times with different close codes! (1000, 1006). Generally, the bot should handle those gracefully, and reconnect for you. If it doesn't, and your bot is now in a ghost state (the process is up, but the bot doesn't show online), you should fetch your debug logs and place them here.
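
To see whether a shard actually reconnects after a close, it can help to log the shard lifecycle events next to the debug output. The sketch below is an assumption-heavy example: the event names are the ones used by discord.js v12 and may differ on other commits, and `trackShardEvents` is a hypothetical helper.

```javascript
// Event names below are the ones used by discord.js v12; they are an assumption here.
const SHARD_EVENTS = ['shardReady', 'shardDisconnect', 'shardReconnecting', 'shardResume', 'shardError'];

// Logs one timestamped line per shard lifecycle event, including its arguments.
function trackShardEvents(client, log = console.log) {
    for (const name of SHARD_EVENTS) {
        client.on(name, (...args) => {
            log(`[${new Date().toISOString()}] ${name} ${args.map(String).join(' ')}`);
        });
    }
}

// Usage (assuming `client` is your discord.js Client):
// trackShardEvents(client);
```

If a disconnect line is never followed by a reconnecting, resume or ready line for the same shard, that is a good hint the bot ended up in the ghost state described above.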

When sending us the logs, please use a service like GitHub Gists, Hastebin or similar.

AGuyNamedJens commented 5 years ago

I didn't have debug logs on at the time, but out of nowhere my bot started to spam reconnections to the gateway (firing the ready event every few seconds) after being up for 3 days (excluding silent socket disconnects). EDIT: It caused the bot to hit the 1000 event/connection firing limit and completely killed my bot for 8 hours. This NEVER happened BEFORE the internal sharding update was introduced to discord.js master.

I just hope it won't happen again, but is there any way to disable internal sharding rather than having my bot spam Discord's API just because the silent-disconnect reconnection fails too often right now?
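
This does not disable internal sharding, but as a stopgap against burning through the session limit, a bot could watch how often the ready event fires and bail out when it fires too often. The sketch below is hypothetical (`createReadyGuard` and its thresholds are made up for illustration), not a library feature.

```javascript
// Hypothetical guard: reports when too many 'ready' events fire within a time window.
function createReadyGuard({ maxReadies = 5, windowMs = 60000, onTrip }) {
    const timestamps = [];
    return function recordReady(now = Date.now()) {
        timestamps.push(now);
        // Drop entries that fell out of the window.
        while (timestamps.length && now - timestamps[0] > windowMs) timestamps.shift();
        if (timestamps.length > maxReadies) {
            onTrip(timestamps.length);
            return true;
        }
        return false;
    };
}

// Usage (assuming `client` is your discord.js Client):
// const recordReady = createReadyGuard({ onTrip: () => process.exit(1) });
// client.on('ready', () => recordReady());
```

Stopping the process early keeps the remaining sessions available for a manual restart once the cause is known.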

vladfrangu commented 5 years ago

@AGuyNamedJens thanks for letting us know! I've actually encountered the same (having to wait for 19 hours... yikes) but I can't tell what happened since I didn't find the logs where it happened yet. I'll definitely look into this, as it sounds like a massive issue, one that shouldn't happen but is happening.

For the time being, could you also enable debug logging, and report if it happens again?

Deivu commented 5 years ago

@vladfrangu I also encountered that behavior. The behavior started when the bot was 24ish online.

Unfortunately, it will be hard for me to provide debug logs, because this bot runs on 45 shards and the log file is too large for my internet connection to handle.

Refer to the Screenshot https://amanogawa.moe/cdn/no_u_reconnects.jpg

And this is really bad because just 2 days ago I was forced to wait 7 hours because that happened. At first I was not sure whether this is a lib issue, but since someone other than me reported it, it could be. I did a lot of optimizations on my end to make sure the issue is lib related and not my own misjudgement.

And also it never happened to me before the IS merge as well.

vladfrangu commented 5 years ago

I think I might have found the cause of this, and will create a PR to patch it! Are you willing to run the PR? @Deivu

Deivu commented 5 years ago

Sure, that's fine for me

andre-paulo98 commented 5 years ago

I'm on latest master (package-lock.json says: github:discordjs/discord.js#0d9bc8664dc2dc3b3bf7d35ba7b9050f2797a211). discord.js is the only package installed on this project: package.json and package-lock.json.

Using this code:

const Discord = require('discord.js');
const client = new Discord.Client();

client.on('ready', () => {
    console.log(`Logged in as ${client.user.tag}!`);
});

client.on("debug",console.debug);

client.login('mytoken');

Snippet of the log:

```
Provided token:
Preparing to connect to the gateway...
[WS => Manager] Fetched Gateway Information URL: wss://gateway.discord.gg Recommended Shards: 1
[WS => Manager] Session Limit Information Total: 1000 Remaining: 983
[WS => Manager] Spawning shards: 0
[WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
[WS => Shard 0] Setting a HELLO timeout for 20s.
[WS => Shard 0] Failed to connect to the gateway, requeueing...
[WS => Manager] Shard Queue Size: 1; continuing in 5 seconds...
[WS => Manager] Session Limit Information Total: 1000 Remaining: 983
[WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
[WS => Shard 0] Setting a HELLO timeout for 20s.
[WS => Shard 0] Failed to connect to the gateway, requeueing...
[WS => Manager] Shard Queue Size: 1; continuing in 5 seconds...
[WS => Manager] Session Limit Information Total: 1000 Remaining: 983
[WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
[WS => Shard 0] Setting a HELLO timeout for 20s.
[WS => Shard 0] Failed to connect to the gateway, requeueing...
[WS => Manager] Shard Queue Size: 1; continuing in 5 seconds...
[WS => Manager] Session Limit Information Total: 1000 Remaining: 983
[WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
[WS => Shard 0] Setting a HELLO timeout for 20s.
[WS => Shard 0] Failed to connect to the gateway, requeueing...
[WS => Manager] Shard Queue Size: 1; continuing in 5 seconds...
[WS => Shard 0] Did not receive HELLO in time. Destroying and connecting again.
[WS => Shard 0] Clearing the HELLO timeout.
[WS => Shard 0] Shard was destroyed but no WebSocket connection existed... Reconnecting...
[WS => Manager] Session Limit Information Total: 1000 Remaining: 983
[WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
[WS => Shard 0] Setting a HELLO timeout for 20s.
[WS => Shard 0] Failed to connect to the gateway, requeueing...
[WS => Manager] Shard Queue Size: 1; continuing in 5 seconds...
[WS => Shard 0] Did not receive HELLO in time. Destroying and connecting again.
[WS => Shard 0] Clearing the HELLO timeout.
[WS => Shard 0] Shard was destroyed but no WebSocket connection existed... Reconnecting...
[WS => Manager] Session Limit Information Total: 1000 Remaining: 983
[WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
[WS => Shard 0] Setting a HELLO timeout for 20s.
[WS => Shard 0] Failed to connect to the gateway, requeueing...
[WS => Manager] Shard Queue Size: 1; continuing in 5 seconds...
[WS => Shard 0] Did not receive HELLO in time. Destroying and connecting again.
[WS => Shard 0] Clearing the HELLO timeout.
[WS => Shard 0] Shard was destroyed but no WebSocket connection existed... Reconnecting...
[WS => Manager] Session Limit Information Total: 1000 Remaining: 983
[WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
[WS => Shard 0] Setting a HELLO timeout for 20s.
[WS => Shard 0] Failed to connect to the gateway, requeueing...
[WS => Manager] Shard Queue Size: 1; continuing in 5 seconds...
[WS => Manager] Session Limit Information Total: 1000 Remaining: 983
[WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
[WS => Shard 0] Setting a HELLO timeout for 20s.
[WS => Shard 0] Failed to connect to the gateway, requeueing...
[WS => Manager] Shard Queue Size: 1; continuing in 5 seconds...
[WS => Manager] Session Limit Information Total: 1000 Remaining: 983
[WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
[WS => Shard 0] Setting a HELLO timeout for 20s.
[WS => Shard 0] Failed to connect to the gateway, requeueing...
[WS => Manager] Shard Queue Size: 1; continuing in 5 seconds...
[WS => Manager] Session Limit Information Total: 1000 Remaining: 983
[WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
[WS => Shard 0] Setting a HELLO timeout for 20s.
[WS => Shard 0] Failed to connect to the gateway, requeueing...
[WS => Manager] Shard Queue Size: 1; continuing in 5 seconds...
(node:1100) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 ready listeners added. Use emitter.setMaxListeners() to increase limit
(node:1100) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added. Use emitter.setMaxListeners() to increase limit
(node:1100) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 invalidSession listeners added. Use emitter.setMaxListeners() to increase limit
[WS => Shard 0] Did not receive HELLO in time. Destroying and connecting again.
```

Full log: https://gist.github.com/andre-paulo98/9b518661be6b2fb5def8527b1bb9528b

SpaceEEC commented 5 years ago

You are probably running an outdated node version, node 10 or newer is required.
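
A startup check along these lines would surface the problem immediately instead of leaving the bot stuck in a reconnect loop. This is only a sketch; `hasMinimumNode` is a hypothetical helper, and the minimum of 10 is taken from the comment above.

```javascript
// Returns true when the running Node.js major version is at least `minMajor`.
function hasMinimumNode(minMajor, version = process.versions.node) {
    const major = Number(version.split('.')[0]);
    return major >= minMajor;
}

if (!hasMinimumNode(10)) {
    console.error(`Node.js 10 or newer is required, found ${process.versions.node}`);
}
```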

andre-paulo98 commented 5 years ago

Yeah, you're right, I'm using v8.12.0. I hadn't seen any indication that node 10 was required; now I can see it everywhere. I will update it soon and test this again.

CampbellCrowley commented 5 years ago

github:discordjs/discord.js#8652e47c14eccd1c8cab27f0d1acb8d7335349e8
github:discordapp/erlpack#674ebfd3439ba4b7ce616709821d27630f7cdc61
"@discordjs/uws": "11.149.1"
"zlib-sync": "0.1.4"
NodeJS: v10.15.3

I seem to be having this issue when the internet connection becomes unstable for a period of time, but some shards seem to recover fine. client.ws.status shows 0 for both shards, and both have a ping of 93ms and 97ms (acceptable values) from client.ws.ping, even though shard 1 appears offline in Discord, while shard 0 is online.

05-30 08:49:59 [WS => Shard 0] WebSocket was closed.\n      Event Code: 1000\n      Clean: undefined\n      Reason: No reason received
05-30 08:49:59 [WS => Shard 0] Clearing the heartbeat interval.
05-30 08:49:59 [WS => Shard 0] Shard was destroyed but no WebSocket connection existed... Reconnecting...
05-30 08:50:00 [WS => Shard 1] WebSocket was closed.\n      Event Code: 1000\n      Clean: undefined\n      Reason: No reason received
05-30 08:50:00 [WS => Shard 1] Clearing the heartbeat interval.
05-30 08:50:00 [WS => Shard 1] Shard was destroyed but no WebSocket connection existed... Reconnecting...
05-30 08:50:00 [WS => Manager] Session Limit Information\n        Total: 1000\n        Remaining: 989
05-30 08:50:00 [WS => Shard 1] Trying to connect to wss://gateway.discord.gg/, version 6
05-30 08:50:00 [WS => Shard 1] Setting a HELLO timeout for 20s.
05-30 08:50:02 [WS => Manager] Session Limit Information\n        Total: 1000\n        Remaining: 989
05-30 08:50:02 [WS => Shard 0] Trying to connect to wss://gateway.discord.gg/, version 6
05-30 08:50:02 [WS => Shard 0] Setting a HELLO timeout for 20s.
05-30 08:50:05 [WS => Shard 1] Received a uWs error. Closing the connection and reconnecting...
05-30 08:50:06 [WS => Shard 0] Opened a connection to the gateway successfully.
05-30 08:50:06 [WS => Shard 0] Clearing the HELLO timeout.
05-30 08:50:06 [WS => Shard 0] Setting a heartbeat interval for 41250ms.
05-30 08:50:06 [WS => Shard 0] Identifying as a new session. Shard 0/2
05-30 08:50:06 [WS => Shard 0] READY gateway-prd-main-qdf2 -> discord-sessions-prd-1-27 | Session 06fa25ef6db498d18e89a640c99466af.
05-30 08:50:06 [WS => Manager] There are 1 unavailable guilds. Waiting for their GUILD_CREATE packets
05-30 08:50:20 [WS => Shard 1] Did not receive HELLO in time. Destroying and connecting again.
05-30 08:50:20 [WS => Shard 1] Clearing the HELLO timeout.
05-30 08:50:20 [WS => Shard 1] Shard was destroyed but no WebSocket connection existed... Reconnecting...

While attempting to force the presence status to online, I receive the following:

05-30 12:15:02 [WS => Shard 1] Tried to send packet {"op":3,"d":{"afk":false,"since":null,"status":"online","game":{"type":3,"name":"I'm Online"}}} but no WebSocket is available!

But the shard does not appear to attempt to reconnect, there are no other messages between these two log snippets, and the shards were working as expected prior to the first log.

vladfrangu commented 5 years ago

There seem to be a lot of issues revolving around uws. I'd recommend running your bot without it to see if you encounter that issue again! I will look into it as soon as possible, but I don't know for sure if this is a bug on d.js or uws's side.

AGuyNamedJens commented 5 years ago

I don't think i have uws installed


vladfrangu commented 5 years ago

@AGuyNamedJens Hi! If you are referring to https://github.com/discordjs/discord.js/issues/3201#issuecomment-487072020, this has been fixed as of commit 577636a46df5ec4ac2e4fae591604a60fb22f6d2, so if you haven't updated yet, I'd recommend doing so!

If you've encountered any other issues after that commit, feel free to do what the issue suggests: Getting debug logs, listing your dependencies for discord.js (uws, zlib-sync, erlpack, zucc, etc) and we'll look into it 😄
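
To list those dependencies quickly, a small script can report which of the optional packages are resolvable and at what version. This is a sketch: the package list comes from this thread, `installedVersion` is a hypothetical helper, and a package that does not expose its package.json would be reported as not found even if it is installed.

```javascript
// Package names taken from this thread; adjust the list to your setup.
const OPTIONAL_PACKAGES = ['zlib-sync', 'bufferutil', 'utf-8-validate', 'erlpack', '@discordjs/uws'];

// Returns the installed version of a package, or null if it cannot be resolved.
function installedVersion(name) {
    try {
        return require(`${name}/package.json`).version;
    } catch (error) {
        return null;
    }
}

for (const name of OPTIONAL_PACKAGES) {
    console.log(`${name}: ${installedVersion(name) || 'not found'}`);
}
```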

viztea commented 5 years ago

github:discordjs/discord.js#6100aceef24abfff0d32fcf5a6ff79c2102d046f NodeJS: 11.6.0 Other Packages:
"@discordjs/uws": "^11.149.1", "@sentry/node": "^5.4.0", "discord.js": "github:discordjs/discord.js", "discord.js-lavalink": "github:mrjacz/discord.js-lavalink#v3", "mongoose": "^5.6.0", "simple-youtube-api": "^5.2.0"

I've been getting this for the last week or so.

Full Log: https://gist.github.com/LolWastedJS/5de65acf8a44155b99ce5541bfb184bd

I don't know anything else I can add.

kyranet commented 5 years ago

Try uninstalling @discordjs/uws, it causes more issues than it fixes, especially with sharding and latency.

viztea commented 5 years ago

@kyranet still happens without uwus

vladfrangu commented 5 years ago

@LolWastedJS Hi! Can you please get me the logs that were given after the removal of uws? Also, are you doing anything "spammy" with presence updates? From the logs you've attached... There's a LOT of presence packets that are tried to be sent..
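
If presence updates are being sent very frequently, one option is to throttle them. The sketch below uses a hypothetical `throttle` helper that drops calls arriving sooner than a chosen interval; the 60 second interval in the usage comment is arbitrary.

```javascript
// Hypothetical helper: lets `fn` run at most once per `intervalMs`; extra calls are dropped.
function throttle(fn, intervalMs, now = Date.now) {
    let last = -Infinity;
    return (...args) => {
        const current = now();
        if (current - last < intervalMs) return false;
        last = current;
        fn(...args);
        return true;
    };
}

// Usage (assuming `client` is your discord.js Client; the 60s interval is arbitrary):
// const setPresence = throttle((presence) => client.user.setPresence(presence), 60000);
```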

anishshobithps commented 5 years ago

@vladfrangu , i am a friend of @LolWastedJS and he has fixed his issue

Deivu commented 5 years ago

Ok, seems like another really annoying issue popped up on my end. In my case, shards get stuck on "Disconnected" status and don't even try to reconnect. When I woke up earlier, I found 3 shards dead, as shown in this image. https://cdn.discordapp.com/attachments/537586710100312065/595452587751505920/unknown.png
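
A periodic check along these lines could flag shards that stay disconnected for too long. It is a sketch under assumptions: `createStuckShardDetector` is a hypothetical helper, and the default status value of 5 is assumed to match `Status.DISCONNECTED` in discord.js v12, so verify it against the commit you run.

```javascript
// Tracks how long each shard has been in the disconnected state.
// `disconnectedStatus` defaults to 5, assumed to match Status.DISCONNECTED in discord.js v12.
function createStuckShardDetector(maxDowntimeMs, disconnectedStatus = 5) {
    const downSince = new Map();
    return function check(shards, now = Date.now()) {
        const stuck = [];
        for (const shard of shards) {
            if (shard.status !== disconnectedStatus) {
                downSince.delete(shard.id);
                continue;
            }
            if (!downSince.has(shard.id)) downSince.set(shard.id, now);
            if (now - downSince.get(shard.id) >= maxDowntimeMs) stuck.push(shard.id);
        }
        return stuck;
    };
}

// Usage (assuming `client` is your discord.js Client):
// const check = createStuckShardDetector(5 * 60 * 1000);
// setInterval(() => {
//     const stuck = check(client.ws.shards.values());
//     if (stuck.length) console.warn(`Shards stuck disconnected: ${stuck.join(', ')}`);
// }, 30000);
```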

Commit: ddcc6cfec9ccb80d42e1d074bf03db2e0dae274e
Optional Package Installed: bufferutil and erlpack
WS version: 7.0.0.

Last debug logs of the shards that disconnected:

[23:22:58.795] [LOG]    <PID: 12615> [WS => Shard 27] Didn't receive a heartbeat ack last time, assuming zombie conenction. Destroying and reconnecting.
[23:22:58.796] [LOG]    <PID: 12615> [WS => Shard 27] Clearing the heartbeat interval.
[23:23:29.004] [LOG]    <PID: 12828> Swept 1903 messages older than 300 seconds in 34625 text-based channels
[23:24:01.103] [LOG]    <PID: 13240> Swept 2015 messages older than 300 seconds in 34574 text-based channels
[23:24:40.205] [LOG]    <PID: 13448> Swept 1861 messages older than 300 seconds in 34769 text-based channels
[23:25:12.167] [LOG]    <PID: 13510> Swept 2397 messages older than 300 seconds in 37004 text-based channels
[23:25:45.083] [LOG]    <PID: 13602> Swept 2130 messages older than 300 seconds in 34086 text-based channels
[23:27:20.321] [LOG]    <PID: 12615> [WS => Shard 27] Tried to send packet {"op":3,"d":{"afk":false,"since":null,"status":"online","game":{"type":0,"name":"Kashima 🚢 | 49026 Ports"}}} but no WebSocket is available!

[08:09:19.210] [LOG]    <PID: 12678> [WS => Shard 29] Didn't receive a heartbeat ack last time, assuming zombie conenction. Destroying and reconnecting.
[08:09:19.210] [LOG]    <PID: 12678> [WS => Shard 29] Clearing the heartbeat interval.
[08:09:40.258] [LOG]    <PID: 13448> Swept 1793 messages older than 300 seconds in 34884 text-based channels
[08:10:12.215] [LOG]    <PID: 13510> Swept 1794 messages older than 300 seconds in 37003 text-based channels
[08:10:45.069] [LOG]    <PID: 13602> Swept 1807 messages older than 300 seconds in 34091 text-based channels
[08:17:20.363] [LOG]    <PID: 12615> [WS => Shard 27] Tried to send packet {"op":3,"d":{"afk":false,"since":null,"status":"online","game":{"type":0,"name":"Kashima 🚢 | 49021 Ports"}}} but no WebSocket is available!
[08:17:39.293] [LOG]    <PID: 11337> Swept 3159 messages older than 300 seconds in 46433 text-based channels
[08:17:52.453] [LOG]    <PID: 12678> [WS => Shard 29] Tried to send packet {"op":3,"d":{"afk":false,"since":null,"status":"online","game":{"type":0,"name":"Kashima 🚢 | 49021 Ports"}}} but no WebSocket is available!

[01:48:50.126] [LOG]    <PID: 12758> [WS => Shard 31] Didn't receive a heartbeat ack last time, assuming zombie conenction. Destroying and reconnecting.
[01:48:50.127] [LOG]    <PID: 12758> [WS => Shard 31] Clearing the heartbeat interval.
[01:48:51.492] [LOG]    <PID: 11948> Swept 1641 messages older than 300 seconds in 33058 text-based channels
[01:49:21.482] [LOG]    <PID: 12019> Swept 1886 messages older than 300 seconds in 35949 text-based channels
[01:49:51.461] [LOG]    <PID: 12128> Swept 1985 messages older than 300 seconds in 36020 text-based channels
[01:50:21.380] [LOG]    <PID: 12248> Swept 1834 messages older than 300 seconds in 33429 text-based channels
[01:50:51.550] [LOG]    <PID: 12288> Swept 1917 messages older than 300 seconds in 35822 text-based channels
[01:51:22.347] [LOG]    <PID: 12521> Swept 1383 messages older than 300 seconds in 33816 text-based channels
[01:51:53.112] [LOG]    <PID: 12615> Swept 1147 messages older than 300 seconds in 34719 text-based channels
[01:52:25.901] [LOG]    <PID: 12678> Swept 1616 messages older than 300 seconds in 36100 text-based channels
[01:52:58.040] [LOG]    <PID: 12758> Swept 1687 messages older than 300 seconds in 37180 text-based channels
[01:53:28.997] [LOG]    <PID: 12828> Swept 1736 messages older than 300 seconds in 34591 text-based channels
[01:54:01.088] [LOG]    <PID: 13240> Swept 1786 messages older than 300 seconds in 34590 text-based channels
[01:54:40.258] [LOG]    <PID: 13448> Swept 1827 messages older than 300 seconds in 34777 text-based channels
[01:55:12.208] [LOG]    <PID: 13510> Swept 2189 messages older than 300 seconds in 37047 text-based channels
[01:55:45.061] [LOG]    <PID: 13602> Swept 2030 messages older than 300 seconds in 34095 text-based channels
[01:57:20.322] [LOG]    <PID: 12615> [WS => Shard 27] Tried to send packet {"op":3,"d":{"afk":false,"since":null,"status":"online","game":{"type":0,"name":"Kashima 🚢 | 49032 Ports"}}} but no WebSocket is available!
[01:58:23.449] [LOG]    <PID: 12758> [WS => Shard 31] Tried to send packet {"op":3,"d":{"afk":false,"since":null,"status":"online","game":{"type":0,"name":"Kashima 🚢 | 49032 Ports"}}} but no WebSocket is available!

vladfrangu commented 5 years ago

@Deivu hey, thanks for reporting in! Do you have any logs that go back further, from, say, a WebSocket was closed message, for those shards? I have a suspicion as to why this happened, and I'll try to get it fixed soon, but those logs would help a bit too!

Deivu commented 5 years ago

@vladfrangu it would be great if those logs were available, but nothing else was ever logged for those shards. Those lines are all there is, which is what makes it so weird.

Deivu commented 5 years ago

Just an update on the issue I had above: that happens with pako only if you don't have zlib-sync installed. If you ever face the same issue I had, please try installing zlib-sync first and see if that fixes your silent disconnections.

vladfrangu commented 4 years ago

Just an update on that issue I had above, That happens with pako only if you don't have zlib-sync installed. If you ever faced the same issue I had, please do try to install zlib-sync first and see if that fixes your silent disconnections.

In relation to this comment, as of some of the latest commits in #3393, pako support was removed from both browser and node.js. Thus, if you want compression, you should install zlib-sync. Note that zlib-sync will only work in a node.js environment.

Why was pako removed? For one, it caused packet issues after a (rather large but easy to reproduce) time, where the last packet it received WOULD BE infinitely repeated. Secondly, it is built in JS, which isn't as fast nor efficient as native solutions.

iCrawl commented 4 years ago

I'll close this as resolved. If further issues arise, they should be opened as new issues.

pauldb09 commented 2 years ago

I'm havi

Still having the issue with that

imranbarbhuiya commented 2 years ago

Hey, it's an old issue. If you're able to reproduce the issue in v13.5.0 then create a new issue.