discordjs / discord.js

A powerful JavaScript library for interacting with the Discord API
https://discord.js.org
Apache License 2.0

CPU peaks on Master only #3415

Closed Saywn closed 5 years ago

Saywn commented 5 years ago

Please describe the problem you are having in as much detail as possible: Issue experienced on master but fine on stable, tested on a bot with 10 shards and more than 10,000 servers: some shards still show CPU activity (up to 3% for a shard) even when nobody is using the commands.

To find why, I disabled all the events and dependencies until there is only Discord.js-master left. The problem was still there: some shards transiently peak at 3% CPU while others stay at 0.1% (not always the same shards, they alternate)

Then I used the stable version instead of master, with no other changes (both work correctly for the bot). The issue isn't present on stable (although there are rare, negligible 3% peaks). Here are the screenshots for comparison, taken 10 minutes after the reboot (the problem persists beyond that):
Discord.js: https://imgur.com/40Ngo9L
Discord.js-master: https://imgur.com/vTfadiq

The issue can only be reproduced in production with the 10 000 servers. I also reproduced the issue on a second server with everything freshly installed.

Further details:

Deivu commented 5 years ago

I would recommend a CPU profiler to track the issue here, I managed to fit 13k servers in a single process without much issue in master.

Gawdl3y commented 5 years ago

It would also likely be prudent to update Node to the latest version.

JMTK commented 5 years ago

I think it has something to do with event handlers for a very common event, such as reactions, typing, or messages. If you log your events, you'll notice your log immediately blow up. I haven't confirmed this with a CPU profiler, but it's my current working theory because I've experienced this as well, on Node 12.
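One way to check this theory without flooding the log is to count events per name instead of printing each one. The following is a minimal sketch, not something posted in the thread: `countEvents` is a hypothetical helper that wraps the `emit` method of any Node.js `EventEmitter` (the discord.js client is one) and tallies how often each event name fires.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: wraps emitter.emit so every emitted event is tallied by name.
// Returns a Map of event name -> number of times it was emitted.
function countEvents(emitter) {
  const counts = new Map();
  const originalEmit = emitter.emit;
  emitter.emit = function (eventName, ...args) {
    const key = String(eventName);
    counts.set(key, (counts.get(key) || 0) + 1);
    // Forward to the real emit so existing listeners keep working.
    return originalEmit.call(this, eventName, ...args);
  };
  return counts;
}

// Usage idea (assumes `client` is your discord.js client instance):
//   const counts = countEvents(client);
//   setInterval(() => { console.log([...counts]); counts.clear(); }, 60000);
```

Printing the tallies once a minute shows which events dominate on a busy shard without the logging itself becoming the bottleneck.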

Saywn commented 5 years ago

I'll continue to search in better conditions later, but so far a CPU profiler didn't give me useful results

Androz2091 commented 5 years ago

I have a bot with about 250 servers and there are significant CPU peaks too...

Discord.js: master (latest) Node: 12.7.0 Debian: 10

Htop of my server

The CPU problem in itself is not serious, since I can upgrade my machine, but I also have a big latency problem: my ping command displays a 4-second ping. This is huge. I don't think it comes from my code, because it's the message event that takes time to trigger rather than the message processing, and I didn't have this problem before switching to the master version. Unfortunately, I didn't do what I should have done, which is log the ping every hour to see whether it increased slowly or changed suddenly when I moved to master. I don't know if my problem belongs in this issue, so tell me if it doesn't. ^^

P.S.: If you have any doubts about my code it is available on github (Atlanta, stable version, not the master)

Saywn commented 5 years ago

I made some additional tests. For the tests, I still disabled all the events, thus my code only included an instance of the client, the sharding manager and the bot token, so the bot was similar to a new bot except for its number of servers

When splitting the bot into 20 shards (around 600 servers per shard) instead of 10 shards, I noticed some shards experienced the CPU issue a lot more than others. In fact, the more RAM a shard uses, the higher its CPU usage tends to be.

So while still having everything disabled, I launched all the shards individually to isolate a bad shard and a good shard

I isolated a problematic shard that always consumes between 1% and 2% CPU (which is high given the server's specs) each time I launch it (shard 14: this shard is often around 0.1% CPU during the first 4 minutes after launch, before jumping to 1% CPU and not going below this value afterwards)

I also isolated a shard (17) that is constantly consuming 0.1%, which would confirm it is shard-specific. This shard is also using less RAM than the problematic shard.

Still didn't manage to get something useful with a CPU profiler (maybe I should look for another one), even when comparing these two shards. I assume some servers could be the cause of these differences between shards (maybe large servers?) but that's all I know so far
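To compare a "good" and a "bad" shard over time instead of eyeballing htop, each shard process can log its own CPU and memory figures. This is a rough sketch using Node's built-in `process.cpuUsage()` and `process.memoryUsage()`; the helper names `cpuPercent` and `startCpuSampler` are made up for illustration, and the percentage is relative to a single core.

```javascript
// cpuDelta is the object returned by process.cpuUsage(previous):
// { user, system } in microseconds. elapsedMs is wall-clock time in milliseconds.
function cpuPercent(cpuDelta, elapsedMs) {
  if (elapsedMs <= 0) return 0;
  const busyMs = (cpuDelta.user + cpuDelta.system) / 1000;
  return (busyMs * 100) / elapsedMs;
}

// Calls onSample({ cpu, rssMb }) every intervalMs with this process's usage.
function startCpuSampler(intervalMs, onSample) {
  let lastCpu = process.cpuUsage();
  let lastTime = Date.now();
  const timer = setInterval(() => {
    const now = Date.now();
    const delta = process.cpuUsage(lastCpu); // usage since the previous sample
    onSample({
      cpu: cpuPercent(delta, now - lastTime),
      rssMb: process.memoryUsage().rss / (1024 * 1024),
    });
    lastCpu = process.cpuUsage();
    lastTime = now;
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for sampling
  return timer;
}

// Usage idea, inside each shard process:
//   startCpuSampler(10000, (s) => console.log(s.cpu.toFixed(2) + '% CPU', s.rssMb.toFixed(1) + ' MB'));
```

Logging both values together makes it easier to see whether CPU really tracks memory use per shard, as observed above.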

appellation commented 5 years ago

Without a CPU profile, we can only shoot in the dark about what might be causing this issue. I would recommend finding some way of determining what part of the code is causing this and then we can work from there.
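For anyone unsure how to capture such a profile, one option is Node's built-in `inspector` module, which can record a CPU profile from inside the process and write it to a `.cpuprofile` file that Chrome DevTools can open. The sketch below is only one possible approach; `startCpuProfile`, `stopCpuProfile`, and the output file name are illustrative, and with a sharding manager it would have to run inside the shard process you want to inspect.

```javascript
const inspector = require('inspector');
const fs = require('fs');

// Opens an inspector session in this process and starts the CPU profiler.
function startCpuProfile() {
  const session = new inspector.Session();
  session.connect();
  return new Promise((resolve, reject) => {
    session.post('Profiler.enable', (enableErr) => {
      if (enableErr) return reject(enableErr);
      session.post('Profiler.start', (startErr) => {
        if (startErr) return reject(startErr);
        resolve(session);
      });
    });
  });
}

// Stops the profiler, writes the profile as JSON to outFile, and closes the session.
function stopCpuProfile(session, outFile) {
  return new Promise((resolve, reject) => {
    session.post('Profiler.stop', (err, result) => {
      session.disconnect();
      if (err) return reject(err);
      fs.writeFileSync(outFile, JSON.stringify(result.profile));
      resolve(outFile);
    });
  });
}

// Usage idea: profile five minutes of an idle shard (file name is an example).
//   const session = await startCpuProfile();
//   setTimeout(() => stopCpuProfile(session, 'shard.cpuprofile'), 5 * 60 * 1000);
```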

MrJacz commented 5 years ago

From some Node.js CPU profiles I did a while ago (3-5 months ago), it seems Websocket.pack/unpack was the most CPU-hungry

kyranet commented 5 years ago

My best guess is compression:

discord.js stable does not use compression, thus it uses a large deal of bandwidth to operate:

https://github.com/discordjs/discord.js/blob/1121b2f7bff4caabce2812fb618167304cb00c66/src/client/websocket/WebSocketConnection.js#L254

However, in master compression is on by default and uses pako, a zlib library written entirely in dinosaur-era JavaScript, since it's made to run everywhere, even in extremely old browsers.

https://github.com/discordjs/discord.js/blob/5af8cb8e6e7591e81f758a8b0f6748db7c2f12f1/src/client/websocket/WebSocketShard.js#L248-L275

However, we offer two alternatives to this slow library: zlib-sync, which was just updated for Node.js 12 (2 hours ago), and zucc, which uses Zstandard compression and can inflate and deflate much faster than zlib but is not 100% reliable, as some people (including me) can't get their bots to identify successfully with it.

I have to note that both libraries need building (zucc won't in the near future, as it's being rewritten in Wasm), but both decrease CPU usage enormously.

In my case, with neither library installed, CPU usage sat at ~30-45% all the time; after installing zlib-sync, it's back to ~3-6%. So give either of them a shot!

And if you want to see the difference: [image]

Discord essentially spams my bot with PRESENCE_UPDATE messages, 20k a minute 24/7, so the graph is pretty consistent, and pako just can't scale.

As for the CPU spikes, in my case they're most likely just GC cycles, since Node.js is the only process using a noticeable amount of CPU, I'm not running anything heavy exactly every 30 seconds, and my message sweep interval is set to 120 (2 minutes). [image]

tl;dr:

JMTK commented 5 years ago

Thanks @kyranet, I can confirm this helps very significantly. I haven't had much time to go experimenting with combinations of the optional packages but this is definitely a huge boost for larger scale bots. Currently running ~7000 servers across 7 shards. I'd estimate thousands of events per minute like @Saywn was saying.

This is my Digital Ocean CPU graph from today after trying this: [image]

I wonder if these compression packages should just be mandatory once you scale your bot enough. Are there any downsides?

kyranet commented 5 years ago

We're using compression by default in master because network bandwidth is the most limited resource: most (decent, not necessarily powerful) VPSes provide enough CPU and RAM to power a bot in over 50k guilds across 25 shards, but not the bandwidth necessary for the bot to operate.

However, not everyone has node-gyp installed, nor can everyone just install it (limited environments or resources to build), so we ship discord.js with pako by default, since everyone can run it, and browsers can too.

To answer your question: yes, those packages are mandatory if you want to scale your bot properly. And yes, compression means higher CPU usage for lower bandwidth usage; it's a tradeoff we must make.

And glad I could help reduce your CPU usage noticeably :smile:

You can also lower it even more by installing utf-8-validate, which is used in ws. It's undocumented and marked as a dev dependency on their end, but it also helps. Not by as much as using zlib-sync versus pako, but the difference is noticeable. 👍
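Since these native packages are optional and need a build step, it is easy to end up on the pako fallback without noticing. As a quick sanity check, a small script along these lines can report which of the packages named in this thread resolve from the bot's directory. It is a sketch only: `isInstalled` and `reportOptionalPackages` are illustrative helpers, and the check says nothing about whether discord.js actually loaded the module.

```javascript
// Returns true if `name` can be resolved from the current module's location.
function isInstalled(name) {
  try {
    require.resolve(name);
    return true;
  } catch (err) {
    return false;
  }
}

// Optional packages mentioned in this thread.
const OPTIONAL_PACKAGES = ['zlib-sync', 'zucc', 'utf-8-validate'];

// Builds a list of { name, installed } entries for the given package names.
function reportOptionalPackages(names = OPTIONAL_PACKAGES) {
  return names.map((name) => ({ name, installed: isInstalled(name) }));
}

// Usage idea, run from the bot's project directory:
//   for (const { name, installed } of reportOptionalPackages()) {
//     console.log(name + ': ' + (installed ? 'installed' : 'missing'));
//   }
```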

Saywn commented 5 years ago

No CPU issue anymore with zlib-sync, thanks!

kyranet commented 5 years ago

You're welcome! I'm glad I could help! 😄