Slimefun / Slimefun4

Slimefun 4 - A unique Spigot/Paper plugin that looks and feels like a modpack. We've been giving you backpacks, jetpacks, reactors and much more since 2013.
GNU General Public License v3.0
974 stars 546 forks source link

Performance Improvement Builds seem to have opposite effect. #1165

Closed mrcoffee1026 closed 5 years ago

mrcoffee1026 commented 5 years ago

Description (Required)

Server ran at a happy 20.0tps average before these, now I'm barely able to manage 16tps with hardly any players on.

Steps to reproduce the Issue (Required)

Compare timings from build #280: https://timings.aikar.co/?id=f1ff9e9c2d0c4b70843ec9f93ab4b2f7#plugins to timings from build #289: https://timings.aikar.co/?id=ae3b0733adc6410cb50783fb5e7c2088#plugins

SF Timings from 289
 == Slimefun Diagnostic Tool ==
[10:19:25 INFO]: Halted: FALSE
[10:19:25 INFO]:
[10:19:25 INFO]: Impact: 4ms / 50-750ms
[10:19:25 INFO]: Ticked Chunks: 45
[10:19:25 INFO]: Ticked Machines: 228
[10:19:25 INFO]: Skipped Machines: 382
[10:19:25 INFO]:
[10:19:25 INFO]: Ticking Machines:
[10:19:25 INFO]:   PROGRAMMABLE_ANDROID_WOODCUTTER - 8x (3ms)
[10:19:25 INFO]:   BARREL_GIGANTIC - 14x (2ms)
[10:19:25 INFO]:   CARBON_PRESS_3 - 4x (1ms)
[10:19:25 INFO]:   PROGRAMMABLE_ANDROID_MINER - 11x (6ms)
[10:19:25 INFO]:   INFUSED_HOPPER - 43x (1ms)
[10:19:25 INFO]:   ELECTRIC_SMELTERY_2 - 6x (1ms)
[10:19:25 INFO]:   CARGO_MANAGER - 17x (1ms)
[10:19:25 INFO]: + 22 Hidden
[10:19:25 INFO]:
[10:19:25 INFO]: Ticking Chunks:
[10:19:25 INFO]:   &cworld;Chunk;32;35 - 41x &7(2ms)
[10:19:25 INFO]:   &cworld;Chunk;32;36 - 69x &7(1ms)

Expected behavior (Required)

I realize this will be a tough issue to trace, but thought someone might want the heads up.

Server Log / Error Report

Whatever this is, it isn't caused by error floods to the log. Everything seems to be working normally, just slower/laggier.

Environment (Required)

[10:23:41 INFO]: Paper v1_14_R1 (207) [10:23:41 INFO]: [10:23:41 INFO]: CS-CoreLib vDEV - 77 (git 6e171836) [10:23:41 INFO]: Slimefun vDEV - 289 (git 64bd72a6) [10:23:41 INFO]: [10:23:41 INFO]: Installed Addons (12) [10:23:41 INFO]: Barrels vDEV - 7 (git 0d14dd9f) [10:23:41 INFO]: SlimeXpansion vDEV - 4 (git 5bf10c29) [10:23:41 INFO]: ExtraGear vDEV - 3 (git de222ab7) [10:23:41 INFO]: ChestTerminal vDEV - 17 (git 933a7d36) [10:23:41 INFO]: EmeraldEnchants vDEV - 5 (git 6adbbb87) [10:23:41 INFO]: HotbarPets vDEV - 20 (git cd87bc60) [10:23:41 INFO]: ExoticGarden vDEV - 22 (git a855f5b7) [10:23:41 INFO]: SoulJars vDEV - 4 (git 4a600aa2) [10:23:41 INFO]: SlimefunOreChunks vDEV - 2 (git 24d90a3e) [10:23:41 INFO]: ElectricSpawners vDEV - 5 (git c3f8cc9d) [10:23:41 INFO]: ExtraHeads v1.1 [10:23:41 INFO]: ColoredEnderChests vDEV - 7 (git 6885cb88)

TheBusyBiscuit commented 5 years ago

From the timings report it seems like every part of Slimefun was actually improved performance-wise. Just the Cargo Net seems to perform even worse now... which is weird, because there hasn't been any change to the Cargo Net...

WalshyDev commented 5 years ago

Do remember these are dev builds not stable builds. From your 289 it is cargo stuff not what we've been working on. Some cargo code was changed to be more efficient so itll be lower than it even is in that. But regardless, it isn't exactly much impact.

TheBusyBiscuit commented 5 years ago

@mrcoffee1026 Try build 290 when it's uploaded. Maybe fixing #1162 also solved this performance issue.

mrcoffee1026 commented 5 years ago

Yeah i'm going to try that first thing. @WalshyDev I do know these are Dev builds, but I am interested in testing these as much as running along with them. I've stuck to the dev builds since I've known of the repo location and that has seemed to have been the proper approach up to now. Even if a build doesn't have the desired effect sometimes, the only way to know is to try.

Interesting that bit about the cargo networks... at the time there was only one network currently "active", I believe. So that's a lot of load for one player's dust factory. Having many people on running theirs won't be sustainable. I'll have to see what this new build does.

mrcoffee1026 commented 5 years ago

latest timings with 290... 92% with only one other player on https://timings.aikar.co/?id=60c1a2fd7e4846f0aede388487992f52#plugins

TheBusyBiscuit commented 5 years ago

Still Cargo Nets... Can you show us the Cargo Network?

mrcoffee1026 commented 5 years ago

Do you mean screen shots of the physical networks or the sf timings again:

[12:30:38 INFO]: == Slimefun Diagnostic Tool ==
[12:30:38 INFO]: Halted: FALSE
[12:30:38 INFO]:
[12:30:38 INFO]: Impact: 8ms / 50-750ms
[12:30:38 INFO]: Ticked Chunks: 44
[12:30:38 INFO]: Ticked Machines: 236
[12:30:38 INFO]: Skipped Machines: 374
[12:30:38 INFO]:
[12:30:38 INFO]: Ticking Machines:
[12:30:38 INFO]:   BARREL_LARGE - 12x (3ms)
[12:30:38 INFO]:   BARREL_GIGANTIC - 14x (2ms)
[12:30:38 INFO]:   AUTOMATED_CRAFTING_CHAMBER - 4x (1ms)
[12:30:38 INFO]:   PROGRAMMABLE_ANDROID_MINER - 7x (1ms)
[12:30:38 INFO]:   ELECTRIC_ORE_GRINDER - 9x (1ms)
[12:30:38 INFO]:   ENERGY_REGULATOR - 4x (3ms)
[12:30:38 INFO]:   AUTO_DISENCHANTER - 5x (1ms)
[12:30:38 INFO]:   GPS_TRANSMITTER_3 - 3x (1ms)
[12:30:38 INFO]: + 24 Hidden
[12:30:38 INFO]:
[12:30:38 INFO]: Ticking Chunks:
[12:30:38 INFO]:   &cworld;Chunk;100;-67 - 5x &7(2ms)
[12:30:38 INFO]:   &cworld;Chunk;100;-68 - 26x &7(2ms)
[12:30:38 INFO]:   &cworld;Chunk;101;-67 - 6x &7(1ms)
[12:30:38 INFO]:   &cworld;Chunk;32;35 - 41x &7(1ms)
[12:30:38 INFO]:   &cworld;Chunk;32;36 - 69x &7(1ms)

It can't be this bad for so few players on or I'll have to disable cargo networks entirely. I can't get better than 9tps right now, so something is remarkably worse.

TheBusyBiscuit commented 5 years ago

@mrcoffee1026 I mean the actual layout of your Cargo Network. Everything else on timings looks good, even /sf timings shows all other stuff is performing insanely well.

It still makes no sense for it to drop so drastically since nothing about the Cargo Nets has changed...

mrcoffee1026 commented 5 years ago

bit difficult to see it all but here is one person's auto dust factory: https://imgur.com/a/MF2mUo9 cobble generators... woodcutters to fuel them all (none working currently since these all stop a few blocks into chopping and then stop completely). cobble processes down to dusts which is sorted into shulkers. Quite a lot of nodes although not necessarily a lot of material throughput.

I don't think it's entirely cargo net here since the tps is just as bad when the players with auto factories are offline and those chunks unloaded. It seems to me that the performance changes have improved the timings reports, etc... but it's eating up all but 9% of the server's resources to accomplish that: the overall result has been noticeable lag across the board... slimefun might have improved performance, but we'll never see it since it's nearly killing the server entirely to pull it off. I can revert to a build pre-performance upgrades to remedy this - still. The cargo network needs help in some areas, certainly, but I wouldn't expect many changes to drop slimefuns usage back down to a reasonable number as dramatically as the last few builds have raised it as none of them were addressing the cargo network, as you've already said.

mrcoffee1026 commented 5 years ago

Build 293: https://timings.aikar.co/?id=b41cd721228b42cd8c20b87bd6df6fa2 still using 96% of server resorces (950% of tick??!!). No networks actively doing anything since no players other than myself are even on currently, so I don't think the cargo networks are the source of the tps drop here.

WalshyDev commented 5 years ago

StackMob is also really intensive there

mrcoffee1026 commented 5 years ago

It's using less than 2% of tick. More intense than some plugins, perhaps, but increasing my entity count by removing stackmob would not be an improvement here. The problem here is slimefun only leaving 4% of my server resources free to be used by anything else.

TheBusyBiscuit commented 5 years ago

You always have to take timings reports with a grain of salt. They are not very accurate.

But yours is showing that Slimefun is performing very well, it's just a Cargo Network on your Server that is doing a ton of calculations. So you can't tell us that it's not that. Because if it's not, then this issue can be closed as invalid.

mrcoffee1026 commented 5 years ago

If that were true then reverting to something like 279 wouldn't fix this, yet it does. Anything that can assist with cargo efficiency will be helpful, certainly, just not to the extent that is needed, I fear. This has happened before when huge changes were being made to this plugin, and the situation dramatically improved once the project was "finished". So perhaps when all items have been moved to the new system and various routines needed to be in place to check for both versions of an item are no longer needed are gone, it'll just work fine. I was hoping for something simpler than all of that though since it was certainly like flipping a switch to make it worse by putting those builds, I thought it might be just as easy to flip it back. I'll keep testing as you make additional changes, but a bunch of empty containers hooked up to inactive networks shouldn't drag the server down to a perpetual stop.... it's working way too hard to do nothing.

WalshyDev commented 5 years ago

Can you please zip your server and send it over to us on Discord so that we can look at this in your environment? From my testing (and I believe Cookie had the same) it is completely fine, 0 issues.

mrcoffee1026 commented 5 years ago

That's like 80GB... which I'm sure is a bit more than you'd actually need here, what would you want me to get?

WalshyDev commented 5 years ago

world, SF data files, SF and CS-CoreLib folder in plugins.

mrcoffee1026 commented 5 years ago

I have those files for you but it's still like 30GB

mrcoffee1026 commented 5 years ago

And - in a way - it is cargo.. in so much if more than one chunk with a cargo network is loaded by a player being in it simultaneously (in relatively distant locations in the world), the server pretty much grinds to a halt (or to about 10tps max). There was not this sort of limitation on earlier builds. I'll try newer builds as they become available but for now, reverting to build 278 seems to be the most effective way to avoid the lag and server tps drop. I've tried removing several cargo managers from my own networks and this has had little to no effect. The number of "ticked machines" does drop by about 100-200 when reverting to the earlier version, however, this is probably the larger contributor. Timings comparison build 276: https://timings.aikar.co/?id=a37885e70284409d90d0ce29bec2a541#plugins (notice only 3.87% vs. 96%of tick w/ build 293) and sf timings: [18:53:25 INFO]: == Slimefun Diagnostic Tool == [18:53:25 INFO]: Halted: FALSE [18:53:25 INFO]: [18:53:25 INFO]: Impact: 3ms / 50-750ms [18:53:25 INFO]: Ticked Chunks: 36 [18:53:25 INFO]: Ticked Machines: 176 [18:53:25 INFO]: Skipped Machines: 478 [18:53:25 INFO]: [18:53:25 INFO]: Ticking Machines: [18:53:25 INFO]: BARREL_GIGANTIC - 14x (1ms) [18:53:25 INFO]: ENERGY_REGULATOR - 2x (1ms) [18:53:25 INFO]: + 24 Hidden [18:53:25 INFO]: [18:53:25 INFO]: Ticking Chunks: [18:53:25 INFO]: &cworld;Chunk;39;27 - 1x &7(1ms) [18:53:25 INFO]: &cworld;Chunk;32;35 - 41x &7(1ms) [18:53:25 INFO]: &cworld;Chunk;32;36 - 69x &7(1ms)

As you say, with a grain of salt... however the experience proves out the numbers here. With the updates, the server pretty much grinds to a halt. The same networks are in the world for both builds, this older one works far better, performance wise. Not a tiny bit better: a massive amount better. So again, thought you'd want to take a look at what happened.

TheBusyBiscuit commented 5 years ago

So I plugged it all into VisualVM to go into a deep analysis of what method call takes how long etc... I setup about 5 Cargo Networks, some of which just being a couple of machines feeding into each other, one ChestTerminal Network and even one self-feeding loop network as seen here: 737b1c03a9947b5dddd842eaec2b59d7

The result was... Well the performance was great, even with this level of logging and timings-collection, the Server still survived all of that. And according to VisualVM the Cargo Net did take a bit of CPU load but it is very unnoticeable.

As we said before: We didn't change anything that could have had that much of an effect on the performance. Your timings suggest it to be Cargo Networks. But a deep analysis of the CPU shows that Slimefun's performance does not seem compromised.

(Also the Test Server had about 30 plugins and every addon available, totalling up to almost 10,000 Slimefun items)

CPU-calls

If you want you can check via VisualVM and send us that report too but for me this is a closed case.

mrcoffee1026 commented 5 years ago

Again, I did specify this happens with multiple users on the server loading networks in chunks that are remote to each other in the world. If I log into my server... by myself, thereby only activating one network of machines, I don't get any lag either, regardless of how complex that network is. Problem is that now, only one network can be run at a time it seems, and this was not a problem before. Again, real-world use timings: https://timings.aikar.co/?id=bd570b441e6c45bcb33f05a8ad722ebc 7 players online 20tps average with build 278 (again usage at 18% of tick vs 96% of tick with build 296. Real world experience is saying the new build performs far worse and just isn't playable. Until the bottleneck is addressed, this is a dealbreaker, we can't play at 9tps, just can't. Run all the virtual machines you want, on a server with players, we can't play. We're running just fine with build 278, so until the issue is dealt with, I have to maintain with the old build. Same cargo networks running. So I can't blame the cargo. (these timings after restarting about 12-20 hours prior so should be much worse here, yet we aren't getting much - if any - lag) [13:15:14 INFO]: == Slimefun Diagnostic Tool == [13:15:14 INFO]: Halted: FALSE [13:15:14 INFO]: [13:15:14 INFO]: Impact: 5ms / 50-750ms [13:15:14 INFO]: Ticked Chunks: 43 [13:15:14 INFO]: Ticked Machines: 199 [13:15:14 INFO]: Skipped Machines: 457 [13:15:14 INFO]: [13:15:14 INFO]: Ticking Machines: [13:15:14 INFO]: BARREL_GIGANTIC - 16x (2ms) [13:15:14 INFO]: ELECTRIC_SMELTERY_2 - 6x (1ms) [13:15:14 INFO]: ENERGY_REGULATOR - 4x (2ms) [13:15:14 INFO]: + 24 Hidden [13:15:14 INFO]: [13:15:14 INFO]: Ticking Chunks: [13:15:14 INFO]: &cworld;Chunk;39;27 - 1x &7(1ms) [13:15:14 INFO]: &cworld;Chunk;-364;-2028 - 4x &7(1ms) [13:15:14 INFO]: &cworld;Chunk;32;35 - 41x &7(1ms) [13:15:14 INFO]: &cworld;Chunk;32;37 - 4x &7(1ms) [13:15:14 INFO]: &cworld;Chunk;-1401;76 - 1x &7(1ms)

mrcoffee1026 commented 5 years ago

Testing and timings, while still moderately worse than build 278, lag and tps on the server is remarkably improved in build 307 (nearer 24% of resources instead of prior 96%). I would consider this issue resolved at this point.