CitizensDev / Citizens2

Citizens - the premier plugin and API for creating server-side NPCs in Minecraft.
https://citizensnpcs.co
Open Software License 3.0
599 stars 316 forks source link

Memory leakage #1495

Closed SlimeDog closed 6 years ago

SlimeDog commented 6 years ago

Test server CentOS Linux release 7.5.1804 openjdk version "1.8.0_171" OpenJDK Runtime Environment (build 1.8.0_171-b10) OpenJDK 64-Bit Server VM (build 25.171-b10, mixed mode) Spigot 1.12.2 latest

90+ plugins, all latest dev builds, including Citizens and dependents:

I am trying to isolate memory leakage in plugins. The test server has 34 Citizens running across six worlds. To establish a baseline, there are no players connected (it's a test server).

Active plugins in this steady state:

With Citizens and dependents disabled, the server runs comfortably with maxheap=3G and simplified GC (-XX:+UseG1GC). For example, in the three days since last startup, allocated memory started at 2.3G and is now only 2.8G (so there is some small leakage elsewhere, but relatively insignificant). Free memory ranges between almost none and 2.2G, as GC is exercised.

With Citizens and dependents enabled, maxheap is quickly fully allocated regardless of setting (between 3G and 8G tested so far), and the server crashes within two days as free memory is consumed. Testing has also been performed with the Aikar-recommended GC settings, with no apparent difference to the outcome.

So it is certainly suggestive, but not conclusive, that the memory leaks are due to Citizens and dependents.

Initial questions:

SlimeDog commented 6 years ago

For reference, this is a typical Essentials memory listing, with Citizens and dependents disabled (alphabetized):

[16:00:01 INFO]: CONSOLE issued server command: /mem
[16:00:01 INFO]: Uptime: 3 days 1 minute 31 seconds
[16:00:01 INFO]: Current TPS = 20
[16:00:01 INFO]: Maximum memory: 3,072 MB.
[16:00:01 INFO]: Allocated memory: 2,769 MB.
[16:00:01 INFO]: Free memory: 601 MB.
[16:00:01 INFO]: World "world": 256 chunks, 16 entities, 7 tiles.
[16:00:01 INFO]: World "world_blackdog": 256 chunks, 92 entities, 54 tiles.
[16:00:01 INFO]: World "world_city": 260 chunks, 245 entities, 524 tiles.
[16:00:01 INFO]: World "world_lobby": 256 chunks, 44 entities, 177 tiles.
[16:00:01 INFO]: World "world_luckyblock": 256 chunks, 3 entities, 10 tiles.
[16:00:01 INFO]: World "world_minigames": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: World "world_parkour": 256 chunks, 7 entities, 8 tiles.
[16:00:01 INFO]: World "world_skyblock": 256 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: World "world_void": 256 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: World "world_whitedog": 265 chunks, 3 entities, 14 tiles.
[16:00:01 INFO]: World "world_yellowdog": 257 chunks, 109 entities, 5 tiles.
[16:00:01 INFO]: Nether "world_blackdog_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: Nether "world_city_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: Nether "world_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: Nether "world_skyblock_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: Nether "world_whitedog_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: Nether "world_yellowdog_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_blackdog_the_end": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_city_the_end": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_skyblock_the_end": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_the_end": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_whitedog_the_end": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_yellowdog_the_end": 0 chunks, 0 entities, 0 tiles.
mcmonkey4eva commented 6 years ago

Did you just... fork off from an existing issue, having the same issue as was mentioned in the original... but starting anew ignoring all posts and suggestions in the original issue?

If you want to be starting up a fork issue at least gather the data points from your own server as were requested from the original issue

SlimeDog commented 6 years ago

I wasn't sure that "extra objects" was the same as leakage. I didn't seem to have the same characteristics. I started as a comment on that issue (as you no doubt know) but decided to create a new issue, since it didn't seem the same on reflection. I tried to provide the data relevant to my experience. I don't see the data points that were requested in the original applying to my case. If I am mistaken, please advise with specifics.

If this is a duplicate, please advise and I will add the comments to the other issue.

mcmonkey4eva commented 6 years ago

A lot of my posts on #1476 referenced memory leaks very explicitly, and advised the usage of a java profiler to analyze them, which would be very useful here. A proper profiler should be able to pinpoint exactly what is using massive amounts of memory.

SlimeDog commented 6 years ago

I have HeapAnalyzer. Will that do?

mcmonkey4eva commented 6 years ago

Should work, yeah

SlimeDog commented 6 years ago

Would you like dumps every 12 hours or so, or just capture a crash?

mcmonkey4eva commented 6 years ago

Information gathered when it's about to crash (memory space almost all taken) would be ideal

SlimeDog commented 6 years ago

"About to crash" is rather a difficult mark to hit, but I will make the effort. The server is configured to create a heap dump on crash.

I would think that running the code with a flock of NPCs in a debugger (outside my abilities) would detect pretty quickly if Citizens is leaking or not. I would be happy to provide my configuration.

SlimeDog commented 6 years ago

Link to heap dump sent via Spigot PM.

As I read it with HeapAnalyzer, there is a lot of memory associated with Citizens, including 2G+ in thousands of entries in HashMap$Node elements (for 34 Citizens NPCs). I have no idea if this is expected, reasonable, or not-so-much.

mcmonkey4eva commented 6 years ago

Got the file @SlimeDog thanks!

The memory leak is from player advancement data attached to NPCs. Of course it is .-. @fullwall that advancement cleary thingy is going to need even more fixing!

Object path trace for reference: https://i.imgur.com/rA2F03L.png

SlimeDog commented 6 years ago

Thanks. So my work here is done? :)

I have more heap dumps if needed.

SlimeDog commented 6 years ago

Testing with citizens-2.0.22-b1542

SlimeDog commented 6 years ago

Configuration as above, except

The result is the same, except that it took only one day instead of two to crash. Link to new heap dump sent via Spigot PM.

Note: I did not test above 3G maxheap allocation.

mcmonkey4eva commented 6 years ago

You also have a massive number of NPC advancements that I'm working on tracing down atm... for reference on where they will be visible, they take a form like ./UNIVERSE/world/advancements/91974dcd-290f-21b1-a036-0bf8790eb9aa.json (that's an example of one specific entry in your heap dump). I imagine that folder is flooded with files if they've been saving to disk, and if so might help to clear that out (or at least all version 2 uuids... that is, ids where the 3rd block - at 21b1 in that example - starts with a 2. For comparison, version 4 means a valid player and version 3 means an offline-mode player).

@fullwall

From what I can see: NMS class CriterionTriggers contains static references to CriterionTrigger implementations. Each implementation class is very helpfully retaining a full list of every AdvancementDataPlayer instance it receives (in a map).

I think that pathway specifically needs to be cleared of data.

Tracing is similar to before: AdvancementProgress instances contained within the maps held by AdvancementDataPlayer which is held underneath the CriterionTriggers mess. They have maps of the style Map<AdvancementDataPlayer, CriterionTriggerKilled.a> The key is a reference in itself, and the .a subtyped value contains advancement data itself as well.

The most expensive CriterionTriggers are CriterionTriggerInventoryChanged and CriterionTriggerRecipeUnlocked for reference. Those contain 35% and then 32% of data waste, the next 5 down are 4%, 2x 3%, 2x 1% ... the rest are all extremely small usage. The lowest about 10 of them are all at 6174 instances, which I think means NPCs haven't ever triggered these. It's weird ones, like CriterionTriggerVillagerTrade.

SlimeDog commented 6 years ago

None of the NPC advancements appears to have been saved to disk. ./UNIVERSE/world/advancements/ contains only files representing valid player UUIDs.

Remember that there are no players connected to either test server, so the NPCs are not interacting with players. The sentinels are configured to chase and kill mobs who venture too close to the sentinel guard areas, some of which are active since they are located in spawn chunks. The likely "live" sentinels are: uuid: 57523cee-360d-4468-a0fa-46949dfbb3fe uuid: c1545fb2-7dec-4f58-92a5-7aeae1c88681 uuid: 1e83c41e-14da-4406-bf68-d7d32e733103 uuid: 655b05fd-05ca-45ba-bc67-9b710b57d26d uuid: 5d5f82ee-d194-4cac-be56-0ca988ff5005 uuid: 519c565b-e97a-4d84-ac8d-79b4fcba6b1f uuid: ee27c9ae-d99e-4cbf-a50e-14dfd0656c57 uuid: d3fab83b-3eb3-4bf9-9d0d-76a56bc26f40 uuid: e9b4024e-dea6-479b-bc1b-14ea1a8f6881 uuid: 2996f32a-f496-43f0-816a-a8e4c084155e uuid: 267d1b95-604a-4202-a7e1-52a0d783cf9f uuid: 62fe6737-fffe-4170-b58f-43aeac0dd8d2 uuid: 02b04c9e-0a6d-4b3d-94c2-d6d25718eaec uuid: 332284fe-2a88-4af1-8861-79447471975b uuid: 8d73c818-6edb-4910-a729-840291aed4e1

fullwall commented 6 years ago

@mcmonkey4eva that is what PlayerAdvancementData.a() is meant to clear (although I admit I haven't gone through all the CriterionTriggers). Also if it was a CriterionTrigger it would show up as a large groups of objects in the heap, no?

mcmonkey4eva commented 6 years ago

@SlimeDog why are all those UUID's holding a player version ID (the 4 digit at the start of each third grouping)? They are not valid player UUIDs, and I don't believe NPCs should be holding player versioned UUIDs.

@fullwall Er, looked at the clearing code... Class: net.citizensnpcs.nms.v1_12_R1.util.EmptyAdvancementDataPlayer Method present:

    @Override
    public void a() {
    }

The clear call:

this.a();

See the problem there, right? :P I'm guessing either that should be base.a(); or the a() override should be removed...

SlimeDog commented 6 years ago

I have no idea. They were created by Citizens, using either

/npc create ...

or

/npc copy

I just copied the data from Citizens/saves.yml. All of the NPC UUIDs in that file have a "4" at the start of the third element, not just the sentinels.

SlimeDog commented 6 years ago

Restarted tests with

SlimeDog commented 6 years ago

After approx 7 hours, configured 3G memory allocated, but lots of free memory (cyclically as GC does its magic). How much memory should I expect to be allocated per Citizen NPC?

I will test with higher configured memory after this test runs for a few days.

mcmonkey4eva commented 6 years ago

How much memory should I expect to be allocated per Citizen NPC? It's hard to specify due to there being so many potential variables involved in that, but if the NPCs are not doing much they shouldn't be using much of any RAM. If no leaks occur (no memory building up over time) then we're probably good on that front.

SlimeDog commented 6 years ago

Understood. Based on ^^^

With Citizens and dependents disabled ... 
allocated memory started at 2.3G and is now only 2.8G [after three days] ...

With Citizens and dependents in the mix, allocated memory quickly rises to the 3G configured. So I was wondering if maybe maxheap should be set higher. I'll try 4G in a subsequent test, to see if it doesn't fill up at a higher level.

SlimeDog commented 6 years ago

Test with

After just over one day, similar results:

java.lang.OutOfMemoryError: Java heap space

New pre-crash heap dump link sent via PM. Post-crash heap dump was not created.

mcmonkey4eva commented 6 years ago

Three part mega post:


About 1.7 GB of data is retained by AdvancementDataPlayer instances. 5588 instances of that type are retained, each one of which uses exactly 306360 bytes. Each one has its weight mostly occupied by the contained LinkedHashMap in field data, which has a preallocation capacity of 2048 objects. Each AdvancementDataPlayer holds within it a reference to a different NPC entity. There are 5588 unique EntityHumanNPC instances being retained. Note for reference there are only 34 valid NPCs present at time of heap dump. These AdvancementDataPlayer objects are all being retained by the CriterionTriggers pathway mentioned previously.

If we look at CriterionTriggerInventoryChanged, it contains field b, the map mentioned previously containing all the AdvancementDataPlayer instances. Looking at this in the heapdump, it has capacity 8192, and used size 5588. Each used slot is an AdvancementDataPlayer (per the map definition, mentioned in prior post). The type of the instance contained is net.minecraft.server.v1_12_R1.AdvancementDataPlayer. It, notably, is not the Citizens2 class EmptyAdvancementDataPlayer which contains the self-clearing logic.


@fullwall my guess for the next issue to patch in this work here is to more properly ensure NPCs get the Empty advancement handler, not the standard NMS advancement handler.

EntityHumanNPC in Citizens2 source sets this in the private initialise method, with a reflection set call on the internal private final field bY. This is initially set by Minecraft with minecraftserver.getPlayerList().h(this) in the EntityPlayer constructor. This call grabs or builds the AdvancementDataPlayer NMS object for the entity, and prepares it entirely. It is not initialized any further outside of the initial set call. My suspicious is that even though C2 overwrites the field containing it on the EntityPlayer instance, the original is already uploaded to the CriterionTriggers maps, and will need to be removed from there when being replaced. Meaning: inside the initialise method of EntityHumanNPC, immediately before the call beginning ADVANCEMENT_DATA_PLAYER.set, there should be an equivalent get call to acquire the original value, and clear it from backing maps (by calling a() on it).


A secondary issue that may be out of band to solve here but might cause some future headache for you, @SlimeDog - me.lucko.luckperms.bukkit.model.server.LPSubscriptionMap is a special type retained by LuckPerms as its official way of implementing a permissions system entirely incorrectly (terrifyingly wrongly) and is, as a side effect of its stupid implementation, retaining all NPCs within it, in a massive pile of nonsense maps. It essentially has a list of every permission string it knows about, and each perm value has a map of player entities to a boolean value object (true/false whether they have the permission). All 5588 player entities and then also some more somehow (5614 entries. Possibly a few of non-player types. Maybe perm groups or similar are stuck in there as well) are retained. Luckily that ends up only being 60MB of space wasted by that nonsense directly in this dump (though that number is likely lower than reality, as it discounts all memory also retained by other memory leaks. A few hundred MB are in use by the random outdated NPC entity objects that are being held as a leak).

SlimeDog commented 6 years ago

Thanks for the insight. I passed it along (with a few politic elisions) to the LuckPerms team.

mcmonkey4eva commented 6 years ago

Actually, looking at that bit closer: It might be an issue with the way Bukkit itself handles permissions (and Citizens not compensating for Bukkit being so stupid with perms), LuckPerms was just caught in the middle for being your permissions plugin, and isn't actually at fault.

SlimeDog commented 6 years ago

Thanks for the note on LuckPerms. I closed the ticket.

mcmonkey4eva commented 6 years ago

@fullwall as for the permissions issue: EntityHumanNPC class defines a contained class PlayerNPC, that class has access to protected field perm... I believe that needs to be used in call perm.clearPermissions();, which should unregister the NPC from permissions subscriptions in Bukkit's nonsense system.

SlimeDog commented 6 years ago

Restarted tests with 4G maxheap and

UPDATE: 4G heap filled in 5 hours on the server with mobs allowed at world spawn, but not the other. There is plenty of free memory on both at 14 hours. I will leave the tests running for a few days. But it looks like memory/GC management is much improved.

SlimeDog commented 6 years ago

24 hours into the latest test, allocated memory and free memory are stable on both servers (within the range of expected GC expansion/contraction). I'll leave the tests running for a few days, but it looks like the latest changes have done the trick, for my mix of Citizens NPCs. 👍

AIphaDragon commented 6 years ago

Hey. Do you test build #1545 on an empty server or with players? Thanks for the answer.

SlimeDog commented 6 years ago

Empty of players.

SlimeDog commented 6 years ago

Based on 2+ days of responsible memory/GC management, I think this issue is resolved. I will wait a day for additional comments and testing, then close.

AIphaDragon commented 6 years ago

I tested it on an empty server for 2 days. Everything works well. For complete certainty, tests on the server with players are still needed. Do you plan to test in the near future on your server with the players? Thank you in advance for your response.

SlimeDog commented 6 years ago

I cannot test on production server at present. Someone else will have to test with player load.

AIphaDragon commented 6 years ago

We tested the plugin. Online: 30 people. Work time: 18 hours. Everything is stable. The problem is fixed.

SlimeDog commented 6 years ago

Thanks @AIphaDragon. Closing...

SlimeDog commented 6 years ago

More data: 15 days up, no issues. Awesome!