Closed SlimeDog closed 6 years ago
For reference, this is a typical Essentials memory listing, with Citizens and dependents disabled (alphabetized):
[16:00:01 INFO]: CONSOLE issued server command: /mem
[16:00:01 INFO]: Uptime: 3 days 1 minute 31 seconds
[16:00:01 INFO]: Current TPS = 20
[16:00:01 INFO]: Maximum memory: 3,072 MB.
[16:00:01 INFO]: Allocated memory: 2,769 MB.
[16:00:01 INFO]: Free memory: 601 MB.
[16:00:01 INFO]: World "world": 256 chunks, 16 entities, 7 tiles.
[16:00:01 INFO]: World "world_blackdog": 256 chunks, 92 entities, 54 tiles.
[16:00:01 INFO]: World "world_city": 260 chunks, 245 entities, 524 tiles.
[16:00:01 INFO]: World "world_lobby": 256 chunks, 44 entities, 177 tiles.
[16:00:01 INFO]: World "world_luckyblock": 256 chunks, 3 entities, 10 tiles.
[16:00:01 INFO]: World "world_minigames": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: World "world_parkour": 256 chunks, 7 entities, 8 tiles.
[16:00:01 INFO]: World "world_skyblock": 256 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: World "world_void": 256 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: World "world_whitedog": 265 chunks, 3 entities, 14 tiles.
[16:00:01 INFO]: World "world_yellowdog": 257 chunks, 109 entities, 5 tiles.
[16:00:01 INFO]: Nether "world_blackdog_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: Nether "world_city_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: Nether "world_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: Nether "world_skyblock_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: Nether "world_whitedog_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: Nether "world_yellowdog_nether": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_blackdog_the_end": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_city_the_end": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_skyblock_the_end": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_the_end": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_whitedog_the_end": 0 chunks, 0 entities, 0 tiles.
[16:00:01 INFO]: The End "world_yellowdog_the_end": 0 chunks, 0 entities, 0 tiles.
Did you just... fork off from an existing issue, having the same issue as was mentioned in the original... but starting anew ignoring all posts and suggestions in the original issue?
If you want to be starting up a fork issue at least gather the data points from your own server as were requested from the original issue
I wasn't sure that "extra objects" was the same as leakage. I didn't seem to have the same characteristics. I started as a comment on that issue (as you no doubt know) but decided to create a new issue, since it didn't seem the same on reflection. I tried to provide the data relevant to my experience. I don't see the data points that were requested in the original applying to my case. If I am mistaken, please advise with specifics.
If this is a duplicate, please advise and I will add the comments to the other issue.
A lot of my posts on #1476 referenced memory leaks very explicitly, and advised the usage of a java profiler to analyze them, which would be very useful here. A proper profiler should be able to pinpoint exactly what is using massive amounts of memory.
I have HeapAnalyzer. Will that do?
Should work, yeah
Would you like dumps every 12 hours or so, or just capture a crash?
Information gathered when it's about to crash (memory space almost all taken) would be ideal
"About to crash" is rather a difficult mark to hit, but I will make the effort. The server is configured to create a heap dump on crash.
I would think that running the code with a flock of NPCs in a debugger (outside my abilities) would detect pretty quickly if Citizens is leaking or not. I would be happy to provide my configuration.
Link to heap dump sent via Spigot PM.
As I read it with HeapAnalyzer, there is a lot of memory associated with Citizens, including 2G+ in thousands of entries in HashMap$Node elements (for 34 Citizens NPCs). I have no idea if this is expected, reasonable, or not-so-much.
Got the file @SlimeDog thanks!
The memory leak is from player advancement data attached to NPCs. Of course it is .-. @fullwall that advancement cleary thingy is going to need even more fixing!
Object path trace for reference: https://i.imgur.com/rA2F03L.png
Thanks. So my work here is done? :)
I have more heap dumps if needed.
Testing with citizens-2.0.22-b1542
Configuration as above, except
The result is the same, except that it took only one day instead of two to crash. Link to new heap dump sent via Spigot PM.
Note: I did not test above 3G maxheap allocation.
You also have a massive number of NPC advancements that I'm working on tracing down atm... for reference on where they will be visible, they take a form like ./UNIVERSE/world/advancements/91974dcd-290f-21b1-a036-0bf8790eb9aa.json
(that's an example of one specific entry in your heap dump). I imagine that folder is flooded with files if they've been saving to disk, and if so might help to clear that out (or at least all version 2 uuids... that is, ids where the 3rd block - at 21b1
in that example - starts with a 2. For comparison, version 4 means a valid player and version 3 means an offline-mode player).
@fullwall
From what I can see:
NMS class CriterionTriggers
contains static references to CriterionTrigger
implementations.
Each implementation class is very helpfully retaining a full list of every AdvancementDataPlayer
instance it receives (in a map).
I think that pathway specifically needs to be cleared of data.
Tracing is similar to before: AdvancementProgress instances contained within the maps held by AdvancementDataPlayer which is held underneath the CriterionTriggers mess.
They have maps of the style Map<AdvancementDataPlayer, CriterionTriggerKilled.a>
The key is a reference in itself, and the .a
subtyped value contains advancement data itself as well.
The most expensive CriterionTriggers are CriterionTriggerInventoryChanged
and CriterionTriggerRecipeUnlocked
for reference. Those contain 35% and then 32% of data waste, the next 5 down are 4%, 2x 3%, 2x 1% ... the rest are all extremely small usage. The lowest about 10 of them are all at 6174 instances, which I think means NPCs haven't ever triggered these. It's weird ones, like CriterionTriggerVillagerTrade
.
None of the NPC advancements appears to have been saved to disk. ./UNIVERSE/world/advancements/ contains only files representing valid player UUIDs.
Remember that there are no players connected to either test server, so the NPCs are not interacting with players. The sentinels are configured to chase and kill mobs who venture too close to the sentinel guard areas, some of which are active since they are located in spawn chunks. The likely "live" sentinels are: uuid: 57523cee-360d-4468-a0fa-46949dfbb3fe uuid: c1545fb2-7dec-4f58-92a5-7aeae1c88681 uuid: 1e83c41e-14da-4406-bf68-d7d32e733103 uuid: 655b05fd-05ca-45ba-bc67-9b710b57d26d uuid: 5d5f82ee-d194-4cac-be56-0ca988ff5005 uuid: 519c565b-e97a-4d84-ac8d-79b4fcba6b1f uuid: ee27c9ae-d99e-4cbf-a50e-14dfd0656c57 uuid: d3fab83b-3eb3-4bf9-9d0d-76a56bc26f40 uuid: e9b4024e-dea6-479b-bc1b-14ea1a8f6881 uuid: 2996f32a-f496-43f0-816a-a8e4c084155e uuid: 267d1b95-604a-4202-a7e1-52a0d783cf9f uuid: 62fe6737-fffe-4170-b58f-43aeac0dd8d2 uuid: 02b04c9e-0a6d-4b3d-94c2-d6d25718eaec uuid: 332284fe-2a88-4af1-8861-79447471975b uuid: 8d73c818-6edb-4910-a729-840291aed4e1
@mcmonkey4eva that is what PlayerAdvancementData.a() is meant to clear (although I admit I haven't gone through all the CriterionTriggers). Also if it was a CriterionTrigger it would show up as a large groups of objects in the heap, no?
@SlimeDog why are all those UUID's holding a player version ID (the 4
digit at the start of each third grouping)? They are not valid player UUIDs, and I don't believe NPCs should be holding player versioned UUIDs.
@fullwall Er, looked at the clearing code...
Class: net.citizensnpcs.nms.v1_12_R1.util.EmptyAdvancementDataPlayer
Method present:
@Override
public void a() {
}
The clear call:
this.a();
See the problem there, right? :P
I'm guessing either that should be base.a();
or the a()
override should be removed...
I have no idea. They were created by Citizens, using either
/npc create ...
or
/npc copy
I just copied the data from Citizens/saves.yml. All of the NPC UUIDs in that file have a "4" at the start of the third element, not just the sentinels.
Restarted tests with
After approx 7 hours, configured 3G memory allocated, but lots of free memory (cyclically as GC does its magic). How much memory should I expect to be allocated per Citizen NPC?
I will test with higher configured memory after this test runs for a few days.
How much memory should I expect to be allocated per Citizen NPC?
It's hard to specify due to there being so many potential variables involved in that, but if the NPCs are not doing much they shouldn't be using much of any RAM.
If no leaks occur (no memory building up over time) then we're probably good on that front.
Understood. Based on ^^^
With Citizens and dependents disabled ...
allocated memory started at 2.3G and is now only 2.8G [after three days] ...
With Citizens and dependents in the mix, allocated memory quickly rises to the 3G configured. So I was wondering if maybe maxheap should be set higher. I'll try 4G in a subsequent test, to see if it doesn't fill up at a higher level.
Test with
After just over one day, similar results:
java.lang.OutOfMemoryError: Java heap space
New pre-crash heap dump link sent via PM. Post-crash heap dump was not created.
Three part mega post:
About 1.7 GB of data is retained by AdvancementDataPlayer instances. 5588 instances of that type are retained, each one of which uses exactly 306360 bytes. Each one has its weight mostly occupied by the contained LinkedHashMap in field data
, which has a preallocation capacity of 2048 objects.
Each AdvancementDataPlayer holds within it a reference to a different NPC entity. There are 5588 unique EntityHumanNPC instances being retained.
Note for reference there are only 34 valid NPCs present at time of heap dump.
These AdvancementDataPlayer objects are all being retained by the CriterionTriggers pathway mentioned previously.
If we look at CriterionTriggerInventoryChanged
, it contains field b
, the map mentioned previously containing all the AdvancementDataPlayer instances. Looking at this in the heapdump, it has capacity 8192, and used size 5588. Each used slot is an AdvancementDataPlayer (per the map definition, mentioned in prior post).
The type of the instance contained is net.minecraft.server.v1_12_R1.AdvancementDataPlayer
.
It, notably, is not the Citizens2 class EmptyAdvancementDataPlayer
which contains the self-clearing logic.
@fullwall my guess for the next issue to patch in this work here is to more properly ensure NPCs get the Empty advancement handler, not the standard NMS advancement handler.
EntityHumanNPC
in Citizens2 source sets this in the private initialise
method, with a reflection set
call on the internal private final field bY
. This is initially set by Minecraft with minecraftserver.getPlayerList().h(this)
in the EntityPlayer constructor. This call grabs or builds the AdvancementDataPlayer
NMS object for the entity, and prepares it entirely. It is not initialized any further outside of the initial set call. My suspicious is that even though C2 overwrites the field containing it on the EntityPlayer instance, the original is already uploaded to the CriterionTriggers
maps, and will need to be removed from there when being replaced.
Meaning: inside the initialise
method of EntityHumanNPC
, immediately before the call beginning ADVANCEMENT_DATA_PLAYER.set
, there should be an equivalent get call to acquire the original value, and clear it from backing maps (by calling a()
on it).
A secondary issue that may be out of band to solve here but might cause some future headache for you, @SlimeDog -
me.lucko.luckperms.bukkit.model.server.LPSubscriptionMap
is a special type retained by LuckPerms as its official way of implementing a permissions system entirely incorrectly (terrifyingly wrongly) and is, as a side effect of its stupid implementation, retaining all NPCs within it, in a massive pile of nonsense maps.
It essentially has a list of every permission string it knows about, and each perm value has a map of player entities to a boolean value object (true/false whether they have the permission). All 5588 player entities and then also some more somehow (5614 entries. Possibly a few of non-player types. Maybe perm groups or similar are stuck in there as well) are retained. Luckily that ends up only being 60MB of space wasted by that nonsense directly in this dump (though that number is likely lower than reality, as it discounts all memory also retained by other memory leaks. A few hundred MB are in use by the random outdated NPC entity objects that are being held as a leak).
Thanks for the insight. I passed it along (with a few politic elisions) to the LuckPerms team.
Actually, looking at that bit closer: It might be an issue with the way Bukkit itself handles permissions (and Citizens not compensating for Bukkit being so stupid with perms), LuckPerms was just caught in the middle for being your permissions plugin, and isn't actually at fault.
Thanks for the note on LuckPerms. I closed the ticket.
@fullwall as for the permissions issue:
EntityHumanNPC
class defines a contained class PlayerNPC
, that class has access to protected field perm
... I believe that needs to be used in call perm.clearPermissions();
, which should unregister the NPC from permissions subscriptions in Bukkit's nonsense system.
Restarted tests with 4G maxheap and
UPDATE: 4G heap filled in 5 hours on the server with mobs allowed at world spawn, but not the other. There is plenty of free memory on both at 14 hours. I will leave the tests running for a few days. But it looks like memory/GC management is much improved.
24 hours into the latest test, allocated memory and free memory are stable on both servers (within the range of expected GC expansion/contraction). I'll leave the tests running for a few days, but it looks like the latest changes have done the trick, for my mix of Citizens NPCs. 👍
Hey. Do you test build #1545 on an empty server or with players? Thanks for the answer.
Empty of players.
Based on 2+ days of responsible memory/GC management, I think this issue is resolved. I will wait a day for additional comments and testing, then close.
I tested it on an empty server for 2 days. Everything works well. For complete certainty, tests on the server with players are still needed. Do you plan to test in the near future on your server with the players? Thank you in advance for your response.
I cannot test on production server at present. Someone else will have to test with player load.
We tested the plugin. Online: 30 people. Work time: 18 hours. Everything is stable. The problem is fixed.
Thanks @AIphaDragon. Closing...
More data: 15 days up, no issues. Awesome!
Test server CentOS Linux release 7.5.1804 openjdk version "1.8.0_171" OpenJDK Runtime Environment (build 1.8.0_171-b10) OpenJDK 64-Bit Server VM (build 25.171-b10, mixed mode) Spigot 1.12.2 latest
90+ plugins, all latest dev builds, including Citizens and dependents:
I am trying to isolate memory leakage in plugins. The test server has 34 Citizens running across six worlds. To establish a baseline, there are no players connected (it's a test server).
Active plugins in this steady state:
With Citizens and dependents disabled, the server runs comfortably with maxheap=3G and simplified GC (-XX:+UseG1GC). For example, in the three days since last startup, allocated memory started at 2.3G and is now only 2.8G (so there is some small leakage elsewhere, but relatively insignificant). Free memory ranges between almost none and 2.2G, as GC is exercised.
With Citizens and dependents enabled, maxheap is quickly fully allocated regardless of setting (between 3G and 8G tested so far), and the server crashes within two days as free memory is consumed. Testing has also been performed with the Aikar-recommended GC settings, with no apparent difference to the outcome.
So it is certainly suggestive, but not conclusive, that the memory leaks are due to Citizens and dependents.
Initial questions: