CitizensDev / Citizens2

Citizens - the premier plugin and API for creating server-side NPCs in Minecraft.
https://citizensnpcs.co
Open Software License 3.0
589 stars 313 forks source link

Memory leak with Anonymous NPC Registry #2022

Closed haloflooder closed 4 years ago

haloflooder commented 4 years ago

The output of command /version on my server is: Paper version 1618 (1.12.2) The output of command /version citizens on my server is: 2.0.24

Explanation

We are currently experiencing memory leaks on a server using Citizens with a custom plugin we created. With the custom plugin, we are creating an anonymous NPC registry (CitizensAPI.createAnonymousNPCRegistry(someDataStoreVar)) and creating temporary npc's with it. After the NPC is no longer in use, we deregister the npc from the anonymous registry.

After about 30 minutes on a test server. As a test, we are registering and deregistering about 9 NPC's every 500 milliseconds. We took a heap dump of the server's java process and found that the NPC's are still in memory and the size grew about 1GB with EntityHumanNPC$PlayerNPC

Before 30 minutes

Before 30 minutes Before 30 minutes

After 30 minutes

After 30 minutes After 30 minutes

More details about the test

We create a new anon registry using the CitizensAPI with a new MemoryNPCDataStore object.

Then we create 9 NPC's with the registry every 500 milliseconds after deregistering all the NPC's.

For deregistering the NPC's. We tried 2 different methods to destroy them. We tried iterating through all the npc's in the registry to destroy them manually and also tried to use the deregisterAll() method. Both attempt of trying to remove the npc from the registry still causes the memory leak. We also tried to despawn them before destroying/deregistering the npc from the registry but it gives us the same result.

As a control. We rebooted the test server w/o the custom plugin and left the server running around the same time period. The memory usage didn't go up.

Control (Before 20 minutes)

Before 20 minutes

Control (After 20 minutes)

Before 20 minutes

SXRWahrheit commented 4 years ago

2.0.24 is not the output of a /version Citizens command. Nevertheless, I can tell from the incomplete version string that your version of Citizens is severely outdated, and the data is therefore likely to be deemed irrelevant.

mcmonkey4eva commented 4 years ago

Gonna need to update to a remotely modern version of Citizens.

Current builds can be found at https://ci.citizensnpcs.co/job/Citizens2/

Also, you'll need to run a reasonably support minecraft version. This means at least 1.13.x, ideally 1.15.x in particular. 1.12 is very outdated and is not the focus of any development effort.

If you can replicate issues on a modern Citizens build paired with a modern Minecraft version, please update the relevant data, provide code pastes, post your actual /version outputs, and reply below so we can reopen the issue.

Also please do not censor any included data when doing so, esp. if it's likely to be important information as the censored lines of the above images are.

haloflooder commented 4 years ago

Hello, sorry about that. I have updated Citizens on the test server to the latest version I got from the jenkins build server (2.0.25 snapshot build 1808). Citizens Version

Server version: Server version

We are not able to update to the latest version of minecraft due to major incompatibilities to the gamemode we are running. If anything, I can also run the same test on a more recent version of Minecraft. And about the censored info in the data, they're just private plugins that the owners wanted me to censor for some reason. I can assure you that they're not related to the issue we're having with Citizens since the private plugins censored in the images don't use NPC's from Citizens.

I re-ran the test and tested for about an hour. Got the same results as the older build.

Before 1 hour

Before 1 hour Before 1 hour

After 1 hour

After 1 hour After 1 hour

SXRWahrheit commented 4 years ago

Does the issue occur:

1) Without these other plugins? 2) On a modern version of Minecraft?

If either of these is not true, it sounds like Citizens is not the cause of the leak.

haloflooder commented 4 years ago

I'll go run the test on a fresh version of the latest version of Paper 1.15 with a bare minimal amount of plugins required.

mcmonkey4eva commented 4 years ago

Your methodology description, and the specific class that's being listed, looks a lot like you're not even despawning the NPCs. (Deregistering and despawning are NOT the same).

fullwall commented 4 years ago

What’s referencing the NPCs in the heap?

haloflooder commented 4 years ago

@mcmonkey4eva We tried despawning the NPC's then destroying them as well before using the deregisterAll method in a previous test before I created this issue. Also pretty sure the deregisterAll method despawns the NPC's as well judging by the code in the repo.

@fullwall I have posted an image below with what's referencing NPC's. (below the before and after section)

Did another test using the latest version of 1.15.1 from papermc's site.

Whatever versions Plugins

Note about the heap dumps

Just in case someone mentions that I should GC first before doing the heap dump. For every heap dumps I have taken so far. I have been GC'ing the server right before a heap dump to get rid of the garbage first.

Testing method

We are using this plugin to test for memory leaks. If we are doing something severely wrong then do let us know! We were just reporting this issue after trying different methods of handling with destroying/deregistering npc's.

Before 30 minutes

Before 30 minutes Before 30 minutes

After 30 minutes

After 30 minutes After 30 minutes

References (from 30 minutes after)

NPC References

mcmonkey4eva commented 4 years ago

That looks like the traits of the NPC are not being unregistered from the Bukkit event handler (or at least, the InventoryCloseEvent listener in the default Inventory trait isn't being unregistered).

This is normally done in destroy() here: https://github.com/CitizensDev/CitizensAPI/blob/6219cb2216332f77daaabce4554544c17cb58733/src/main/java/net/citizensnpcs/api/npc/AbstractNPC.java#L153-L157

The code you linked (for deregisterAll) clearly shows that destroy is not called, meaning the NPC is not properly cleared away (which, again, is expected: deregister is NOT proper removal of the NPC, it's removal from the registry's listing... if anything it's almost more weird that it includes despawning there). Your test code appears to use destroy() though, at least in "mode 1". Can you confirm whether the results are the same or not when specifically "mode" is set to "1"?

fullwall commented 4 years ago

Perhaps I should just unify the NPCRegistry#deregister code to do the extra Trait cleanup stuff

fullwall commented 4 years ago

I think the semantics are a bit unclear between the destroy() and deregisterAll() methods, and I haven't resolved the discrepancy, but now deregisterAll() deregisters trait events.

haloflooder commented 4 years ago

Yeah, we didn't know that using npc.destory() was the proper way to completely get rid of an NPC when handling our own registry. We initially just assumed deregister()/deregisterAll() would destroy the NPC as well. In the other plugin we initially had an issue with, we would attempt to destroy the NPC first then use the deregisterAll() method from the anon registry.

Anyways, we ran the test again with mode 1 enabled (destroy npc method). It looks like the memory leak is eliminated after running the test for about 20 minutes. But we had to fix some bugs that we encountered with the registry iterator.

After encountering the bug with NPCRegistry#iterator, now we know what might've been causing the memory leak in the first place. In the custom plugin we had initially had the issue with, we wrapped the destroying/deregistering part of the code with a try catch and it didn't throw any exceptions with the iterator for some reason.

But using similar code on the memory leak test plugin, we encountered a Concurrent Modifications exception with this code here which wasn't being caught by a try catch exception on our other plugin. We fixed the bug and the memory leak no longer persists on the test server. There seemed to be a memory leak due to iterating and destroying npc's in the iterator and the stack trace not being caught for some reason even though it should throw a Concurrent Modification Exception.

So the issue was caused by this bug that would normally throw this exception but didn't with the other plugin (which we now patched)

So basically we're doing this to fix the issue.

haloflooder commented 4 years ago

p.s. hopefully @mcmonkey4eva doesn't stab me next time

jackstrosahl commented 3 years ago

Is this related? Not exactly sure what's going on, looks like WorldEdit is interfering? Either way our server runs out of memory and crashes.

image
mcmonkey4eva commented 3 years ago

@jackstrosahl if you have a new issue, post a new issue, don't bump year-old issues