WaterfallMC / Waterfall-Old

[UNMAINTAINED] Waterfall prior to becoming patch-based, see the project website at
https://papermc.io

OutOfMemoryError: Direct buffer memory #53

Closed BlackBeltPanda closed 8 years ago

BlackBeltPanda commented 8 years ago

Hello,

Running:

Bungee Config: http://pastebin.com/raw/6dw4Wv5R

I've been having issues with BungeeCord running out of memory, so I decided to switch to Waterfall to see if that would help, but the problem still occurs. I've tried allocating anywhere from 512MB to 2GB of RAM, but the memory exception still occurs with anywhere from 3 to 20 players online and no plugins running. The errors occur and players have trouble joining for a few hours, then the proxy starts to hang and stops responding entirely.

Not sure how much it helps, but I tried running VisualVM to get more information:

With -Xmx set to 2GB:

With -Xmx set to 1GB:

"free -m" with proxy and Spigot servers unloaded: https://i.gyazo.com/fd0500751ba608f550c3e09eb61b9f8b.png

I've tried these flags: "{JAVA}" -Xmx{MAX_MEMORY}M -Xms{MAX_MEMORY}M -Djline.terminal=jline.UnsupportedTerminal -jar "{JAR}" nogui

And these flags: "{JAVA}" -Xmx{MAX_MEMORY}M -Xms{MAX_MEMORY}M -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:MaxGCPauseMillis=100 -XX:+DisableExplicitGC -XX:TargetSurvivorRatio=90 -XX:G1NewSizePercent=50 -XX:G1MaxNewSizePercent=80 -XX:InitiatingHeapOccupancyPercent=10 -XX:G1MixedGCLiveThresholdPercent=50 -XX:+AggressiveOpts -XX:+AlwaysPreTouch -Dwaterfall.acceptInvalidPackets=true -Djline.terminal=jline.UnsupportedTerminal -jar "{JAR}" nogui

Techcable commented 8 years ago

Try to add -Dio.netty.recycler.maxCapacity=0. If that doesn't work, give me the info with -Dio.netty.leakDetectionLevel=paranoid.

BlackBeltPanda commented 8 years ago

With -Dio.netty.recycler.maxCapacity=0 and 512MB allocated (with plugins), the server immediately stopped responding after starting up and threw these errors: http://pastebin.com/raw/iAJK0Aja

With 1GB allocated and no plugins, the Direct Buffer OOM error started to occur after a couple of minutes. Log: http://pastebin.com/raw/mfRdvWEL

With -Dio.netty.leakDetectionLevel=paranoid, should I be seeing extra info somewhere? There's nothing new in the console and players are unable to connect, but used heap seems to have increased a lot: https://i.gyazo.com/739adc9b8845ab94381a1b39851913ef.png Here's the log: http://pastebin.com/raw/h83EJ1Ta

BlackBeltPanda commented 8 years ago

Update: With -Dio.netty.recycler.maxCapacity=0 and 2GB allocated, the server went about 3 hours before throwing an OOM error. This time it only threw a few around the same time and kept running without errors for about 6 hours. Log: http://pastebin.com/raw/mE0htVnD 2GB seems overkill for 15 players and a couple plugins, though.

Techcable commented 8 years ago

Try with -Dio.netty.recycler.maxCapacity=0, 2GB, and paranoid leak detection.

BlackBeltPanda commented 8 years ago

With -Dio.netty.recycler.maxCapacity=0, -Dio.netty.leakDetectionLevel=paranoid, and 2GB allocated, players are unable to join any server on the network without being disconnected with "Timed Out", either immediately or shortly after joining. The TPS on the non-proxy servers drops to about 10-15 when players try to join. Tried with and without plugins loaded. Pinging the server also becomes very unreliable: the response is either slow or missing.

Log with plugins: http://pastebin.com/raw/JrnrQKpJ Log without plugins: http://pastebin.com/raw/L5gZbDx6

Also saw some strange things with "/glist" before being timed out: https://i.gyazo.com/e611bb45cdf851f06795b3d6f5feb688.png

BlackBeltPanda commented 8 years ago

Quick update: Not sure if the proxy's supposed to use 3.4GB more than what's allocated, but that's what it's doing: https://i.gyazo.com/99c633e7ac10a0779132cbeaf23f337f.png

This is with 2GB allocated, -Dio.netty.recycler.maxCapacity=0, and plugins running.
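A note on why the process can exceed the -Xmx value: -Xmx only caps the Java heap, while Netty's direct buffers are allocated outside it and, on HotSpot, are bounded separately by -XX:MaxDirectMemorySize. As a minimal sketch, assuming a HotSpot JVM, the standard BufferPoolMXBean can report how much direct memory the JVM is tracking. The class name DirectMemoryReport is made up for illustration, and depending on the Netty version some allocations may bypass this pool and not show up here.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper, not part of Waterfall or Netty.
class DirectMemoryReport {

    // Builds one line per JVM buffer pool (on HotSpot these include "direct" and "mapped").
    static String describe() {
        StringBuilder sb = new StringBuilder();
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            sb.append(String.format("%s: buffers=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(),
                    pool.getCount(),
                    pool.getMemoryUsed(),
                    pool.getTotalCapacity()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Allocate 1 MiB off-heap so the "direct" pool has something to report.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
        System.out.print(describe());
        // Touch the buffer afterwards so it stays reachable while the report is built.
        buffer.put(0, (byte) 1);
    }
}
```

Running something like this inside the proxy (or reading the same MXBean over JMX from VisualVM) would show whether the growth is in tracked direct buffers rather than in the heap.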

Techcable commented 8 years ago

Start waterfall with 2GB and -Dio.netty.recycler.maxCapacity=0. Configure the JVM to do a heap dump when it runs out of memory by adding -XX:+HeapDumpOnOutOfMemoryError. Run your server for as long as possible until you run out of memory. Then give me the resulting heap dump.

This looks like a memory leak.

BlackBeltPanda commented 8 years ago

Would it be better to give it 1GB to trigger the heap dump faster? Or is 2GB allocated preferred?

Techcable commented 8 years ago

Sure, we can try 1 GB. Also, when you do upload, please compress it and post the sha256 hash of the (compressed) file.
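For anyone without a sha256sum tool at hand, here is a minimal sketch of computing the hash with the JDK's MessageDigest. The class name Sha256Sketch is hypothetical, and the file path is whatever archive you pass on the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for hashing an upload before posting it.
class Sha256Sketch {

    static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Renders a digest as lowercase hex, two characters per byte.
    static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static String ofBytes(byte[] data) {
        return hex(newDigest().digest(data));
    }

    // Streams the file in chunks so a multi-gigabyte dump does not need to fit in memory.
    static String ofFile(Path file) throws IOException {
        MessageDigest md = newDigest();
        byte[] chunk = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(chunk)) != -1) {
                md.update(chunk, 0, read);
            }
        }
        return hex(md.digest());
    }

    public static void main(String[] args) throws IOException {
        System.out.println(ofFile(Paths.get(args[0])));
    }
}
```

The output is lowercase hex; compare it case-insensitively against a posted hash.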

BlackBeltPanda commented 8 years ago

Well, this is interesting.

I didn't get any OOM errors with 1GB, so I reduced it to 512MB. Still no OOM errors.

What I think might have happened: Since the 1.9 update, my players and I have been getting really long "Downloading Terrain" screens when logging into the server. I asked on the Aquifer forums about this and one person noted they had the same issue and solved it by deleting the main world's "scoreboard.dat" file. I gave it a shot and it solved the issue; players were able to log in instantly without that long "Downloading Terrain" screen.

Going back through my logs, it seems the OOM errors stopped when I fixed that problem. I believe they were related to the scoreboard.dat file on the server players were trying to connect to, which was either too large or dated from before 1.9.

So right now I'm running smoothly with 512MB allocated to the proxy and plugins running. I still have the -XX:+HeapDumpOnOutOfMemoryError flag set, so I'll post back if it does throw another OOM error. So far it's been running a good 12+ hours with no errors, though. =)

BlackBeltPanda commented 8 years ago

This just started happening again. Here's the heap dump: https://www.dropbox.com/s/0hdxli1a6a5a2hu/WaterfallDump.7z?dl=0 Here's the SHA-256 hash: 936ABE35DF756433B4CF7F3E7E9DFA1846D712A0EEE570FBA7F89FC1AFDA8AA8

I noticed the scoreboard.dat file in one of my servers had grown again to about 3MB and players were getting long "Downloading Terrain" screens when trying to connect. I've deleted the file again and am waiting to see if it stops Waterfall from running out of memory like it did before.

Techcable commented 8 years ago

Shit, -Dio.netty.recycler.maxCapacity=0 disables the cap on the recycler. Set it to 10,000 and report again with a dump if you run out of memory. I'm getting close to a solution, but want to make sure it's not caused by my own stupidity with the recycler.

BlackBeltPanda commented 8 years ago

I changed the recycler maxCapacity to 10,000. I'll have to wait for the scoreboard.dat file to grow again, I think, before it'll run out of memory. Will post back once that happens; I may just replace it with an older scoreboard.dat I saved that's ~3MB to speed it up.

Techcable commented 8 years ago

Fixed by 7b615e4be5427a5019984c8ce7283175b0888c3e. Plugins like FeatherBoard fill scoreboard.dat with crap, the data is sent to Waterfall, and Waterfall represents it inefficiently. If this is still an issue after this commit, open a new ticket.

Janmm14 commented 8 years ago

@BlackBeltPanda If you have Java 8 update 20 or later, you can try these additional JVM parameters: -XX:+UseG1GC -XX:+UseStringDeduplication. If you want some statistics, also append this parameter: -XX:+PrintStringDeduplicationStatistics

Techcable commented 8 years ago

@Janmm14 I want to see if my interning works ;) We can't rely on command line flags to fix memory bugs.
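For readers unfamiliar with the idea, here is a minimal sketch of what interning means. This is not the code from the commit (the thread does not show it); the class name InternSketch and the sample string are made up for illustration. The point is that many equal strings, such as repeated scoreboard names, end up sharing one canonical instance instead of each holding its own copy.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration of string interning; not Waterfall's actual implementation.
class InternSketch {

    // Maps each distinct string value to the first instance seen with that value.
    private static final ConcurrentMap<String, String> POOL = new ConcurrentHashMap<>();

    // Returns the canonical instance for the given value, registering it if it is new.
    static String intern(String value) {
        String existing = POOL.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    public static void main(String[] args) {
        // Two equal strings that are distinct objects, as they would be after
        // being decoded separately from two packets.
        String first = new String("team_red");
        String second = new String("team_red");

        System.out.println(first == second);                 // prints "false"
        System.out.println(intern(first) == intern(second)); // prints "true"
    }
}
```

The JDK's built-in String.intern() does the same job with a JVM-wide pool. A hand-rolled map like this one never evicts entries, so a real implementation would need to bound it or use weak references.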

Janmm14 commented 8 years ago

Yes, I know.

kamcio96 commented 8 years ago

String.intern?

BlackBeltPanda commented 8 years ago

Was the patch removed?

minecrafter commented 8 years ago

No