PluginBugs / Issues-ItemsAdder

Repository used to keep track of issues of my plugin ItemsAdder
https://itemsadder.devs.beer
51 stars 21 forks source link

Server crashes repeatedly on chunk generation copying _corrupted region_custom_blocks files #2194

Closed ofunny closed 1 year ago

ofunny commented 1 year ago

Terms

Discord tag (optional)

ofunny#1337

What happened?

I run a Paper server 1.19.2 Build 307, ItemsAdder 3.3.0 and Chunky 1.3.52 for world pregeneration. ItemsAdder is using the "Default packs 2.0.8" ore world populator config to generate custom ores.

Repeatedly Itemsadder crashed the server while chunk generation: https://pastebin.com/r1hsYq24 This is not a problem about Chunky, chunky just triggers the normal generation. It does not happen for all regions only certain regions.

I really need a fix here because otherwise I can not pregenerate my world and also can not trust the regular chunk generation on discovery.

Also what does that mean for the corrupted chunks? What will be the result, should I replace them an regenerate them again?

Steps to reproduce the issue

  1. Install Itemsadder
  2. Install the default pack with all ores
  3. Start the server and start chunk pregen with a plugin like Chunky
  4. wait and see

Server version

Paper server 1.19.2 Build 307

ItemsAdder Version

ItemsAdder 3.3.0

ProtocolLib Version

ProtocolLib version 5.0.0-SNAPSHOT-b608

LoneLibs Version

LoneLibs version 1.0.21

LightAPI Version (optional)

Not installed because not compatible with 1.19.2

LibsDisguises Version (optional)

No response

FULL server log

Only on request because of privacy concerns.

Error (optional)

https://pastebin.com/r1hsYq24

Problematic items yml configuration file (optional)

No response

Other files, you can drag and drop them here to upload. (optional)

No response

Screenshots/Videos (you can drag and drop files or paste links)

No response

ofunny commented 1 year ago

Any Feedback on this?

LoneDev6 commented 1 year ago

image

ofunny commented 1 year ago

Sorry, I usually set the expiration date to 2 weeks for public pastes, here is a new one: https://pastebin.com/Cwix5dwM I can also send an unlimited paste via dm in case you need it any longer. In this paste Squaremap is also generating but the crashing happens unrelated if Squaremap is running or not while pregeneration.

LoneDev6 commented 1 year ago

I added you to the beta testing team, in the next days I'll share a new jar with you which MIGHT fix the issue (as I cannot reproduce it)

LoneDev6 commented 1 year ago

Nvm. Somehow your server is corrupting ItemsAdder region files for some reason. Maybe chunky is generating chunks too fast and asynchronously?

ofunny commented 1 year ago

Well Chunky (https://www.spigotmc.org/resources/chunky.81534/) is triggering the generation as fast as possible, but it is not bypassing the regular server chunk generation. I'm using the latest Paper 1.19.2 build, as soon as all plugins catch up I will update to 1.19.3 – Paper has changed the system for chunk handling some time ago to do stuff more async (https://github.com/PaperMC/Paper/pull/8177), I'm not sure if that has anything to do with it.

I also noticed that corrupted ItemsAdder regions appear even on normal player exploration/generation, it happens hardly but happens – but in this case without server crash.

What does a corrupted region mean for me: Are all custom generated blocks in it not working or does IA fix that automatically?

LoneDev6 commented 1 year ago

This happen when somehow the region files saving gets interrupted. It usually happens on server crashes, manual server killing, full disk errors.

This is the first report about corruption in years (before some fixes on my end), so it means that there is something in your server causing that particular behaviour.

ofunny commented 1 year ago

Well what I can do, is removing all plugins except of Chunky and Itemsadder and than run it again. What I would need to know, what will be the result within a corrupted region for a player playing there – like what's the behavior?

LoneDev6 commented 1 year ago

ItemsAdder won't be able to recognize previously placed custom blocks as the region file cannot be read. They will act like vanilla blocks.

ofunny commented 1 year ago

Sorry, it took a long time for my response but I had to recover after Covid-19.

I ran different tests now, one with a 1.19.3 Paper server build 383 and hardly any plugins installed (I removed all unnecessary plugins for the test), all what's left is: ItemsAdder, LoneLibs, LuckPerms-Bukkit, ProtocolLib, (optional Chunky)

So there are no other plugins to blame left. Even with normal discovery (no Chunky pregen) I get corrupted regions, the only difference between getting a corrupted region while pregenerating and normal player discovery is that the server crashes when the exception appears while performing the pregeneration while it just throws the exception on normal discovery.

Here is the full log with pregen: https://mclo.gs/MkfmonI You will find two exceptions from ItemsAdder in it: Line 3835, https://mclo.gs/MkfmonI#L3835 (no crash), line 5715, https://mclo.gs/MkfmonI#L5715 (crash)

Crashreport: https://mclo.gs/ItuhzRG

Here is the same error without Chunky and normal player discovery: https://mclo.gs/j8WhpFx

As you see, there is no difference, also Chunky is not the cause and all that is left is Paper, ItemsAdder, LoneLibs and my world. The world has been started fresh (emtpy world) with the ItemsAdder Worldpopulator, only the spawn has been placed with FAWE. We use a rtp plugin on the live server so some unconnected areas (spots) have been discovered meanwhile all over the world (but everything happened after the fresh start of the world with the Worldpopulator).

Closing words The reason you probably couldn't replicate this bug is the time it takes for it to occur. It can take hours with active pregeneration until the exception will be thrown and the server crashes and even days with regular players discovering the world until an exception will be thrown and might easily be missed by admins because they will not see any immediate effect on the gameplay/server.

I can also share my whole test setup for you: Files, configs, server, world for replication. My server is a 16 core, 64GB, 3 TB, 2.5 Gb/s machine and there are no issues with space, ram, networking or CPU so that could also not cause the issue.

I would highly appreciate it if this could be fixed.

ofunny commented 1 year ago

I did verify the test results again by starting a completely fresh world (emptied the world folder) with the same setup as above – took 2 hours of chunk generation, same result. Its absolutely not an "other plugin bug" but Itemsadder, please reopen the issue, thank you!

LoneDev6 commented 1 year ago

Is the world very big? What is its width, length, height size?

ofunny commented 1 year ago

The world is a regular 1.19.3 vanilla overworld (y 319 to -64), limited via worldborder to100k (length and width) on the live server and on my test server I set it to 20k or 40k while testing. Pregen will stop at the worldborder, but the exception will be thrown much earlier, usually after only 1 to 3% pregeneration and also on regular discovery my players only explored less than +/- 5 to 10% and I already got 6 corrupted regions.

I do not know the implementation of ItemsAdder so I can not really help, but corrupted Itemsadder region files happen only within "region_custom_blocks" folder but not in liquids and various_block (at least until now) – most of the corrupted copied files are not empty and still exist in the "region_custom_blocks" folder, hardly those copied files are really empty.

I use all of the custom ores and blocks of the default pack, I did not change the chances and amount. This could be a coincidence, but I noticed that while testing with 1.19.2 (beginning of this issue report), the bug occurred a lot faster after pregen has started. On the current 1.19.3 build of Paper it takes a rather long time – but as I sad, could also just be a coincidence.

You mentioned async chunk creation – well as far as I know Paper does that since years and the have rewritten the Chunk system and improved the async loading lately so maybe there is a relation between this problem and the rewrite (https://github.com/PaperMC/Paper/pull/8177) but again I can only guess.

I also did the test on a completely new world to eliminate the possibility of corrupted existing chunks causing issues. And if I delete the corrupted region file from all 3 folders of Itemsadder, as well as regions, pois and entities, start the server again and visit the region, it just generates fine – so I assume that there is no problem with certain regions in particular but with the ItemsAdder generation if unknown circumstances happen at the same time.

Btw I use the 'worlds_populators_1_18.yml' but I think this config should also work for 1.19.x

One suggestion Can we ask other server owners with at least 1.19.x Paper server and custom ores of the default pack, what run the world for some time (with players discovering new chunks) to check out the "region_custom_blocks" folder for the corrupted directory? So we could confirm if that affects others as well but gets missed or ignored.

LoneDev6 commented 1 year ago

Yes somehow the regions get corrupted after writing lot of data. Can you let me know the size of some of your itemsadder custom blocks region files?

ofunny commented 1 year ago

Here is an archive of my "region_custom_blocks" from 2 days ago before I fixed the corrupted regions: https://resource.ofunny.world/dev/region_custom_blocks.zip

I could also provide the complete world if it would help.

LoneDev6 commented 1 year ago

Thanks a ton for this, it will speed up my job so much.

ofunny commented 1 year ago

You're welcome, I can provide anything you need, it also helps me to finish my updates and servers. I directly return the "thank you" for actually taking care of all the bugs and the support.

LoneDev6 commented 1 year ago

image

Trying to reproduce it without any success. I set a very large radius to let the server run without interruption. Let's see after some hours. It has been running for over 1 hour.

ofunny commented 1 year ago

I mean your generation rate is in about 2 to 2.5 times slower than on my server and it did take up to two hours for me – if you can not trigger it however, I could send my whole server files or even give you ssh access to a test installation.

If not replicable I can only logically try to find an explanation: Is there anything different in my ItemsAdder or Paper config compared to yours, I did change some minor things to ItemsAdder and a optimization to Paper including a minor reduction of incremental chunk writes per cycle but usually it should not be the cause, I could also resintall everything in default (Paper, Itemsadder, Packs) and test again.

https://resource.ofunny.world/dev/ItemsAdder.zip (removed)

The other option would be the hardware, but it is a blank Linux installation (no weird docker or panel running) on a KVM like most other servers with reasonable specs at a professional hosting company.

It's really weird to be honest. Please let me know when you got the file, because I would like to remove it again as download.

LoneDev6 commented 1 year ago

I didn't find any option to increase the generation rate, I guess it's bound to my PC CPU speed probably. I'll do another test with a dedicated server to see if I can trigger the issue. Anyway I'll share a beta jar with you with some important restructuring of the regions files which allow lower disk usage and possibly mitigate the issue completely.

ofunny commented 1 year ago

Yeah, there is no option in chunky (with incremental chunk writes I meant Paper). Chunky is primary limited to your CPU and secondary to your disk speed and memory. Chunky will run as fast as possible – I noted the chunk rate just to quantify the time, since my server could progress much more within the same time frame what means more chances for an exception.

But at that point I will just wait on the beta and test it – until than I will put that problem on hold. Thanks so far for your effort.

LoneDev6 commented 1 year ago

Today I completely rewrote the custom chunks files.

Warning

These results are not final and not updated, they do not reflect the final product performance.

This is a first test result while using chunky

CPU: i7 4790k

Process set as low priority to simulate slow servers hardware.

I let chunky generate a world with the DefaultPack minerals and this is the result.

Loaded region with 64743 blocks and 1019 chunks in 92ms The plugin loads more than 64k custom blocks in less than 100ms.

Some tests forcing fast saving of data into the file

This is a test with realistic custom blocks amount (500.000)

[00:56:18] ##### Testing with 500.000 custom blocks
[01:00:06 INFO]: ##### Test new format
[01:00:09 INFO]: new format - create array: 232
[01:00:09 INFO]: new format - save file: 845
[01:00:09 INFO]: new format - load file: 1331
[01:00:09 INFO]: new format - took 2409ms
[00:56:42] NBT Region file size: 1289KB
[01:00:09 INFO]: ##### Test nbt
[01:00:15 INFO]: NBT create array: 328
[01:00:15 INFO]: NBT save file: 4687
[01:00:15 INFO]: NBT load file: 752
[01:00:15 INFO]: NBT took 5769ms
[00:56:42] NBT Region file size: 1531KB

You can see the file:


This is a test with very insane blocks amount (3.000.000)

[00:56:18] ##### Testing with 3.000.000 custom blocks
[00:56:18] ##### Test new format
[00:56:29] new format - create array: 1159
[00:56:29] new format - save file: 3832
[00:56:29] new format - load file: 5999
[00:56:29] new format - took 10992ms
[00:56:42] NBT Region file size: 7726KB
[00:56:29] ##### Test nbt
[00:56:42] NBT create array: 1868
[00:56:42] NBT save file: 7207
[00:56:42] NBT load file: 4055
[00:56:42] NBT took 13132ms
[00:56:42] NBT Region file size: 9186KB

You can see the file:

ofunny commented 1 year ago

Looks good, just let me know when the beta is ready for me to test :)

LoneDev6 commented 1 year ago

A test with high process priority and realistic data: Loaded r.0.-1.pbin region with 51107 blocks and 934 chunks in 85ms It takes 0.09ms to process a vanilla chunk with ItemsAdder's DefaultPack ores.

I think it's a very good result, we just have to see if it's more stable than Mojang's NBT (used by the previous ItemsAdder versions).


Conversion takes approximately 6 seconds for 16 regions. They are 512x512 chunks - 8192x8192 blocks.

Should take ~30 seconds to convert a 40000x40000 world.

LoneDev6 commented 1 year ago

The new 3.4.0 release was tested intensively and I can confirm:

ofunny commented 1 year ago

I tested the 3.4.0 pre release on Paper as usual on a blank world and almost instantly got an

[22:06:17 ERROR]: [ItemsAdder] Error saving region file: /itemsadder/region/r.1.0.pb
[22:06:17 WARN]: java.util.ConcurrentModificationException
[22:06:17 WARN]:        at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1597)
[22:06:17 WARN]:        at java.base/java.util.HashMap$EntryIterator.next(HashMap.java:1630)
[22:06:17 WARN]:        at java.base/java.util.HashMap$EntryIterator.next(HashMap.java:1628)
[22:06:17 WARN]:        at ItemsAdder_3.4.0-pre-1.jar//ia.m.ip.a(SourceFile:173)
[22:06:17 WARN]:        at ItemsAdder_3.4.0-pre-1.jar//ia.m.ip.writeTo(SourceFile:111)
[22:06:17 WARN]:        at ItemsAdder_3.4.0-pre-1.jar//ia.m.cN.bs(SourceFile:706)
[22:06:17 WARN]:        at ItemsAdder_3.4.0-pre-1.jar//ia.m.cG.j(SourceFile:36)
[22:06:17 WARN]:        at ItemsAdder_3.4.0-pre-1.jar//ia.m.jL.run(SourceFile:108)
[22:06:17 WARN]:        at org.bukkit.craftbukkit.v1_19_R2.scheduler.CraftTask.run(CraftTask.java:101)
[22:06:17 WARN]:        at org.bukkit.craftbukkit.v1_19_R2.scheduler.CraftAsyncTask.run(CraftAsyncTask.java:57)
[22:06:17 WARN]:        at com.destroystokyo.paper.ServerSchedulerReportingWrapper.run(ServerSchedulerReportingWrapper.java:22)
[22:06:17 WARN]:        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[22:06:17 WARN]:        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[22:06:17 WARN]:        at java.base/java.lang.Thread.run(Thread.java:833)

So it's obviously a concurrency problem writing to the region files (of IA), I'm not sure how you are testing the loads that have been posted on Git but I assume that a single threaded test class has been used. Can you try that again on a multi threaded load simulation via PaperMC, it should directly give the same exceptions, I got them every x seconds on my generation tests.

I do not know your source but an ConcurrentModificationException in combination with an Hashmap happens often if you iterate the Hashmap in one thread while you alter the Hashmap items (delete, add, etc.) in another or if you remove/alter items in a regular "for" loop instead of using an Iterator or if you use ".foreach" what might get progressed async internally – depending on your test setup you will not have issues if you are not utilizing multiple threads (that really run parallel).

LoneDev6 commented 1 year ago

Strange since I tested this build also on Paper without issues, I'll test again. I know what a ConcurrentModificationException is, don't worry, that's why I implemented the code to be thread safe, surely it's a little mistake somewhere in the code, I'll check it.

Anyway (just in case) you're not supposed to play while chunky is generating the world, you should add a whitelist during the generation.

LoneDev6 commented 1 year ago

Note: all tests are outdated