jukefr opened this issue 3 years ago
Here's another log with a mixin trace.
jesus... use pastebin
So, just to check that it's not actually the disk write speed throttling the operation somehow, I ran a quick benchmark:
dd if=/dev/random of=/tmp/output bs=8k count=1000k; rm -f /tmp/output
1024000+0 records in
1024000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 32.7489 s, 256 MB/s
I ran iotop in parallel and got the exact same reading on Total DISK WRITE, so I trust that value for checking the server process's write speed.
Now I ran the Chunky task with only Chunky and tictacs, getting around 65 CPS.
Total DISK WRITE 16.57 M/s
The write speed varies from 10 to 20 M/s, which is perfectly fine.
When I run with tictacs and starlight, getting around 130 CPS, the Total DISK WRITE is surprisingly far lower than without starlight, even though the CPS is far higher...
Total DISK WRITE 4.82 M/s
It varies from around 2 to 8 M/s, with occasional peaks around 15, but mostly stays between 2 and 8...
It could also be that the one without starlight jumps quickly from very low to 10-20 and I don't see the transition, while the one with starlight is more stable overall, I guess?
So the problem might come from Starlight, but I don't see why it would do anything to chunk saving. I thought it was only a lighting engine rewrite, but I guess I will have to take a closer look at the code...
EDIT: as soon as I pause the chunky task the write speed jumps up
Total DISK WRITE 30.16 M/s
So I'm guessing the problem is in fact that, for some reason, chunk saving is not able to keep up with chunk generation. Over time, if the chunks that need to be saved grow larger than the available free memory, that would cause a crash; the saving logic is also probably not designed to handle 20G of data at a time even if you have the memory to spare (like I do).
EDIT: the same problem also seems to happen with only Chunky + starlight, so I think it's a bug in vanilla chunk saving, which might not be in the scope of this rewrite.
EDIT2: I also experience the exact same behavior with starlight_0.0.2
EDIT3: I also tried launching the server with only 8G of ram and no extra flags of any kind, same outcome.
EDIT4: same result with openjre 8 (previously using openjre 11)
EDIT5: same result with oracle 8 jre
EDIT6: sync-chunk-writes
seems to have no impact on the issue
The more I look into this, the more it looks like a problem with the vanilla chunk saving module not being able to keep up with chunks being generated at over roughly 70 CPS. At that point it looks like it just stores everything in memory until the task ends, and only then writes the chunks.
I have disabled the max tick time by setting it to -1, to see whether the chunk saving can stay alive long enough to write however many gigabytes of data need to be written, and kept sync writes off since, from my understanding, on a Unix platform that should be fine and preferred.
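For reference, both of those settings are plain vanilla server.properties keys, so the test configuration boils down to something like:

```properties
# server.properties settings used for these tests
# disable the watchdog so long saves don't kill the server
max-tick-time=-1
# keep chunk writes asynchronous (fine and preferred on Unix, per above)
sync-chunk-writes=false
```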
So if my assumptions are correct and this isn't the result of some mix of weird mixins, then it's a bug in the vanilla chunk saving, which would probably need a rewrite or at least adjustments to handle these kinds of rates. That is probably not in the scope of this project, so feel free to close this issue if you want, but at least it's documented somewhere...
Unsure if this is related, but sometimes when I stop my server, it will hang at saving chunks for overworld/nether/end. It's quite annoying to be honest. Also, sometimes chunks won't load at all. Would downgrading back to 0.1.2 help this?
I wrote a quick mixin to hook the chunk save queue method and the actual IO write result. The result is what was expected: the queued saves just pile up, and only once the generation task is over do they actually start being written to file.
This was with sync-chunk-writes=true
And this is a log with sync-chunk-writes=false
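For reference, the per-second counter behind those log lines can be as simple as the standalone sketch below (plain JDK only; the mixin injection points that call record() are not shown here because the target method names depend on mappings and version):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal standalone sketch of the per-second counter behind the "[Kekmium]" log lines.
public final class PerSecondCounter {
    private final String label;
    private final AtomicInteger count = new AtomicInteger();
    private volatile long windowStart = System.currentTimeMillis();

    public PerSecondCounter(String label) {
        this.label = label;
    }

    // Call once per hooked event (one chunk queued for save, or one chunk actually written).
    public void record() {
        count.incrementAndGet();
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) {
            synchronized (this) {
                if (now - windowStart >= 1000) {
                    int n = count.getAndSet(0);
                    windowStart = now;
                    System.out.println("[Kekmium] " + n + " " + label + " in the last second");
                }
            }
        }
    }
}
```

One instance per hook, e.g. new PerSecondCounter("chunks queued for save") and new PerSecondCounter("chunks written"), reproduces the log format shown further down.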
Also, after actually looking a bit into the Minecraft code and your repo's code, I guess the saving part could fall under the scope of this project, and this could be a feature request for a rewrite/optimization of the IO part, which directly impacts the TACS save routine.
I just replicated this on an almost vanilla server, so I'm pretty sure at this point that the issue is just vanilla TACS, from what I gathered snooping around the code:
To replicate on an (almost) vanilla server with the logging mixin above: keep generating chunks constantly, for example by spamming /tp ~100 ~ ~ in chat, while trying not to generate them faster than the server can handle, and you should see the queued chunk saves pile up with close to no actual writes. To experience an almost identical case (the one @supsm was mentioning), do the same but generate chunks faster than the server can actually handle. Once you stop and the server starts catching up, it generates all the chunks in your path and you will see a long list of queue operations with no actual writes, the same exact behavior; then, once all the chunks have been generated, the queue logs stop and the writes keep going for a while (depending on how long you kept going).
So this behavior shows up in an almost vanilla environment as well as in a heavily modded one that allows for very high CPS rates, and it impacts the server whether or not chunks are being generated faster than it can handle (since it apparently treats the future cost of saving as nonexistent).
So this is definitely a bug (or maybe more of a never-considered case) in TACS, so yeah, I guess this is a feature request: map pregeneration is an obvious must for performance gains, and this also means that a couple of users flying around and generating chunks could potentially hog the entire memory of the server over time, plus probably many other issues that could be fixed by rethinking how this saving routine works.
In my experience, without tic-tacs sometimes the server will be overloaded when I'm generating new chunks, but with tic-tacs the server will be fine. For this reason I think what you're seeing is actually the chunks not being able to generate quickly enough, as opposed to not being saved quickly enough. On another note, if you're running the server on a slower hard drive that could definitely be possible.
I did benchmarks (see above) and write speed is not the limiting factor. Typical Java IO should be around 2.5 to 3 times slower than native code, so I would expect my write speed to max out around 50 M/s, and that's being very conservative; I'm nowhere near that kind of usage except when I stop generating new chunks (writes go from ~5 M/s to 30-45 M/s at that point, which is what's to be expected, as stated previously). I would also agree with your theory about generation speed being the issue, but I experience the exact same behavior when running mods that bring my CPS to over 200, which makes me think my theory is closer to the actual cause. The vanilla test was mainly to find a reproducible setup with minimal external factors; reproducible steps help speed up tracking down the actual root cause and fixing it faster.
EDIT: also, if you look at the logs, the chunks are clearly queued for save, and AFAIK they can't be queued until they have been generated
EDIT2: but your idea of the server being "overloaded" is probably what is in fact happening: behind the scenes the server "manager" then chooses to prioritize chunk generation, because it sees the cost of keeping the to-be-saved chunks in memory as very low; it probably never expects them to reach sizes in the gigabytes, or the processing to be hogged uninterrupted for hours on end.
So I did some more testing to figure out exactly what happens once the amount of chunk data in memory gets close to or over the amount of free memory (maybe the server would somehow pause and force a write), but sadly the server just seems to run out of memory no matter what.
So I suspect this would also impact servers with low memory that aren't doing any intense generation: simply exploring with multiple users at the same time would eventually eat up all the RAM too. There are many scenarios where forcing a save-all flush routine would be beneficial.
AFAIK PaperMC has a max-auto-save-chunks-per-tick flag, which would probably tell us more about how they fix this, if they do.
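To make the idea concrete, here's a rough sketch of what a capped, bounded save queue could look like. This is not TACS's or Paper's actual code; the class, PendingChunk, writeToRegionFile() and the numbers are made up for illustration. The per-tick cap is in the spirit of Paper's max-auto-save-chunks-per-tick, and the bounded queue is what would give you backpressure so generation can't outrun IO indefinitely:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Rough illustration only: a bounded pending-save queue with a per-tick write cap.
public final class BoundedChunkSaveQueue {
    // Bounded queue: when it is full, queueForSave() blocks, throttling generation
    // instead of letting unsaved chunks accumulate in memory without limit.
    private final BlockingQueue<PendingChunk> pending = new ArrayBlockingQueue<>(4096);
    private static final int MAX_SAVES_PER_TICK = 24; // arbitrary cap for the sketch

    public void queueForSave(PendingChunk chunk) throws InterruptedException {
        pending.put(chunk); // blocks when full -> backpressure on the generator
    }

    // Called once per server tick by whatever drives saving.
    public void drainOnTick() {
        for (int i = 0; i < MAX_SAVES_PER_TICK; i++) {
            PendingChunk chunk = pending.poll();
            if (chunk == null) {
                return; // nothing left to save this tick
            }
            chunk.writeToRegionFile();
        }
    }

    // Placeholder for whatever actually serializes a chunk to disk.
    public interface PendingChunk {
        void writeToRegionFile();
    }
}
```

Blocking the generator is only one possible policy; pausing the generation task (which is effectively what Chunky's pause does in the logs below) has the same effect of letting the writes catch up.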
Here are some more logs I got to illustrate:
[02:35:20] [Server thread/INFO]: [Kekmium] 5545 chunks queued for save in the last second
[02:35:20] [Server thread/INFO]: [Kekmium] current ticket count is 51
[02:35:20] [IO-Worker-9/INFO]: [Kekmium] 6 chunks written in the last second
[02:35:20] [Server thread/INFO]: [Chunky] Task running for minecraft:overworld. Processed: 10632494 chunks (75.57%), ETA: 6:01:09, Rate: 158.6 cps, Current: 3, 1630
[02:35:21] [Server thread/INFO]: [Kekmium] 4751 chunks queued for save in the last second
[02:35:21] [Server thread/INFO]: [Kekmium] current ticket count is 51
[02:35:21] [Server thread/INFO]: [Chunky] Task running for minecraft:overworld. Processed: 10632682 chunks (75.57%), ETA: 5:49:03, Rate: 164.1 cps, Current: 201, 1630
[02:35:22] [Server thread/INFO]: [Kekmium] 5240 chunks queued for save in the last second
[02:35:22] [Server thread/INFO]: [Kekmium] current ticket count is 51
[02:35:22] [Server thread/INFO]: [Chunky] Task running for minecraft:overworld. Processed: 10632893 chunks (75.57%), ETA: 5:33:41, Rate: 171.7 cps, Current: 402, 1630
[02:35:23] [Server thread/INFO]: [Kekmium] 4402 chunks queued for save in the last second
[02:35:23] [Server thread/INFO]: [Kekmium] current ticket count is 51
[02:35:23] [Server thread/INFO]: [Chunky] Task running for minecraft:overworld. Processed: 10633021 chunks (75.57%), ETA: 5:39:51, Rate: 168.5 cps, Current: 530, 1630
[02:35:24] [IO-Worker-20/INFO]: [Kekmium] 635 chunks written in the last second
[02:35:24] [Server thread/INFO]: [Kekmium] 3194 chunks queued for save in the last second
[02:35:24] [Server thread/INFO]: [Kekmium] current ticket count is 51
[02:35:24] [Server thread/INFO]: [Chunky] Task running for minecraft:overworld. Processed: 10633214 chunks (75.57%), ETA: 5:34:27, Rate: 171.3 cps, Current: 723, 1630
[02:35:25] [Server thread/INFO]: [Kekmium] 6081 chunks queued for save in the last second
[02:35:25] [Server thread/INFO]: [Kekmium] current ticket count is 46
[02:35:25] [IO-Worker-19/INFO]: [Kekmium] 108 chunks written in the last second
[02:35:26] [Server thread/INFO]: [Chunky] Task running for minecraft:overworld. Processed: 10633375 chunks (75.57%), ETA: 5:46:28, Rate: 165.3 cps, Current: 890, 1630
[02:35:26] [Server thread/INFO]: [Kekmium] 3256 chunks queued for save in the last second
[02:35:26] [Server thread/INFO]: [Kekmium] current ticket count is 51
[02:35:27] [IO-Worker-8/INFO]: [Kekmium] 210 chunks written in the last second
[02:35:27] [Server thread/INFO]: [Chunky] Task running for minecraft:overworld. Processed: 10633521 chunks (75.58%), ETA: 5:43:55, Rate: 166.5 cps, Current: 1040, 1630
[02:35:27] [Server thread/INFO]: [Kekmium] 4453 chunks queued for save in the last second
[02:35:27] [Server thread/INFO]: [Kekmium] current ticket count is 51
[02:35:28] [Server thread/INFO]: [Chunky] Task running for minecraft:overworld. Processed: 10633731 chunks (75.58%), ETA: 5:29:51, Rate: 173.6 cps, Current: 1240, 1630
[02:35:28] [Server thread/INFO]: [Kekmium] 5327 chunks queued for save in the last second
[02:35:28] [Server thread/INFO]: [Kekmium] current ticket count is 51
[02:35:28] [IO-Worker-11/INFO]: [Kekmium] 4 chunks written in the last second
chunky pause
[02:35:29] [Server thread/INFO]: [Chunky] Task running for minecraft:overworld. Processed: 10633924 chunks (75.58%), ETA: 5:27:33, Rate: 174.8 cps, Current: 1432, 1630
[02:35:29] [Server thread/INFO]: [Chunky] Task paused for minecraft:overworld.
[02:35:29] [Chunky-minecraft:overworld Thread/INFO]: [Chunky] Task stopped for minecraft:overworld.
[02:35:29] [Server thread/INFO]: [Kekmium] 5338 chunks queued for save in the last second
[02:35:29] [Server thread/INFO]: [Kekmium] current ticket count is 15
[02:35:29] [IO-Worker-10/INFO]: [Kekmium] 619 chunks written in the last second
[02:35:30] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:30] [IO-Worker-11/INFO]: [Kekmium] 3705 chunks written in the last second
[02:35:31] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:31] [IO-Worker-11/INFO]: [Kekmium] 3468 chunks written in the last second
[02:35:32] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:32] [IO-Worker-11/INFO]: [Kekmium] 3146 chunks written in the last second
[02:35:33] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:33] [IO-Worker-10/INFO]: [Kekmium] 3349 chunks written in the last second
[02:35:34] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:34] [IO-Worker-10/INFO]: [Kekmium] 3338 chunks written in the last second
[02:35:35] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:35] [IO-Worker-11/INFO]: [Kekmium] 3241 chunks written in the last second
[02:35:36] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:36] [IO-Worker-11/INFO]: [Kekmium] 3716 chunks written in the last second
[02:35:37] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:37] [IO-Worker-11/INFO]: [Kekmium] 3628 chunks written in the last second
[02:35:38] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:38] [IO-Worker-10/INFO]: [Kekmium] 3547 chunks written in the last second
[02:35:39] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:39] [IO-Worker-10/INFO]: [Kekmium] 3032 chunks written in the last second
[02:35:40] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:40] [IO-Worker-10/INFO]: [Kekmium] 3594 chunks written in the last second
[02:35:41] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:41] [IO-Worker-11/INFO]: [Kekmium] 3723 chunks written in the last second
[02:35:42] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:42] [IO-Worker-10/INFO]: [Kekmium] 3299 chunks written in the last second
[02:35:43] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:44] [Server thread/INFO]: [Kekmium] current ticket count is 1
[02:35:45] [Server thread/INFO]: [Kekmium] current ticket count is 1
Since you know how to write mods, would it be possible to just write random data to test java's write speed?
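That wouldn't even need a mod: a plain JDK program that writes random data in 8 KiB blocks (mirroring the dd run above) and times it would give a directly comparable number. A rough sketch, with an arbitrary output path and size:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Random;

// Writes ~8 GiB of pseudo-random data in 8 KiB blocks and reports MB/s,
// roughly mirroring the dd benchmark earlier in the thread.
public final class JavaWriteBench {
    public static void main(String[] args) throws IOException {
        Path out = Paths.get("/tmp/java-write-bench.bin"); // arbitrary path
        int blockSize = 8 * 1024;
        long blocks = 1024L * 1024; // 1024k blocks of 8 KiB

        byte[] block = new byte[blockSize];
        new Random(42).nextBytes(block); // random once; we measure write speed, not RNG speed
        ByteBuffer buf = ByteBuffer.wrap(block);

        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(out,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (long i = 0; i < blocks; i++) {
                buf.rewind();
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            }
            ch.force(false); // flush to disk so the timing is honest
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        double mb = blockSize * (double) blocks / 1e6;
        System.out.printf("%.0f MB in %.1f s -> %.1f MB/s%n", mb, seconds, mb / seconds);
        Files.deleteIfExists(out);
    }
}
```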
Hello,
I have been using chunky pregenerator to generate my map. Everything seems to be going smoothly at first and the speeds are quite impressive.
But I suspect that the autosave feature of the server is not able to keep up with the chunk generation rate (maybe I am completely wrong). When I generate a couple thousand chunks and then try to stop my server, it hangs on the Saving chunks operation. The CPU and IO both seem to be at 0, so I have to manually kill the process, which means I don't get a crash log or anything. However, I tried doing the same thing with a manual save-all flush before stopping the server, and at that point I got a crash report. This seems to only happen when tictacs is enabled, so I have a very high suspicion it's something to do with tictacs, but I'm sorry if it's not. To be honest, my Java skills are close to nonexistent, so I can't really debug this properly and give you more in-depth traces and whatnot.
Crash report logs