Luke100000 / minecraft-comes-alive

Replaces Minecraft's villagers with humans, who can be interacted with.
GNU General Public License v3.0
70 stars 49 forks source link

Timeouts related to Reaper/SpawnQueue/Inn #842

Open Mikatastrof opened 5 months ago

Mikatastrof commented 5 months ago

Describe the bug Crash(s) during map generation with Chunky.

To Reproduce Steps to reproduce the behavior:

  1. Dedicated server, no player, generate fresh map with Chunky
  2. /chunky radius 1000
  3. /chunky start
  4. Wait until one village start to be generated
  5. Crash log

Issues not here when the mod is removed, all chunks around seems well generated.

Crash during entity spawning

In the two case I got something similar to [CompletableFutureDelayScheduler/WARN]: Chunk read at pos 94489280518 took too long (> 1min) in the latest log file.

Version Minecraft Comes Alive 7.5.11, Minecraft 1.20.1, Fabric

Latest.log

crash-2024-01-13_17.43.05-server(entities).txt crash-2024-01-13_15.16.43-server(reaper).txt 2024-01-13-6.log

Thank you for your help

Luke100000 commented 5 months ago

Help needed here, from my perspective I've done everything correctly to prevent a chunk load, yet all those errors happen because I trigger a deadlock due to a chunk load.

pop4959 commented 4 months ago

@Luke100000 Hey - chunky author here to comment a bit on this and possibly lend out some advice. Have seen this crash a couple times in the last month in our support channels. Quite a popular mod pack!

It looks like the crash is primarily caused by MCAFabric trying to spawn entities on the server tick, rather than as part of world gen, through this "SpawnQueue" system. Or at least that's what I can glean from the stack trace.

This spawn entity call appears to call a few methods down the trace, and eventually calling getChunkBlocking which blocks the server thread until the chunk is loaded. This itself should already be considered a bug, as you never want to do a blocking chunk load in situations like this.

Indirectly, this can probably be caused by a pre-generator such as chunky due to the fact that the chunks only stay loaded long enough to generate - after that they may unload at any moment as they are not kept around much longer. If the chunk is already unloaded by the time MCAFabric wants to spawn entities from the queue, it will require re-loading the chunk (and again this code path ends up blocking the main server thread to make those ends meet).

That said, this can also be caused in any other situation where a player is loading and generating chunks, although it's probably less likely since this happens far less frequently and much slower.

As a potentially easy fix, I would advise placing tickets on these chunks until you have finished processing entities from the queue for said chunk. This will prevent it from unloading until you are done with it, avoiding needing to re-load it and causing it to block the main thread here.

Alternatively, explicitly perform an asynchronous load of the chunk when you are ready to place the entities there, and let it unload as soon as you're done with that.

Or just do it during world gen instead of this queue thing (probably the best solution but I imagine there is a reason this queue exists... surely).

I would also experiment with removing c2me since this is an experimental mod and could cause issues, I see it mixes in right above the blocking chunk get call. Might or might not help, but also IMO not a proper solution if so vs just doing either of the above 2 things I mentioned.

I hope this helps! Feel free to contact me @pop4959 on Discord if you want to chat more about this.

pop4959 commented 4 months ago

@ishland Maybe you can also comment on the c2me bit cause it looks like you re-write that blocking bit that actually seems to deadlock here.

ishland commented 4 months ago

Maybe you can also comment on the c2me bit cause it looks like you re-write that blocking bit that actually seems to deadlock here.

The issue should be fixed in pop4959/Chunky#322 and issues like this is one of the motivation of that PR.

The explanation: The IO worker in vanilla fires CompletableFuture callbacks on the IO thread. And the c2me one fires the callback on its worker thread. The problem is that Chunky blocks the thread in that callback, and the worker threads also handles world generation and other stuff. So this is effectively locking up the entire system since world generation tasks have no place to execute on.

Luke100000 commented 4 months ago

@Luke100000 Hey - chunky author here to comment a bit on this and possibly lend out some advice. Have seen this crash a couple times in the last month in our support channels. Quite a popular mod pack!

It looks like the crash is primarily caused by MCAFabric trying to spawn entities on the server tick, rather than as part of world gen, through this "SpawnQueue" system. Or at least that's what I can glean from the stack trace.

This spawn entity call appears to call a few methods down the trace, and eventually calling getChunkBlocking which blocks the server thread until the chunk is loaded. This itself should already be considered a bug, as you never want to do a blocking chunk load in situations like this.

Indirectly, this can probably be caused by a pre-generator such as chunky due to the fact that the chunks only stay loaded long enough to generate - after that they may unload at any moment as they are not kept around much longer. If the chunk is already unloaded by the time MCAFabric wants to spawn entities from the queue, it will require re-loading the chunk (and again this code path ends up blocking the main server thread to make those ends meet).

That said, this can also be caused in any other situation where a player is loading and generating chunks, although it's probably less likely since this happens far less frequently and much slower.

As a potentially easy fix, I would advise placing tickets on these chunks until you have finished processing entities from the queue for said chunk. This will prevent it from unloading until you are done with it, avoiding needing to re-load it and causing it to block the main thread here.

Alternatively, explicitly perform an asynchronous load of the chunk when you are ready to place the entities there, and let it unload as soon as you're done with that.

Or just do it during world gen instead of this queue thing (probably the best solution but I imagine there is a reason this queue exists... surely).

I would also experiment with removing c2me since this is an experimental mod and could cause issues, I see it mixes in right above the blocking chunk get call. Might or might not help, but also IMO not a proper solution if so vs just doing either of the above 2 things I mentioned.

I hope this helps! Feel free to contact me @pop4959 on Discord if you want to chat more about this.

Maybe you can also comment on the c2me bit cause it looks like you re-write that blocking bit that actually seems to deadlock here.

The issue should be fixed in pop4959/Chunky#322 and issues like this is one of the motivation of that PR.

The explanation: The IO worker in vanilla fires CompletableFuture callbacks on the IO thread. And the c2me one fires the callback on its worker thread. The problem is that Chunky blocks the thread in that callback, and the worker threads also handles world generation and other stuff. So this is effectively locking up the entire system since world generation tasks have no place to execute on.

Thank you both for the detailed explanation! The SpawnQueue forcing chunk loads is definitely an issue, I would expect isChunkLoaded to only retrieve the status, not trigger a load. May also explain similar issue I currently have.

The spawnqueue itself may be a relict from the past where spawning on world gen caused a whole set of issue I can no longer remember. Getting rid of it completely would indeed be the best solution, if it works.

pop4959 commented 4 months ago

Didn't realize the recent fix @ishland pushed was directly related to this - nice, makes sense now in hindsight. Thank you for catching that and making the PR. I'll suggest that people try the latest dev build and considering pushing another release soon (was planning for the next being 1.20.5 but I guess another one sooner can't hurt).

Regardless, it's probably also a good idea to avoid blocking from this mod as well, both from a general performance perspective as well as avoiding other potential issues around chunk loading/unloading. Sounds like that's still on the table so that's great.

EliasRipley commented 4 months ago

Redirected here from MCA Discord. Our server is not using c2me, yet we do run Better Chunk Loading. (Forge): https://www.curseforge.com/minecraft/mc-mods/better-chunk-loading-forge-fabric

Here's a crash report. crash-2024-03-01_13.31.50-server(1).txt

EliasRipley commented 4 months ago

Sorry, this is probably the one that's more relevant, same author / logo. https://www.curseforge.com/minecraft/mc-mods/smooth-chunk-save