MrCrayfish / MrCrayfishFurnitureMod-Refurbished

The next generation of MrCrayfish's Furniture Mod, written from the ground up with brand new models.
https://www.curseforge.com/minecraft/mc-mods/refurbished-furniture
MIT License
33 stars 11 forks source link

Server hangs on closing #30

Open Jeffrey2911 opened 2 months ago

Jeffrey2911 commented 2 months ago

When using the stop command to close the server, the server hangs on Sending config unload event for refurbished_furniture.server.toml

full log --> https://mclo.gs/xiTM8W0

MrCrayfish commented 2 months ago

Can you try without configapi installed?

Sevelar commented 2 months ago

I'm having a similar issue on my two servers with furniture mod installed. Minecraft version 1.20.1, Fabric version 0.16.3 and mod version 1.0.6. No configapi installed.

[10:39:44] [Server thread/INFO]: Unloading server configs...
[10:39:44] [Server thread/INFO]: Sending config unload event for refurbished_furniture.server.toml
[10:39:44] [pool-2-thread-1/INFO]: Reloaded: 'yawp-common.toml'
[10:39:44] [pool-2-thread-1/INFO]: Reloaded: 'yawp-flags.toml'
[10:39:44] [pool-2-thread-1/INFO]: Reloaded: 'yawp-region-defaults.toml'

I doubt it's yawp and suspect refurbished_furniture.server.toml.

Sometimes the server is hanging for just a minute, another time my server was stuck on unload event for the rest of the night until I eventually killed it. This behavior makes restarting the server difficult and risky.

MrCrayfish commented 2 months ago

I doubt it's anything to do with the unload event since that doesn't block any game threads or withhold resources. At the moment, I have no method to reproduce the issue so this makes this quite difficult. I could offer a custom version of Framework that includes additional logging.

Sevelar commented 2 months ago

If it helps finding the cause of it, I'm willing to try.

MrCrayfish commented 2 months ago

Okay, I've created a custom version with a potential fix and extra logging. It can be downloaded from here: https://github.com/MrCrayfish/Framework/actions/runs/10764776778

You can see all the changes I made here. https://github.com/MrCrayfish/Framework/compare/multiloader/1.20.X...multiloader/1.20.X-debug

Sevelar commented 2 months ago

Checked the debug build you've provided.

Following the logs- it seems it's most likely not Framework's fault, since provided logs detected all logs to be unloaded successfully on my side:

[11:10:10] [Server thread/INFO]: Unloading server configs...
[11:10:10] [Server thread/INFO]: Unwatching config: ./config/refurbished_furniture.server.toml
[11:10:10] [Server thread/INFO]: Closing config: ./config/refurbished_furniture.server.toml
[11:10:10] [Server thread/INFO]: Sending config unload event for refurbished_furniture.server.toml
[11:10:10] [Server thread/INFO]: Checking if all Framework configs unloaded...
[11:10:10] [Server thread/INFO]: Is 'refurbished_furniture.server.toml' unloaded: true
[11:10:10] [Server thread/INFO]: Is 'refurbished_furniture.client.toml' unloaded: true
[11:10:11] [pool-2-thread-1/INFO]: Reloaded: 'yawp-common.toml'
[11:10:11] [pool-2-thread-1/INFO]: Reloaded: 'yawp-flags.toml'
[11:10:11] [pool-2-thread-1/INFO]: Reloaded: 'yawp-region-defaults.toml'

I'll do my own testing with YAWP disabled, but for now I have no reason to believe this behavior is caused by Framework.

EDIT: Even without YAWP, the server still hangs up on the shutdown, I don't know anymore...

EDIT 2: Disabled all YAWP, Framework and Refurbished Furnitures and the issue still occured. It must lay somewhere deeper in my case then, but I do not suspect Framework anymore 🙂

MrCrayfish commented 2 months ago

Yeah, the clearest sign is the line [11:10:10] [Server thread/INFO]: Closing config: ./config/refurbished_furniture.server.toml as this is when the resource is freed. So if the server was waiting for a resource to be released, it would have been that.

I believe Microsoft has a tool that you can use to see the files being used by a process. https://learn.microsoft.com/en-us/sysinternals/downloads/process-explorer

You could also try dumping the Java threads, and this should show which ones are running/waiting. https://www.baeldung.com/java-thread-dump

olicooper commented 3 weeks ago

I am seeing this bug in version 1.21.1 running on Ubuntu.. I noticed your patch is on the 1.20.X branch, will this be patched on for 1.21.1 too? Is there anything we can do to fix it before the patch is released?

MrCrayfish commented 3 weeks ago

@olicooper As shown in the discussion already, I do not know the exact cause and have only made a guess. The patch that I included recently was to potentially fix the issue, but I wasn't able to confirm if it fixed the issue since I haven't been able to reproduce it yet. However @Sevelar's second edit indicates they encountered the problem without my mods installed.

Can you provide more details about your game? Are you running a server, what OS and version, mods versions, etc. Logs too will help. (Use https://mclo.gs)

JIAFALSEDREAM commented 3 weeks ago

Same issue on my server (forge47.3.7, MC version 1.20.1, mod version 1.0.6) image The server will get stuck while shutting down

MrCrayfish commented 3 weeks ago

@JIAFALSEDREAM What OS?

JIAFALSEDREAM commented 3 weeks ago

@JIAFALSEDREAM What OS?

winserver2016

olicooper commented 3 weeks ago

OS is Ubuntu 24.04.1 running headless java (openjdk-21-jdk-headless) and systemd to start the minecraft server with /bin/sh -c "./run.sh --nogui"... Initially set the server up with unmodded MC which worked perfectly, then I noticed this when shutting the process down. I issue the /stop command to shut MC down.

$java -version

openjdk version "21.0.4" 2024-07-16
OpenJDK Runtime Environment (build 21.0.4+7-Ubuntu-1ubuntu224.04)
OpenJDK 64-Bit Server VM (build 21.0.4+7-Ubuntu-1ubuntu224.04, mixed mode, sharing)

run.sh

java -jar forge-1.21.1-52.0.15-shim.jar --onlyCheckJava || exit 1
java @user_jvm_args.txt @libraries/net/minecraftforge/forge/1.21.1-52.0.15/unix_args.txt "$@"

user_jvm_args.txt

-Xms1.5G
-Xmx2G
-XX:+UnlockExperimentalVMOptions
-XX:+UseZGC

latest.log

[cpw.mods.modlauncher.Launcher/MODLAUNCHER]: ModLauncher running: args [--launchTarget, forge_server, --nogui]
[cpw.mods.modlauncher.Launcher/MODLAUNCHER]: JVM identified as Ubuntu OpenJDK 64-Bit Server VM 21.0.4+7-Ubuntu-1ubuntu224.04
[cpw.mods.modlauncher.Launcher/MODLAUNCHER]: ModLauncher 10.2.1 starting: java version 21.0.4 by Ubuntu; OS Linux arch amd64 version 6.8.12-2
[net.minecraftforge.fml.loading.ImmediateWindowHandler/]: ImmediateWindowProvider not loading because launch target is forge_server
[mixin/]: SpongePowered MIXIN Subsystem Version=0.8.7 Source=jar:file:///[redacted]/libraries/org/spongepowered/mixin/0.8.7/mixin-0.8.7.jar!/ Service=ModLauncher Env=SERVER
[net.minecraftforge.common.ForgeMod/FORGEMOD]: Forge mod loading, version 52.0.15, for MC 1.21.1 with MCP 20240808.132146
[net.minecraftforge.common.MinecraftForge/FORGE]: MinecraftForge v52.0.15 Initialized
...
[net.minecraft.server.MinecraftServer/]: Stopping the server
[net.minecraft.server.MinecraftServer/]: Stopping server
[net.minecraft.server.MinecraftServer/]: Saving players
[net.minecraft.server.MinecraftServer/]: Saving worlds
[net.minecraft.server.MinecraftServer/]: Saving chunks for level 'ServerLevel[world]'/minecraft:overworld
[net.minecraft.server.MinecraftServer/]: Saving chunks for level 'ServerLevel[world]'/minecraft:the_end
[net.minecraft.server.MinecraftServer/]: Saving chunks for level 'ServerLevel[world]'/minecraft:the_nether
[net.minecraft.server.MinecraftServer/]: ThreadedAnvilChunkStorage (world): All chunks are saved
[net.minecraft.server.MinecraftServer/]: ThreadedAnvilChunkStorage (DIM1): All chunks are saved
[net.minecraft.server.MinecraftServer/]: ThreadedAnvilChunkStorage (DIM-1): All chunks are saved
[net.minecraft.server.MinecraftServer/]: ThreadedAnvilChunkStorage: All dimensions are saved
[Framework/]: Unloading server configs...
[Framework/]: Sending config unload event for refurbished_furniture.server.toml

$ls mods/

'Storage Drawers-forge-1.21-13.5.0.jar'   Xaeros_Minimap_24.4.0_Forge_1.21.jar       framework-forge-1.21.1-0.9.2.jar   refurbished_furniture-forge-1.21.1-1.0.6.jar   travelersbackpack-forge-1.21-10.0.1.jar
 StorageDrawers-1.21-Forge-13.2.1.jar     balm-forge-1.21.1-21.0.19-all.jar          grindenc-forge-1.21.x-v2.2.jar     smallships-forge-1.21.1-2.0.0-b1.5.jar         waystones-forge-1.21.1-21.1.3.jar
 XaerosWorldMap_1.39.0_Forge_1.21.jar    'dungeons-and-taverns-v4.4.4 [Forge].jar'   jei-1.21.1-forge-19.8.4.113.jar    trashslot-forge-1.21.1-21.1.1.jar              waystones-forge-1.21.1-21.1.4.jar
MrCrayfish commented 3 weeks ago

Thanks for the detailed information @olicooper.

I have successfully managed to reproduce the issue on Windows, however it's not consistent. Sometimes it happens, sometimes it doesn't. This leads me to believe there is a race condition. The threads that watch config files for changes (hot reloading) are not being shutdown correctly. While this issue has been reported to be caused by my mod, I have been able to produce the same issue on Forge with 0 mods installed.

I can confirm the issue happens on Forge, however I am unable to get it to occur on Fabric/NeoForge. @Sevelar Do you have any new revelations about your problem since you experienced it on Fabric without Framework/Furniture mod? Any chance you could also provide a list of all the mods you had installed?

SirFurtado commented 3 weeks ago

i am also getting the same problem with the server getting stuck, i'm hosting an ATM10 server on the version 1.7 of the modpack, the problem seems to be also occurring with NeoForge. image

MrCrayfish commented 3 weeks ago

@felipextz Thanks for the info. I have just confirmed the problem is replicable on NeoForge with 0 mods installed. Issue seems to be deeper.

MrCrayfish commented 3 weeks ago

So the common denominator in all this has been a library called NightConfig. Forge/NeoForge use this, and Framework uses this (only on Fabric). The issue is a non-daemon thread not terminating on shutdown. Sometimes it terminates, sometimes it doesn't. Seems to be a race condition on Forge. NeoForge I was able to replicate 100%. I am now able to look into a fix for Framework, and I will be submitting a report to Forge/NeoForge.

MrCrayfish commented 3 weeks ago

@JIAFALSEDREAM I can't seem to produce the issue I found on 1.20.1, can you send your full mod list and debug log? (Use https://mclo.gs to send logs)

JIAFALSEDREAM commented 3 weeks ago

@MrCrayfish https://mclo.gs/xoQqHrn

MrCrayfish commented 3 weeks ago

@JIAFALSEDREAM Any chance you can send the shutdown sequence of the debug log? The sent log was cut off. Also how often does the issue occur?

This may be a little advanced, but can you capture a thread dump? Basically you'll need to encounter the issue again, when that happens, dump the threads. Since you're using Windows, you should be able to press Ctrl + Shift + Pause in the same CMD window that the server is running.

JIAFALSEDREAM commented 3 weeks ago

@JIAFALSEDREAM Any chance you can send the shutdown sequence of the debug log? The sent log was cut off. Also how often does the issue occur?

This may be a little advanced, but can you capture a thread dump? Basically you'll need to encounter the issue again, when that happens, dump the threads. Since you're using Windows, you should be able to press Ctrl + Shift + Pause in the same CMD window that the server is running.

Almost 99% of the time when the server is down (including the end of the shutdown with the ctrl+c or other crashes). Then there's the question about Capture a Thread Dump, hopefully this is the https://mclo.gs/FBQ0H11 you need

JIAFALSEDREAM commented 3 weeks ago

and debug log https://mclo.gs/I516f2V

MrCrayfish commented 3 weeks ago

@JIAFALSEDREAM Thanks for the thread dump. After analyzing it, it seems there is a thread holding it up but is not related to the config watcher thread issue I discovered earlier.

Specifically this thread:

"Timer-1" #110 [148] prio=5 os_prio=0 cpu=0.00ms elapsed=44.66s tid=0x0000024d484a5aa0 nid=148 in Object.wait()  [0x000000ab007ff000]
--
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait0(java.base@20.0.2/Native Method)
- waiting on <0x00000005bc0008b0> (a java.util.TaskQueue)
at java.lang.Object.wait(java.base@20.0.2/Object.java:366)
at java.lang.Object.wait(java.base@20.0.2/Object.java:339)
at java.util.TimerThread.mainLoop(java.base@20.0.2/Timer.java:537)
- locked <0x00000005bc0008b0> (a java.util.TaskQueue)
at java.util.TimerThread.run(java.base@20.0.2/Timer.java:516)

"Timer-1" is a non-daemon type (this means the programmer must manually close it to complete the shutdown. Compared to an actual daemon thread, which will be terminated automatically on shutdown). This can be identified as the word daemon is missing before the text prio=5.

I can only guess that another mod is running some sort of timing, like running code at a fixed interval. Minecraft does use Timers in it's own game for showing regional compliances (like South Korea, they show a notification toast in game for how many hours you have played in your current session) but they are safely shutdown. I will try and go through you mod list and find suspecting mods.

MrCrayfish commented 3 weeks ago

@JIAFALSEDREAM Try removing MemorySweep and test again. I found a similar issue where the server would not close. You'll need to translate it. https://github.com/Tki-sor/MemorySweep/issues/4

MrCrayfish commented 3 weeks ago

Just performed a test myself with MemorySweep and it hung the server on close every time. As soon as I removed the mod and ran stop, server closed fine.

JIAFALSEDREAM commented 3 weeks ago

Just performed a test myself with MemorySweep and it hung the server on close every time. As soon as I removed the mod and ran stop, server closed fine.

Due to time constraints, I will try the day after tomorrow. Thank you for your help

JIAFALSEDREAM commented 3 weeks ago

Just performed a test myself with MemorySweep and it hung the server on close every time. As soon as I removed the mod and ran stop, server closed fine.

Done! no problem, thank you again for your help😃, and do I need to close this issue?

MrCrayfish commented 2 weeks ago

Done! no problem, thank you again for your help😃, and do I need to close this issue?

Unfortunately the issue is not completely resolved yet. Your issue was different to the one regarding the NightConfig library.

JIAFALSEDREAM commented 2 weeks ago

It doesn't seem to be completely resolved, and recently this problem has resurfaced. the thread dump: https://mclo.gs/iUTQ3AF

MrCrayfish commented 2 weeks ago

It doesn't seem to be completely resolved, and recently this problem has resurfaced. the thread dump: https://mclo.gs/iUTQ3AF

According to the logs, it's waiting to dispose the server GUI window. Try running your server with --nogui

JIAFALSEDREAM commented 2 weeks ago

It doesn't seem to be completely resolved, and recently this problem has resurfaced. the thread dump: https://mclo.gs/iUTQ3AF

According to the logs, it's waiting to dispose the server GUI window. Try running your server with --nogui

https://mclo.gs/zALzAlm

MrCrayfish commented 2 weeks ago

Not sure what the context of that dump is. Looks like the server failed to start. Did you add the argument correctly?

JIAFALSEDREAM commented 2 weeks ago

Not sure what the context of that dump is. Looks like the server failed to start. Did you add the argument correctly?

My server can start up normally, and I am currently migrating my server to a new service provider and will be using Ubuntu system

MrCrayfish commented 2 weeks ago

My server can start up normally, and I am currently migrating my server to a new service provider and will be using Ubuntu system

What is the context of the thread dump? Because it's showing significantly less threads than before.