MommyHeather / AdvancedBackups

BSD 3-Clause "New" or "Revised" License
26 stars 4 forks source link

Backup purging hangs when using uncompressed differentials / incrementals #63

Closed Jadan1213 closed 4 months ago

Jadan1213 commented 5 months ago

I've noticed a few times that i have to manually re-enable chunk saving after a backup. i have no other mods that manipulate chunk saving installed.

I'm on forge 47.2.17, MC 1.20.1, mod ver. 3.4. I have Minecraft running as a local server on another machine, and my wife and I connect to it from our client machines.

I'm not 100% sure if it only happens when a save is done out of game (disconnected), but i feel like it is. I've noticed usually after logging back in after being off the server a while (no players) that if a backup was made, i have to manually re-enable saving with /save-on. This doesn't seem to happen when actually connected to the server, but i'm going to try and keep an eye out and see.

Jadan1213 commented 5 months ago

the backup completed successfully. image

Jadan1213 commented 5 months ago

would you like me to test if this issue occurs with the other purge options?

Jadan1213 commented 5 months ago

on the original jars?

MommyHeather commented 5 months ago

I don't see a need - with how similar they are, I'd be very surprised if the hang doesn't occur on all three.

The check would also work on all three, so no need to do any extra checks there.

MommyHeather commented 5 months ago

This will ideally be a temporary fix, I'm not happy with just relying on this check - but it's better than the backup hanging.

Jadan1213 commented 5 months ago

i'm 99% sure that it works properly in a container when the world backup storage location is located on the local vm storage and not through a cifs/smb mount. (like i said, i'm probably a very niche use case)

Jadan1213 commented 5 months ago

When i was running with the backup storage inside the vm/docker container, i never ran into the issue. I only started noticing old backups not being purged AFTER i moved the world backup storage to a cifs bind mount.

MommyHeather commented 5 months ago

Huh. That is interesting...

MommyHeather commented 5 months ago

There's not really a way for me to check whether the backup location is over the network or not.

And come to think of it, it's possible more scenarios use docker without the smb / cifs setup. I'm not too sure where to proceed now, actually...

Jadan1213 commented 5 months ago

i'm willing to bet if i moved the storage back inside the VM (my docker host is a VM on TrueNAS scale), that the backups would complete properly.

Jadan1213 commented 5 months ago

It's not possible to query the system via mount and grep the output to look for the world backup string and see if it's on a cifs mount?

mount | grep <$backupfolder> | grep cifs

something like that?

Jadan1213 commented 5 months ago

i do get output within the docker container when doing that command. not sure if java has any access to that. it's with user permissions, no root.

`minecraft@dc80e5520122:~$ mount | grep /minecraft | grep cifs

//192.168.69.1/Minecraft on /minecraft type cifs (rw,relatime,vers=3.0,cache=strict,username=minecraft,uid=1000,noforceuid,gid=1000,noforcegid,addr=192.168.69.1,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)

minecraft@dc80e5520122:~$`

Jadan1213 commented 5 months ago

this is assuming that the AB config path is set to the same mount name.

for example, my directory is //192.168.69.1/Minecraft

mounted to /minecraft in the container.

Jadan1213 commented 5 months ago

maybe have it check if it's in a container, then if it is, check if the backup directory is local storage or not? if it's local, do the checks, if it's not listed locally then skip checks? (i'm just throwin spaghetti at the wall here lol.)

MommyHeather commented 5 months ago

Hmm. I can have a play tomorrow, I'm just not sure if I can come up with anything reliable enough.

If you enter the full backup path (whatever so exactly what you specified in the config) with that command, does the output change much?

Jadan1213 commented 5 months ago

typing the full path from the config: config.advancedbackups.path=/minecraft/world-backups/aubincraft-1-20-1 wouldn't get a result since the mount command doesn't show file structure, only what server share is mounted and to what directory.

Jadan1213 commented 5 months ago

Or maybe, add a config option for docker containers specifically.

config.advancedbackups.isdockerstoragelocal=true/false

where if someone has their backup storage local, they can set it to true manually, and it will do the checks. or if it's not local but mounted via cifs bind, it can be set to false and skip the checks.

MommyHeather commented 5 months ago

mmm, figures. not really sure what to do here then - whether or not I should allow backup purging when in docker containers, how i could reliably check for a network drive, and ideally how I can fix this issue outright.

A config option that needs to be enabled to allow purging within a docker container definitely seems to be a good temporary solution though.

Jadan1213 commented 5 months ago

and have a log output saying ERROR: Server is running in docker, storage checks will be skipped unless isdockerstoragelocal=true. do NOT set this to true if your backup storage is via a network bind mount as the backup will hang and the world chunk saving could fail to be re-enabled

Jadan1213 commented 5 months ago

that way it's up to us as the user to know what we're doing, and that liability doesn't fall on you in the event a permanent solution to this is impossible.

Jadan1213 commented 5 months ago

don't need people blaming you for their worlds getting messed up if they toggle stuff they don't know on and off lol

Jadan1213 commented 5 months ago

i'm going to enable the other checks with the original jar and see if things hang or not. that way you can know for sure if this issue only affects space based purging or not.

MommyHeather commented 5 months ago

Alright, thank you. I'm sure it'd affect all, but you're welcome to check.

MommyHeather commented 5 months ago

It's worth noting that size is checked, then total count, then dates.

Jadan1213 commented 5 months ago

so i have this right... config.advancedbackups.purge.days=0 is the number of days old before the backup is purged config.advancedbackups.purge.count=0 is the number of total backups to keep

where does this fit in config.advancedbackups.chains.length=50

MommyHeather commented 5 months ago

The chain length controls the maximum amount of backups that can be in a single differential or incremental "chain" - that is, one full backup and 49 partial backups.

In a lot of cases you might not ever see this number hit, due to the "smart reset" option anyway.

It won't matter in the context of this bug.

Jadan1213 commented 5 months ago

ok i'm going to copy off my world backup directory and start playing with the settings. i'll let you know if i figure anything out.

Jadan1213 commented 5 months ago

so far i'm testing the original jar with the purge after # of days setting. The backup completes properly, but i'm noticing it's not purging the older backups, even with days = 1.

moving on to the next setting

Jadan1213 commented 5 months ago

testing with the purge after # of backups does not complete and re-enable saving, nor does it remove old backups.

Jadan1213 commented 5 months ago

Added a secondary VM disk to my docker VM, and bind mounted the local storage directory into the docker container. I set the permissions for read/write and copied over all the world backup data and changed the backup directory to the local vm storage.

so far i ran the first test with purge size 100GB. the backup did not complete properly. I'm starting to wonder if this is an issue with having a backup folder that is outside of the game's data folder. anything other than ./backups. (I mean it could still be because it's in a docker container...)

I'm going to copy the game backups into the ./backups folder and see what happens a little later.

Jadan1213 commented 5 months ago

i did discover if you move/delete the folder location and a backup tries to run, it will output a null error, but even after changing the directory location and doing a reload-config it still thinks a backup is running. (i didn't shut down the server when i did it lol.)

Jadan1213 commented 5 months ago

copied the world backups to the ./backups folder, set the size limit to 50G (limited space in the VM) and ran a backup.

it ran the backup, i haven't gotten any completion message, nor a re-enabling of saving. It is pegging a single cpu core at 100%, but spark profiler isn't showing any activity from advancedbackups. i'm going to let it sit for a while and see what happens.

Jadan1213 commented 5 months ago

well i let it sit, and nothin. so now i'm more confused. i'm wondering if the only reason i didn't notice this at first is because it takes time for space to get used and hit the 100G limit i had set.

MommyHeather commented 5 months ago

Hmm. I know that external backup locations are no problem outside of docker containers, but it's still possible that docker interferes somehow..

As for the spark profiler, it won't show anything with default arguments. Adding --thread * will tell it to profile all threads, as the backup process is threaded to reduce impact on the game loop.

MommyHeather commented 5 months ago

Hmm, I've been trying to recreate this to no avail.

Tested steps so far:

With both backup locations, I observed proper count-based purging. As such, do you think you could get me a download link for your backups folder, provided it isn't too big?

MommyHeather commented 5 months ago

Ah! Managed to reproduce it, only after disabling compression like you did. Interesting...

MommyHeather commented 5 months ago

aaand reproduced outside of docker. With that, I think I now have enough information to test a fix, we shall see.

Full reproduction steps:

I'll run some tests on my end, and come up with a fix for you to try out too if that's okay?

MommyHeather commented 5 months ago

AdvancedBackups-forge-1.20-3.4-hangfix.zip And here's a jar to test. This should solve it.

Jadan1213 commented 5 months ago

i'll download it and give it a try! That's crazy the rabbit hole we went down and how it was related to not using compression!

Jadan1213 commented 5 months ago

it's going to take me a few, gotta undo a lot of changes i made to reset everything to how i originally had it lol.

Jadan1213 commented 5 months ago

i use no compression because it's faster for backup (nvme storage), and my storage location runs on zfs, so it has compression already lol. I bet mostly everyone keeps it default to using compression and that's why this never came up!

Also, i did use the default with compression on at first!, I only disabled it after setting the storage to my smb share!

Jadan1213 commented 5 months ago

I set the limit to 75G and ran a backup with the new jar. It deleted the old backup chain and brought it under limit.

would it be possible to get log messages on purge? like, backup size has exceeded limit of size, deleting old backups. the following backups were deleted; backup1 backup2 backup3

etc? if not, no problem, purging seems to work now =)

MommyHeather commented 5 months ago

I do indeed want to implement purging notifications, though it's lower priority than some other things. I think pushing that as a separate feature request should be a good idea though, rather than this issue.

I'll mark this as fixed in next release, and I'd suggest you continue using the test jar I gave you until v3.5 is out so your purging can work as it should.

Jadan1213 commented 5 months ago

Thank you for your help. Took lots of testing, but i'm glad you figured it out! I'll keep using the jar you sent me until a proper 3.5 release is posted. Thank you again for your time!

Jadan1213 commented 5 months ago

wanted to also let you know, by fixing this issue you have corrected a perplexing issue i was having where the server was not saving/shutting down in a timely manner and was being force stopped, causing loss of some changes. It looks like this issue with the thread hanging was preventing it from working right! so double thanks!

MommyHeather commented 5 months ago

Yeah, those shutdown hangs make sense too - the server can't stop whilst a backup is running.

I'm reopening this issue until 3.5 is pushed out, at which point it'll automatically be closed.

github-actions[bot] commented 4 months ago

This issue has been closed - fixed in 3.5 which has now been pushed to Curseforge and Modrinth.

MommyHeather commented 4 months ago

reopening, CF workflow is erroring..