dghodgson opened this issue 12 years ago
At any rate, does rsync itself handle checksum verification? I mean, it's got a stable codebase, so I don't think we have to worry about data integrity problems too much, but it's nice to have in case your system components start failing without your knowledge. Though that's being a little paranoid, I suppose.
Also, should we stick with rsync, or is there a better alternative out there?
I have not heard of one. I suppose a quick g-search would prove otherwise, but from what I can tell, it seems to be the only one widely used.
I went through the manpage looking for flags to use, and I think I found the ones most appropriate, but testing is needed.
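Something like this is roughly what I have in mind; the exact flag set and the paths are my guesses until we test it:

```bash
# Hedged sketch of a plausible rsync invocation for world backups; paths
# and the exact flag choice are illustrative, not final.
#   -a          archive mode (preserves permissions, times, symlinks)
#   -v -h       verbose output with human-readable sizes
#   --delete    remove files from the mirror that were deleted at the source
#   --checksum  compare files by checksum rather than size/mtime (slower)
rsync -avh --delete --checksum /srv/minecraft/world/ /srv/backups/world/
```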
Hmm. I looked into zsync, but that's geared towards file distribution over a network. I think at this point we just need to get something working. And besides, like you said, it's widely used. Can't go wrong with standards-compliant. :B
I think we should use rdiff-backup instead of rsync. It uses the rsync algorithm, and functions much the same, but it stores the file deltas as well so you can roll back changes. Pretty important for situations where your map has been griefed.
Our current backup system would have you just grab the most recent backup. If you only used rsync and didn't do any regular backups, you'd be out of luck. With rdiff-backup, you could just roll back the world files to a point in time before the griefing occurred.
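To make that concrete, the roll-back would look something like this (the paths and the time offset are illustrative):

```bash
# Regular incremental backup: mirrors the world and keeps reverse deltas
# in the rdiff-backup-data directory (paths are made up for illustration).
rdiff-backup /srv/minecraft/world /srv/backups/world

# Restore the world as it existed two days ago, e.g. before the griefing.
# Restoring to a fresh directory avoids clobbering the live world files.
rdiff-backup -r 2D /srv/backups/world /srv/minecraft/world-restored
```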
We could also just use rsync to sync changes to the disk, and then do regular backups on a daily basis.
Though if we used rdiff-backup, we could sync the changes to disk and make snapshots all with the same program. Just delete the rdiff-backup-data folder to get rid of the delta files (which would be unnecessary for snapshots).
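In other words, something like this (paths are placeholders):

```bash
# Sketch of the snapshot idea above: copy the rdiff-backup mirror and
# strip the delta data, leaving a plain point-in-time snapshot.
snap="/srv/snapshots/world-$(date +%F)"
cp -a /srv/backups/world "$snap"
rm -rf "$snap/rdiff-backup-data"
```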
I think we could actually implement quite a number of backup and archival features using rdiff-backup and basic system commands alone. What do you guys think?
Like you say, the option is there. We should stick to one and make sure it is working the way we want before adding another option, though.
I agree. The point I'm getting at, though, is that although we already have a working rsync function, and we could still do snapshots with it, rdiff-backup gives us more options.
For the type of service minecraft is, and the kind of backups we'll be doing, rdiff-backup seems like the better choice. It also opens up more doors for other features down the road since it gives you more options to work with (i.e. being able to get a copy of a world folder at any point in time between the first backup and the current backup).
So should we dump rsync, implement rdiff-backup, and rework the backup commands to use rdiff-backup for snapshots (which would also enable us to dump our checksum function), or just continue with rsync?
Sounds like rdiff is a much better option. I have some time later today, should be able to throw something together.
Alright, I am going to ask that the feature branch be brought up to date with the develop branch; the scriptname variable has obviously been fixed in the latest version.
The question is, where are we syncing to? What directory?
Running rdiff-backup against RD_WORLDS_DIR spits out an error saying that it will mess the directory up if it uses this one.
RD_WORLDS_DIR has been deprecated. I brought the rsync branch up to date with develop on my end, but I forgot to push the changes.
The rdiff-backup work should be done in a new feature branch though. I want to keep feature-rsync around for later experimentation.
Also, if we run into issues with rdiff-backup hogging I/O or CPU time, we can use ionice and cpulimit. Though they need to be installed, since I don't think most distributions come with those packages by default.
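Something along these lines, assuming both tools are installed (the 50% cap is arbitrary):

```bash
# Run the backup in the idle I/O scheduling class so it yields disk
# bandwidth to the Minecraft server (paths are illustrative).
ionice -c 3 rdiff-backup /srv/minecraft/world /srv/backups/world

# Or cap the CPU usage of an already-running backup; note that pgrep may
# return multiple PIDs if several backups are running at once.
cpulimit -l 50 -p "$(pgrep -f rdiff-backup)"
```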
It seems I can't roll back to your last commit on that file. I will have to try again later. Now to get that other branch set up properly.
I've got a branch I can push. No work done on it yet though.
If you want to reset your branch back to a specific point, just use git reset --hard HEAD~n, where "n" is the number of commits back you want to go. You can also specify a commit SHA instead of "HEAD~n" to target a specific commit.
You could also make a branch based off of the current head of feature-rsync, and then do the git reset on feature-rsync to effectively move all your commits from the old branch to the new branch you've created.
Though I'd like to have feature-rdiff-backup based off of the latest commit on develop to help keep things from getting messy. I suppose if you did the reset as I mentioned, and then did a rebase on the new branch, that would probably take care of it. I've never done a rebase myself, though.
Though keep in mind that the --hard option will erase your commits from that branch, so if you don't make a new branch first, your work will be lost.
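Putting the whole sequence together, it would look roughly like this (the commit count is made up; the branch names match the ones we've discussed):

```bash
# Sketch of the branch-move recipe above; HEAD~3 is an arbitrary example.
git checkout feature-rsync
git branch feature-rdiff-backup   # new branch keeps the current commits safe
git reset --hard HEAD~3           # rewind feature-rsync; the work survives
                                  # on feature-rdiff-backup
git checkout feature-rdiff-backup
git rebase develop                # replay the work on top of develop
```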
I won't be available until probably 9 or 10 PM PST, but if you're still up, go ahead and start a hangout on G+ so we can discuss exactly how to go about implementing rdiff-backup. We could also do Thursday or Friday if you're busy.
Alright, so I am working on a restore function and I am currently wondering just where the rdiff-backup should restore to.
The process so far is:
- Set REF similar to other functions.
- Check that $2 is set when called; if not, explain what the accepted values are.
- Check that the server is not running.
- Set up the timeframe ($2) variable to pass to the function backup_incr_restore.
The function (for development, it is just spitting out variable strings):
- Check for a directory inside the restore target.
- Ask the user if they want to overwrite what is present.
- Run rdiff-backup -v 6 -r $timeframe $(var where restoring from) $(var where we are restoring to)
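Roughly, I'm imagining something like this; RESTORE_SRC and RESTORE_DEST are placeholder names, not the script's actual variables:

```bash
# Rough sketch of backup_incr_restore as described above; RESTORE_SRC and
# RESTORE_DEST stand in for the real source/destination variables.
backup_incr_restore() {
    local timeframe="$1"
    # Check for a directory inside the restore target.
    if [ -d "$RESTORE_DEST" ]; then
        read -r -p "Overwrite what is present? [y/N] " answer
        case "$answer" in
            [Yy]*) ;;
            *) return 1 ;;
        esac
    fi
    rdiff-backup -v 6 -r "$timeframe" "$RESTORE_SRC" "$RESTORE_DEST"
}
```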
Looks good to me. Push the code to the feature branch and I'll take a look at it.
Minor changes, but the script is acting the way I expect it to when being called.
We've already got some work done on this in the feature-rsync branch (thanks to Will for that), but it needs to get finished ASAP. A number of people using this script on servers with large world files are experiencing server timeout issues when running backups. I'm convinced it's due to I/O saturation when the backup function runs, since it just dumps absolutely everything and wastes system bandwidth and space.