digint / btrbk

Tool for creating snapshots and remote backups of btrfs subvolumes
https://digint.ch/btrbk/
GNU General Public License v3.0

Add symlink to last snapshot on source volume #287

Open ghost opened 5 years ago

ghost commented 5 years ago

As mentioned by @Massimo-B on https://github.com/digint/btrbk/issues/283#issuecomment-509572074, I think it would be a good idea if btrbk created a symlink to the last snapshot in the source volume. This should of course be configurable, both where the symlink is placed and what it is called.

For example if we have this setup:

/mnt/rootvol/volume/
  root
  home
  var
/mnt/rootvol/snapshots/
  root.20190710T0000
  root.20190710T1000
  root.20190710T2000
  ... and so on

A symlink could be placed in /mnt/rootvol/snapshots/ like this:

/mnt/rootvol/snapshots/
  root.20190710T2000
  root.latest -> root.20190710T2000

Another option

/mnt/rootvol/volume/
  root
  root.latest -> ../snapshots/root.20190710T2000
  home
  var

Third option

/mnt/bootvol/boot
  boot.20190710T0000
  current -> boot.20190710T0000
  previous -> boot.20190709T0000 (for easy change back in grub)

Of course, people have different needs, so this needs to be configurable.
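The third layout above can be sketched in a few lines of shell. This is only an illustration: `rotate_links` is a hypothetical helper, not something btrbk provides, and it assumes every snapshot in the directory is named `<name>.<timestamp>` with one timestamp format and one UTC offset, so that glob order equals age order.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of btrbk): point <dir>/current at the newest
# "<name>.<timestamp>" entry and <dir>/previous at the one before it.
rotate_links() {
    local dir="$1" name="$2" entry cur="" prev=""
    # Globs expand in sorted order, so the last real entry is the newest
    # (assumption: identical timestamp format and UTC offset everywhere).
    for entry in "$dir/$name".*; do
        # Skip symlinks and the literal pattern left by an unmatched glob.
        if [ -L "$entry" ] || [ ! -e "$entry" ]; then
            continue
        fi
        prev=$cur
        cur=${entry##*/}
    done
    # -n replaces an existing link instead of descending into its target.
    if [ -n "$cur" ]; then ln -sfn "$cur" "$dir/current"; fi
    if [ -n "$prev" ]; then ln -sfn "$prev" "$dir/previous"; fi
}
```

Calling `rotate_links /mnt/bootvol/boot boot` after each snapshot run would keep both links relative to the snapshot directory, so they stay valid if the volume is mounted elsewhere.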

digint commented 5 years ago

Thanks for the suggestion, added "enhancement" label.

I'm not sure if this is worth the effort: it adds redundancy and implies a new framework for checking these symlinks, and keeping them up-to-date might not be trivial (note that the latest transferred backup is not always the latest by date, e.g. on "btrbk resume").

Also, it's very easy to do this in a shell script, using something like:

> Of course, people have different needs, so this needs to be configurable.

Which is probably the strongest argument why not implement it in btrbk directly, but rather in helper scripts.

Massimo-B commented 5 years ago

I would highly appreciate a current and previous symlink. Why should keeping this up-to-date not be trivial? If btrbk is the master of the snapshots, it could reset the symlink after every new snapshot.

When I try btrbk list latest, it returns the same number of lines as btrbk list, just with a different set of columns:

btrbk list has: source_subvol snapshot_path snapshot_name target_path
btrbk list latest has: source_subvol snapshot_subvol status target_subvol

What is the logic behind this? At least I can determine the latest snapshot like this:

$ btrbk list latest /mnt/usb/mobiledata/snapshots/gentoo-mb/home/ --format=raw |sed 's/.*target_path="\([^"]*\)".*/\1/g'
/mnt/usb/mobiledata/snapshots/gentoo-mb/home/home.20190926T071000+0200

But often I need to use paths from command line for vimdiff, rsync or cp for restoring parts of the backup. There I would need to use some "$(getlatestsnapfrom mobiledata desktop-home home)" (modelling the path structure of device, machine, sub that is already there), while just having a current and previous symlink would probably lead to the snapshot I need.
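The sed one-liner and the "getlatestsnapfrom" idea above could be combined roughly as follows. This is a sketch under assumptions: `raw_field` and the body of `getlatestsnapfrom` are hypothetical, the raw format is assumed to print space-separated `key="value"` pairs as the sed expression implies, and the `/mnt/usb/<device>/snapshots/<machine>/<sub>/` layout is taken from the example path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one key="value" field for every
# raw-format line read from stdin.
raw_field() {
    local key="$1"
    # Prepend a space so a key at the start of the line is matched too,
    # then require whitespace before the key to avoid partial-name matches.
    sed -n "s/^/ /; s/.*[[:space:]]${key}=\"\([^\"]*\)\".*/\1/p"
}

# Hypothetical wrapper for the device/machine/subvolume layout used above.
# Requires btrbk and a configuration that covers the given path.
getlatestsnapfrom() {
    local device="$1" machine="$2" sub="$3"
    btrbk list latest "/mnt/usb/${device}/snapshots/${machine}/${sub}/" --format=raw \
        | raw_field target_path | tail -n 1
}
```

With this in a shell profile, something like `vimdiff ~/.bashrc "$(getlatestsnapfrom mobiledata gentoo-mb home)/user/.bashrc"` becomes possible, although a plain symlink would still be shorter to type.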

camoz commented 2 years ago

I'm also interested in this feature. I think I have also found an argument for implementing it in btrbk which I have explained at the end (but I'm still unsure if it's a good idea, simply because of the added complexity mentioned by @digint).

First I want to note that determining the latest snapshot is not trivial when taking different timezones and daylight saving time into account. Furthermore, to be able to calculate it from the filenames, it is necessary to have them in long-iso format - but they still won't sort correctly when simply listing them in lexicographical order.

Example: Germany has DST and this year, of the following two timestamps the first will be 50 min earlier (in physical time) than the second one:

20221030T023000+0200
20221030T022000+0100 <- 50 min later in physical time

But when listing them in lexicographical order, the later timestamp (in physical time) actually comes first:

20221030T022000+0100 <- 50 min later in physical time
20221030T023000+0200

Since we are talking about backups, IMO it's not acceptable to have it fail regularly, even if it's just a one-hour window per year...
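One way around the DST problem described above is to compare absolute time instead of the text of the timestamp. The following is a minimal sketch, assuming GNU date (for `-d`) and snapshot names of the form `<name>.<YYYYMMDDThhmmss+hhmm>`; `ts_to_epoch` and `sort_by_time` are hypothetical names.

```bash
#!/usr/bin/env bash
# Convert a long-iso timestamp such as 20221030T023000+0200 to epoch seconds.
# Assumption: GNU date is available; other date implementations lack -d.
ts_to_epoch() {
    local stamp
    # Rewrite to "YYYY-MM-DD hh:mm:ss +hhmm", a form GNU date understands.
    stamp=$(printf '%s\n' "$1" | sed -n \
        's/^\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)T\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\([+-][0-9]\{4\}\)$/\1-\2-\3 \4:\5:\6 \7/p')
    [ -n "$stamp" ] || return 1
    date -u -d "$stamp" +%s
}

# Read snapshot names from stdin and print them oldest first, ordered by
# physical time. Names without a parsable timestamp are dropped.
sort_by_time() {
    local name epoch
    while IFS= read -r name; do
        epoch=$(ts_to_epoch "${name##*.}") || continue
        printf '%s\t%s\n' "$epoch" "$name"
    done | sort -n -k1,1 | cut -f2-
}
```

Feeding the two example timestamps through `sort_by_time | tail -n 1` selects the `+0100` one, which is the later snapshot in physical time even though it sorts first as text.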

I guess an alternative could be to try to determine the latest snapshot using some btrfs subvol list -r -s -o --sort=+gen <path> (or so) command, but as far as I can see one would still have to filter out the desired snapshots by pattern matching, since there might be snapshots of several different subvolumes in one snapshot_dir or target...
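The pattern-matching step mentioned above might look like this. It is a sketch only: `latest_by_gen` is a hypothetical helper, and it assumes that the listing is sorted by ascending generation and that every line ends in `path <relative path>`.

```bash
#!/usr/bin/env bash
# Hypothetical filter: read "btrfs subvolume list" output (sorted by
# ascending generation) from stdin and print the path of the last entry
# whose basename starts with "<name>.".
latest_by_gen() {
    local name="$1" line path latest=""
    while IFS= read -r line; do
        case $line in
            *" path "*) path=${line##*" path "} ;;
            *) continue ;;
        esac
        # Keep only snapshots of the requested subvolume.
        case ${path##*/} in
            "$name".*) latest=$path ;;
        esac
    done
    if [ -n "$latest" ]; then printf '%s\n' "$latest"; fi
}
```

It would be used as `btrfs subvol list -r -s -o --sort=+gen <path> | latest_by_gen root`. Whether generation order always matches the order btrbk considers "latest" is an open question, for example after a "btrbk resume".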

Argument for implementing "latest snapshot" links in btrbk

For btrbk it would probably be easier to link to the latest snapshot than for a script. Not because it can determine it more easily than, say, a bash script (I think the logic for that will be roughly the same), but because it's easier for btrbk to update the links: in each run, btrbk knows which subvolumes/targets/etc. it touched, and thus where the links need to be checked and possibly updated. If the link logic were implemented in a separate script, it would have to check every place configured in btrbk for changes, even though btrbk might have updated only a subset of them. Also, access to remotes is already configured via btrbk - that would have to be replicated in the script if it were to update the links on the remotes as well.

What to do now?

Since it's unlikely to have that implemented in btrbk soon, even if that feature were to be accepted, and since one probably does not need "latest snapshot" links on every target/archive, but maybe even only in the local snapshot_dir, I'd say let's create a separate script for now. If it's solid, maybe it could even be added into contrib/tools.

Massimo-B commented 2 years ago

Hi, I started a script that adds a "latest ->" symlink in all target directories. I run this after each btrbk cronjob for different btrbk.conf files.

Could we add that to contrib/tools?

update_snapshotlinks

```bash
#!/usr/bin/env bash

this_basename="${0##*/}"
alias_latest="latest"

usage() {
    cat <<EOF
Usage: ${this_basename} [-h|--help] [config-file]

Description:
  All btrbk target dirs are parsed from configuration file and get a
  "latest ->" symlink pointing to the latest snapshot.
  If no configuration file is set, then /etc/btrbk/btrbk.conf is used.

Options:
  -h, --help
EOF
}

## Usage: latest [dir]
## Prints the most recently modified non-symlink entry in dir.
latest() {
    local file latest
    for file in "${1:-.}"/*; do
        [[ -h "$file" ]] && continue
        [[ "$file" -nt "$latest" ]] && latest=$file
    done
    printf '%s\n' "$latest"
}

while [[ $1 == -* ]]; do
    case "$1" in
        -h|--help) usage; exit 0;;
        # Reject unknown options instead of looping forever.
        *) echo "invalid option: $1" >&2; usage; exit 1;;
    esac
done

config="${1:-/etc/btrbk/btrbk.conf}"

# Third field of every uncommented "target" line is taken as the directory.
while read -r target; do
    if [[ ! -d $target ]]; then
        continue
    fi
    latest_path="$(latest "$target")"
    if [[ -n $latest_path ]]; then
        latest_name="$(basename "${latest_path}")"
        # Nothing to do if the link already points at the latest snapshot.
        [[ "${target}/${latest_name}" -ef "${target}/${alias_latest}" ]] && continue
        ln -sfvn "${latest_name}" "${target}/${alias_latest}"
    fi
done < <(grep "target " "${config}" | grep -v "^#" | awk '{print $3}' | sort -u)
```

Massimo-B commented 11 months ago

Please adapt the title and replace "last" by "latest".

My current workaround script: update_snapshotlinks

```bash
#!/usr/bin/env bash

this_basename="${0##*/}"
alias_latest="latest"
default_configfile="/etc/btrbk/btrbk.conf"

usage() {
    cat <<EOF
Usage: ${this_basename} [options] [additional-dir...]

Description:
  All btrbk snapshot (variable "snapshot_dir") and target dirs (variable
  "target") are parsed from configuration file in order to add a
  "latest ->" symlink pointing to the latest snapshot.
  If no configuration file is set, then /etc/btrbk/btrbk.conf is used.

Options:
  -c, --config   configuration file (default: $default_configfile)
  -h, --help     Show this help
  -d, --debug    Debugging mode
EOF
}

# Print an error and abort (called below but missing from the posted script).
die() {
    echo "${this_basename}: $*" >&2
    exit 1
}

## Usage: latest [dir]
## Prints the lexicographically last non-symlink entry in dir.
latest() {
    local subdir
    local -a subdirs=() subdirs_sorted=()
    for subdir in "${1:-.}"/*; do
        [[ -h "${subdir}" ]] && continue
        subdirs+=("$subdir")
    done
    (( ${#subdirs[@]} )) || return 0
    readarray -t subdirs_sorted < <(printf '%s\n' "${subdirs[@]}" | sort)
    # An unmatched glob leaves a literal "*" entry; ignore it.
    if [[ ${subdirs_sorted[-1]} != *"*" ]]; then
        printf '%s\n' "${subdirs_sorted[-1]}"
    fi
}

printlist() {
    local item
    for item in "$@"; do
        printf '\t%s\n' "$item"
    done
    printf '\n'
}

configfile="$default_configfile"
while [[ $1 == -* ]]; do
    case "$1" in
        -c|--config)
            shift
            if [[ $1 == -* || -z $1 ]]; then
                die "No config file given"
            fi
            configfile="${1}"; shift;;
        -d|--debug) opt_debug=1; shift;;
        -h|--help) usage; exit 0;;
        -*) echo "invalid option: $1" 1>&2; usage; exit 1;;
    esac
done
[[ -n $opt_debug ]] && echo "Config file: $configfile"

# Remaining arguments are extra directories to link in.
additional_dirs=("$@")
[[ -n $opt_debug ]] && echo "Additional dirs:" && printlist "${additional_dirs[@]}"

readarray -t volumes < <(grep -e "^volume\s*/" "${configfile}" | awk '{print $2}')
readarray -t snapshot_dirs < <(grep -e "snapshot_dir " "${configfile}" | grep -v "^#" | awk '{print $2}' | sort -u)
[[ -n $opt_debug ]] && echo "snapshot dirs:" && printlist "${snapshot_dirs[@]}"
readarray -t target_dirs < <(grep "target " "${configfile}" | grep -v "^#" | awk '{print $3}' | sort -u)
[[ -n $opt_debug ]] && echo "target dirs:" && printlist "${target_dirs[@]}"

# snapshot_dir is relative to its volume, so try every combination.
all_dirs=( "${additional_dirs[@]}" "${target_dirs[@]}" )
for volume in "${volumes[@]}"; do
    for snapshot_dir in "${snapshot_dirs[@]}"; do
        all_dirs+=("${volume%/}/${snapshot_dir%/}")
    done
done
[[ -n $opt_debug ]] && echo "all dirs:" && printlist "${all_dirs[@]}"

readarray -t all_dirs_sorted < <(printf '%s\n' "${all_dirs[@]}" | sort -u)
for target in "${all_dirs_sorted[@]}"; do
    [[ -n $opt_debug ]] && echo "Target: $target"
    if [[ ! -d $target ]]; then
        [[ -n $opt_debug ]] && echo "... does not exist, skipped"
        continue
    fi
    latest_path="$(latest "$target")"
    [[ -n $opt_debug ]] && echo "Latest path: $latest_path"
    if [[ -n $latest_path ]]; then
        latest_name="$(basename "${latest_path}")"
        if [[ "${target}/${latest_name}" -ef "${target}/${alias_latest}" ]]; then
            [[ -n $opt_debug ]] && echo "... already set"
            continue
        fi
        #rm -fv "${target}/${alias_latest}"
        [[ -n $opt_debug ]] && echo "... linking to $latest_path"
        ln -sfvn "${latest_name}" "${target}/${alias_latest}"
    fi
done
```

My default cron job looks like this:

btrbk -c /etc/btrbk/btrbk.conf run cron && update_snapshotlinks -c /etc/btrbk/btrbk.conf /mnt/local/data/archive/*/*

However, remote targets are currently not managed by this: for instance, I also send snapshots to remote ssh targets, which I don't re-link with this script. Local cron jobs there do the re-linking.

I still think btrbk could easily set the symlink after every successful snapshot creation on whatever target.

jvgreenaway commented 4 months ago

@Massimo-B this script looks great! I think I will give it a go.

To add my own bit of colour to the debate over whether this should be part of btrbk…

I am currently working out if I will use this tool for my scheduled backups or use buttervolume. Having a latest directory so that I can rsync that to a remote is needed for either of my avenues. It would be cool if btrfs-progs offered this support so that both of these tools could have this functionality.