andrewchambers / bupstash

Easy and efficient encrypted backups.
https://bupstash.io
MIT License
897 stars 31 forks source link

Possible corruption detected #357

Open andrewchambers opened 2 years ago

andrewchambers commented 2 years ago

From the public chat:

after bupstash put, I am running a bupstash get to check the data. Here, I am getting the following error: bupstash get: entry path/to/some/file content hash differs from index hash, possible corruption detected . 

This issue is interesting because the expected symptom of disk or ram corruption would be a different error such a corrupt chunk error. This error is triggered during the second set of checksum checks - which suggests either a bug in bupstash or some other problem - it definitely requires further investigation.

falsifian commented 1 year ago

I just saw something similar. Running bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 --pick home/falsifian/w/misc-pri/n id=7c7e8ea84a30f889ef6fa27bbbb8d4c7, I see the message:

bupstash get: entry 0 content hash differs from index hash, possible corruption detected.

The output is way smaller than I expect: 1.3K vs 3.6M when I archive the original directory with the tar command.

Before running into this error, I experienced another weird thing. Trying to recover an individual file, I got output that was exactly the right length, but had the wrong content. The content appeared to be pieces of other files concatenated together, starting partway through one file, then including two other complete files, ending exactly on a file boundary. The files were all from the same directory as the file I wanted. Command I used was bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 --pick home/falsifian/w/misc-pri/n/ssh_keys id=7c7e8ea84a30f889ef6fa27bbbb8d4c7

More notes:

andrewchambers commented 1 year ago

Thank you so much for the report - Your setup sounds fine.

A few questions:

falsifian commented 1 year ago

Thank you so much for the report - Your setup sounds fine.

A few questions:

* Are you able to recover the file using a full 'get' of the snapshot?

Probably not. I just tried: bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 id=7c7e8ea84a30f889ef6fa27bbbb8d4c7 > bs.tar, and when I tried extracting the tarball, got:

moth d $ tar -xf ../bs.tar
tar: Invalid header, starting valid header search.
tar: Invalid header, starting valid header search.
tar: Unable to remove directory home/falsifian/.cabal/logs/ghc-8.10.5/: Directory not empty
tar: Invalid header, starting valid header search.
tar: Unable to remove directory home/falsifian/.cabal/logs/ghc-8.10.5/: Directory not empty
tar: Invalid header, starting valid header search.

and more like the above. (I terminated tar instead of waiting to see the end result; I can do that if it would help.)

* Are you able to reproduce the error reliably?

I'll try.

* Did you install bupstash from an openbsd package repository? I am curious if they applied any patches.

Yes. If I'm correctly interpreting the PATCHFILES lines in the Makefile, they're cherry-picking commits a73fd83, 8bf5bd1 and 290768c onto the v0.12.0 tag. (Seeing the comment about packporting a --ignore-permission-errors fix, I remembered a weird thing, in case it's relevant: while bupstash put was running, the progress indicator froze on a directory I didn't have permission to access.)

falsifian commented 1 year ago

Probably not. I just tried: bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 id=7c7e8ea84a30f889ef6fa27bbbb8d4c7 > bs.tar, and when I tried extracting the tarball, got:

I stand corrected. I just reran the tar command to see what happens if I let it finish extracting. That whole directory was restored correctly.

Another kind of error that appeared while tar was running:

tar: A hard link to the directory home/falsifian/src/darcs/2.16.2/_darcs/pristine.hashed/ is not allowed
falsifian commented 1 year ago

I ran a diff between my home directory within what tar extracted and my original home dir*. The only output I see is:

*I actually backed up / but only looked at /home/falsifian within the backup for this diff.

andrewchambers commented 1 year ago

The permission error patch might be a potential cause, I will make a new release a priority so people don't need to rely on the custom patches.

There is also a chance something else is going on that is specific to openbsd. I will try to introduce more random directory generation and fuzzing to try and reproduce it on my end.

falsifian commented 1 year ago

Are you able to recover the file using a full 'get' of the snapshot?

I stand even more corrected. I tried extracting with GNU tar instead of OpenBSD's tar, and it seems to work fine*. So, the only bugs I can confirm for now are: (a) --pick giving the incorrect, stitched-together output, and (b) incompatibility of bupstash get output with OpenBSD's tar.

Some more info, in case it helps:

*(Details: GNU tar extracts without any errors (except it failed to create some files in directories with mode 555), and I don't see missing files. I did not look too deeply this time; I just looked at a part of the file tree that had missing files when I used BSD tar.)

semarie commented 1 year ago

I am the original reported of the problem (on the public chat).

I just want to add that I am still running bupstash on OpenBSD, on several hosts, several times per day (it is a total of about 27 run per day). It is done with bupstash put followed by checking the backup (bupstash extraction + tarball check) with bupstash get | gtar ft.

Currently, the problem didn't occur again, so it seems to popup at very low rate.

andrewchambers commented 1 year ago

Interesting that this was openbsd as well, I can maybe do my fuzzing in a vm there.

falsifian commented 1 year ago

I've verified that this is reproducible with my own private data, but haven't yet managed to see the error with synthetic data.

I tried making a synthetic directory structure with thousands of files and some of them not readable, but wasn't able to trigger the bug; if anyone has ideas about what properties of the synthetic files might be important (e.g. large/small size?) I can keep trying.

Summary of what follows:

To be precise, I've included a script below, intended only to be run on my computer. The exclude_args are just there to speed things up.

The three bupstash get commands at the end of the script behave badly in three different ways:

Some notes:

#!/bin/sh
set -e

TMP=$(mktemp -d "$HOME/bs_tmp/bs-XXXXXXXX")
printf 'Using temp dir: %s\n' "$TMP"
cd "$TMP"

bupstash init -r repo
bupstash new-key -o key

exclude_args="--exclude /dev --exclude /mnt --exclude /nodump --exclude /tmp --exclude /usr --exclude /vm --exclude $HOME/.cabal --exclude $HOME/.cache --exclude $HOME/.local --exclude $HOME/.mozilla --exclude $HOME/bs_tmp --exclude $HOME/co --exclude $HOME/src --exclude $HOME/var --exclude _darcs"

id=$(bupstash put -k key -r repo --no-send-log --ignore-permission-errors $exclude_args /)
printf 'Id: %s\n' "$id"

bupstash get -k key -r repo --pick home/falsifian/w/misc-pri/n/ssh_keys "id=$id" > result
cmp result ~/w/misc-pri/n/ssh_keys || echo wrong content

bupstash get -k key -r repo --pick home/falsifian/w/misc-pri/n "id=$id" > n.tar || echo extract dir failed

RUST_BACKTRACE=full bupstash get -k key -r repo --pick home/falsifian/.shrc "id=$id"
falsifian commented 1 year ago

@bket FYI (assuming you're the same bket as the OpenBSD port maintainer; sorry if I guesed wrong)

andrewchambers commented 1 year ago

Really appreciate the investigation.

semarie commented 1 year ago

@falsifian what is the OpenBSD version you use ? OpenBSD 7.3 ? -current ? what is the output of sysctl -n kern.version.

are you able to rebuild the sysutils/bupstash port with some changes ? (I would send you a patch to keep the debuginfo and eventually some port changes to test)

are you able to build bupstash HEAD from git repository and test with it ?

please note that the first error reported was without permission errors cherry-picked commit.

Cc @klemensn

falsifian commented 1 year ago

@falsifian what is the OpenBSD version you use ? OpenBSD 7.3 ? -current ? what is the output of sysctl -n kern.version.

moth ~ $ sysctl -n kern.version
OpenBSD 7.3-current (GENERIC.MP) #1162: Sun Apr 23 17:40:19 MDT 2023
    deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

(Just ran sysupgrade last night; the above script still produces the same errors.)

are you able to rebuild the sysutils/bupstash port with some changes ? (I would send you a patch to keep the debuginfo and eventually some port changes to test)

are you able to build bupstash HEAD from git repository and test with it ?

Sure, thanks, I can build ports and hopefully building directly isn't too hard. Will try to find some time to test at HEAD. I might first try to narrow down the part of my filesystem needed to reproduce this though; would be nice to have an example I can share.

please note that the first error reported was without permission errors cherry-picked commit.

Cc @klemensn

falsifian commented 1 year ago

Here's a complete reproducible example, showing a slightly different buggy behaviour. Hopefully it has the same cause as everything else here...

The below script ought to print z content because that's what was written to the file fs/var/z, but it prints acme-client.conf content instead.

(I call this behaviour "different" because previously the file always had exactly the right length if bupstash didn't report an error, but fs/var/z and fs/etc/examples/acme-client.conf have different lengths.)

#!/bin/sh
set -e

mkdir bug;cd bug

mkdir -p fs/etc/examples fs/var
echo bgpd.conf content > fs/etc/examples/bgpd.conf
echo acme-client.conf content > fs/etc/examples/acme-client.conf
echo z content > fs/var/z
chmod 000 fs/etc/examples/bgpd.conf

bupstash init -r repo
bupstash new-key -o key

id=$(bupstash put -k key -r repo --no-send-log --ignore-permission-errors fs)
printf 'Id: %s\n' "$id"

bupstash get -k key -r repo --pick var/z "id=$id"
semarie commented 1 year ago

thanks. I am able to reproduce.

with your reproducer, I have the following output (I added set -x):

$ sh ./test.sh                                                                              
+ mkdir bug
+ cd bug
+ mkdir -p fs/etc/examples fs/var
+ echo bgpd.conf content
+ > fs/etc/examples/bgpd.conf 
+ echo acme-client.conf content
+ > fs/etc/examples/acme-client.conf 
+ echo z content
+ > fs/var/z 
+ chmod 000 fs/etc/examples/bgpd.conf
+ bupstash init -r repo
+ bupstash new-key -o key
+ bupstash put -k key -r repo --no-send-log --ignore-permission-errors fs
+ id=91333d575284d44c4c616e85d1bd98b4
+ printf Id: %s\n 91333d575284d44c4c616e85d1bd98b4
Id: 91333d575284d44c4c616e85d1bd98b4
+ bupstash get -k key -r repo --pick var/z id=91333d575284d44c4c616e85d1bd98b4
acme-client.conf content

the expected output of bupstash get --pick var/z is z content.

semarie commented 1 year ago

I bisected bupstash get --pick problem to 5be2c885b5c4d6904291a7d2d6e406421b90d38f (as first bad commit).

it seems related to --pick usage: when using bupstash get (without --pick) and extracting the tarball, you get the right file.


I am think there is several problems here. bupstash get --pick is only one of them.

andrewchambers commented 1 year ago

@semarie nice work with the bisection, very useful information.

andrewchambers commented 1 year ago

I have tracked down the issue, when a file could not be opened we removed it from the list of file entries which broke the internal pick offset tracking in a subtle way.

The simplest fix for now is if stat works on a file, but it can't be opened will just be backed up as an empty file.

bastien-roucaries commented 1 year ago

This should be fixed before going to debian

falsifian commented 1 year ago

This should be fixed before going to debian

@bastien-roucaries FYI #384 seems to fix it. I don't know if @andrewchambers wanted to do any further review or testing before merging it.

andrewchambers commented 1 year ago

Yeah, the main blocker is just I had been very busy for the past 3-4 months at a new job.

andrewchambers commented 1 year ago

The fix for this has been merged but it not in the latest release.