Open andrewchambers opened 2 years ago
I just saw something similar. Running bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 --pick home/falsifian/w/misc-pri/n id=7c7e8ea84a30f889ef6fa27bbbb8d4c7
, I see the message:
bupstash get: entry 0 content hash differs from index hash, possible corruption detected.
The output is way smaller than I expect: 1.3K vs 3.6M when I archive the original directory with the tar
command.
Before running into this error, I experienced another weird thing. Trying to recover an individual file, I got output that was exactly the right length, but had the wrong content. The content appeared to be pieces of other files concatenated together, starting partway through one file, then including two other complete files, ending exactly on a file boundary. The files were all from the same directory as the file I wanted. Command I used was bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 --pick home/falsifian/w/misc-pri/n/ssh_keys id=7c7e8ea84a30f889ef6fa27bbbb8d4c7
More notes:
.../misc-pri/n
) contains 2722 small text files.bupstash put
as a separate user in order to avoid accidentally archiving certain files like private keys, but then running bupstash get
as my normal user. Also, I created the archive using a subkey that only has put permissions.Thank you so much for the report - Your setup sounds fine.
A few questions:
Thank you so much for the report - Your setup sounds fine.
A few questions:
* Are you able to recover the file using a full 'get' of the snapshot?
Probably not. I just tried: bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 id=7c7e8ea84a30f889ef6fa27bbbb8d4c7 > bs.tar
, and when I tried extracting the tarball, got:
moth d $ tar -xf ../bs.tar
tar: Invalid header, starting valid header search.
tar: Invalid header, starting valid header search.
tar: Unable to remove directory home/falsifian/.cabal/logs/ghc-8.10.5/: Directory not empty
tar: Invalid header, starting valid header search.
tar: Unable to remove directory home/falsifian/.cabal/logs/ghc-8.10.5/: Directory not empty
tar: Invalid header, starting valid header search.
and more like the above. (I terminated tar instead of waiting to see the end result; I can do that if it would help.)
* Are you able to reproduce the error reliably?
I'll try.
* Did you install bupstash from an openbsd package repository? I am curious if they applied any patches.
Yes. If I'm correctly interpreting the PATCHFILES
lines in the Makefile, they're cherry-picking commits a73fd83, 8bf5bd1 and 290768c onto the v0.12.0
tag. (Seeing the comment about packporting a --ignore-permission-errors
fix, I remembered a weird thing, in case it's relevant: while bupstash put
was running, the progress indicator froze on a directory I didn't have permission to access.)
Probably not. I just tried:
bupstash get -r /nodump/falsifian/repo/bupstash/pri -k /home/falsifian/lib/secret/key/bupstash/pri0 id=7c7e8ea84a30f889ef6fa27bbbb8d4c7 > bs.tar
, and when I tried extracting the tarball, got:
I stand corrected. I just reran the tar command to see what happens if I let it finish extracting. That whole directory was restored correctly.
Another kind of error that appeared while tar was running:
tar: A hard link to the directory home/falsifian/src/darcs/2.16.2/_darcs/pristine.hashed/ is not allowed
I ran a diff between my home directory within what tar extracted and my original home dir*. The only output I see is:
File ./.cabal/bin/n is a directory while file /home/falsifian/.cabal/bin/n is a regular file
.diff
is trying to follow to files that don't exist.Binary files ./var/pri/bupstash/sendlog and /home/falsifian/var/pri/bupstash/sendlog differ
*I actually backed up /
but only looked at /home/falsifian
within the backup for this diff.
The permission error patch might be a potential cause, I will make a new release a priority so people don't need to rely on the custom patches.
There is also a chance something else is going on that is specific to openbsd. I will try to introduce more random directory generation and fuzzing to try and reproduce it on my end.
Are you able to recover the file using a full 'get' of the snapshot?
I stand even more corrected. I tried extracting with GNU tar instead of OpenBSD's tar, and it seems to work fine*. So, the only bugs I can confirm for now are: (a) --pick
giving the incorrect, stitched-together output, and (b) incompatibility of bupstash get
output with OpenBSD's tar.
Some more info, in case it helps:
bupstash put
, but it failed with permission errors, after making some progress. Then I created two successful backups, and it was the second one that I was looking at. I also ran bupstash gc
at some point.*(Details: GNU tar extracts without any errors (except it failed to create some files in directories with mode 555), and I don't see missing files. I did not look too deeply this time; I just looked at a part of the file tree that had missing files when I used BSD tar.)
I am the original reported of the problem (on the public chat).
I just want to add that I am still running bupstash on OpenBSD, on several hosts, several times per day (it is a total of about 27 run per day). It is done with bupstash put
followed by checking the backup (bupstash extraction + tarball check) with bupstash get | gtar ft
.
Currently, the problem didn't occur again, so it seems to popup at very low rate.
Interesting that this was openbsd as well, I can maybe do my fuzzing in a vm there.
I've verified that this is reproducible with my own private data, but haven't yet managed to see the error with synthetic data.
I tried making a synthetic directory structure with thousands of files and some of them not readable, but wasn't able to trigger the bug; if anyone has ideas about what properties of the synthetic files might be important (e.g. large/small size?) I can keep trying.
Summary of what follows:
To be precise, I've included a script below, intended only to be run on my computer. The exclude_args
are just there to speed things up.
The three bupstash get
commands at the end of the script behave badly in three different ways:
bupstash get: entry 0 content hash differs from index hash, possible corruption detected.
as reported above.<unknown>
s.
thread 'main' panicked at 'slice index starts at 282688 but ends at 453', src/client.rs:683:34
stack backtrace:
0: 0x1a1fdcef10d - <unknown>
1: 0x1a1fdc32aba - <unknown>
2: 0x1a1fdcda554 - <unknown>
3: 0x1a1fdceeeea - <unknown>
4: 0x1a1fdcea4d5 - <unknown>
5: 0x1a1fdceafb1 - <unknown>
6: 0x1a1fdcef4d7 - <unknown>
7: 0x1a1fdcef430 - <unknown>
8: 0x1a1fdcea904 - <unknown>
9: 0x1a1fdc34f33 - <unknown>
10: 0x1a1fdc31186 - <unknown>
11: 0x1a1fdc31119 - <unknown>
12: 0x1a1fdb6309d - <unknown>
13: 0x1a1fdb5d258 - <unknown>
14: 0x1a1fdbf6d00 - <unknown>
15: 0x1a1fdb0c046 - <unknown>
16: 0x1a1fdc0321b - <unknown>
17: 0x1a1fdacf342 - <unknown>
Abort trap (core dumped)
Some notes:
bupstash get
content for the ssh_keys
file. But if I change things a bit (I think I was just editing exclude_args
) the output changes --- still exactly the same length as ~/w/misc-pri/n/ssh_keys
but different (still wrong) content.~/w/misc-pri/n
and got similar behaviour as ssh_keys
(wrong content). I tried other dotfiles in my home dir and got similar behaviour (crashing) as with ~/.shrc
. So I wonder if the different behaviours end up associated with particular directories, or in the order the files were visited in./home/falsifian
instead of /
the error goes away; this is what makes me think it's related to permission errors.ssh_keys
.#!/bin/sh
set -e
TMP=$(mktemp -d "$HOME/bs_tmp/bs-XXXXXXXX")
printf 'Using temp dir: %s\n' "$TMP"
cd "$TMP"
bupstash init -r repo
bupstash new-key -o key
exclude_args="--exclude /dev --exclude /mnt --exclude /nodump --exclude /tmp --exclude /usr --exclude /vm --exclude $HOME/.cabal --exclude $HOME/.cache --exclude $HOME/.local --exclude $HOME/.mozilla --exclude $HOME/bs_tmp --exclude $HOME/co --exclude $HOME/src --exclude $HOME/var --exclude _darcs"
id=$(bupstash put -k key -r repo --no-send-log --ignore-permission-errors $exclude_args /)
printf 'Id: %s\n' "$id"
bupstash get -k key -r repo --pick home/falsifian/w/misc-pri/n/ssh_keys "id=$id" > result
cmp result ~/w/misc-pri/n/ssh_keys || echo wrong content
bupstash get -k key -r repo --pick home/falsifian/w/misc-pri/n "id=$id" > n.tar || echo extract dir failed
RUST_BACKTRACE=full bupstash get -k key -r repo --pick home/falsifian/.shrc "id=$id"
@bket FYI (assuming you're the same bket
as the OpenBSD port maintainer; sorry if I guesed wrong)
Really appreciate the investigation.
@falsifian what is the OpenBSD version you use ? OpenBSD 7.3 ? -current ? what is the output of sysctl -n kern.version
.
are you able to rebuild the sysutils/bupstash
port with some changes ? (I would send you a patch to keep the debuginfo and eventually some port changes to test)
are you able to build bupstash
HEAD from git repository and test with it ?
please note that the first error reported was without permission errors cherry-picked commit.
Cc @klemensn
@falsifian what is the OpenBSD version you use ? OpenBSD 7.3 ? -current ? what is the output of
sysctl -n kern.version
.
moth ~ $ sysctl -n kern.version
OpenBSD 7.3-current (GENERIC.MP) #1162: Sun Apr 23 17:40:19 MDT 2023
deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
(Just ran sysupgrade
last night; the above script still produces the same errors.)
are you able to rebuild the
sysutils/bupstash
port with some changes ? (I would send you a patch to keep the debuginfo and eventually some port changes to test)are you able to build
bupstash
HEAD from git repository and test with it ?
Sure, thanks, I can build ports and hopefully building directly isn't too hard. Will try to find some time to test at HEAD. I might first try to narrow down the part of my filesystem needed to reproduce this though; would be nice to have an example I can share.
please note that the first error reported was without permission errors cherry-picked commit.
Cc @klemensn
Here's a complete reproducible example, showing a slightly different buggy behaviour. Hopefully it has the same cause as everything else here...
The below script ought to print z content
because that's what was written to the file fs/var/z
, but it prints acme-client.conf content
instead.
(I call this behaviour "different" because previously the file always had exactly the right length if bupstash didn't report an error, but fs/var/z
and fs/etc/examples/acme-client.conf
have different lengths.)
#!/bin/sh
set -e
mkdir bug;cd bug
mkdir -p fs/etc/examples fs/var
echo bgpd.conf content > fs/etc/examples/bgpd.conf
echo acme-client.conf content > fs/etc/examples/acme-client.conf
echo z content > fs/var/z
chmod 000 fs/etc/examples/bgpd.conf
bupstash init -r repo
bupstash new-key -o key
id=$(bupstash put -k key -r repo --no-send-log --ignore-permission-errors fs)
printf 'Id: %s\n' "$id"
bupstash get -k key -r repo --pick var/z "id=$id"
thanks. I am able to reproduce.
with your reproducer, I have the following output (I added set -x
):
$ sh ./test.sh
+ mkdir bug
+ cd bug
+ mkdir -p fs/etc/examples fs/var
+ echo bgpd.conf content
+ > fs/etc/examples/bgpd.conf
+ echo acme-client.conf content
+ > fs/etc/examples/acme-client.conf
+ echo z content
+ > fs/var/z
+ chmod 000 fs/etc/examples/bgpd.conf
+ bupstash init -r repo
+ bupstash new-key -o key
+ bupstash put -k key -r repo --no-send-log --ignore-permission-errors fs
+ id=91333d575284d44c4c616e85d1bd98b4
+ printf Id: %s\n 91333d575284d44c4c616e85d1bd98b4
Id: 91333d575284d44c4c616e85d1bd98b4
+ bupstash get -k key -r repo --pick var/z id=91333d575284d44c4c616e85d1bd98b4
acme-client.conf content
the expected output of bupstash get --pick var/z
is z content
.
I bisected bupstash get --pick
problem to 5be2c885b5c4d6904291a7d2d6e406421b90d38f (as first bad commit).
it seems related to --pick
usage: when using bupstash get
(without --pick) and extracting the tarball, you get the right file.
I am think there is several problems here. bupstash get --pick
is only one of them.
@semarie nice work with the bisection, very useful information.
I have tracked down the issue, when a file could not be opened we removed it from the list of file entries which broke the internal pick offset tracking in a subtle way.
The simplest fix for now is if stat works on a file, but it can't be opened will just be backed up as an empty file.
This should be fixed before going to debian
This should be fixed before going to debian
@bastien-roucaries FYI #384 seems to fix it. I don't know if @andrewchambers wanted to do any further review or testing before merging it.
Yeah, the main blocker is just I had been very busy for the past 3-4 months at a new job.
The fix for this has been merged but it not in the latest release.
From the public chat:
This issue is interesting because the expected symptom of disk or ram corruption would be a different error such a corrupt chunk error. This error is triggered during the second set of checksum checks - which suggests either a bug in bupstash or some other problem - it definitely requires further investigation.