fcorbelli / zpaqfranz

Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix

Backup with -test fails when -replace or -to is used #112

Closed. sheckandar closed this issue 2 months ago

sheckandar commented 2 months ago

Hi again.

I've run into another bug. When using the backup command together with -find "string1" -replace "string2", the verification at the end fails because the archive contains paths modified by -replace.

Example:

sudo zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -find "/volume1/snapshots" -replace "/volume1" -verbose -index "/volume1/NetBackup" -test

In this case the backup itself goes well, but then the verification fails at the end:

00000193 +external (file missing in ZPAQ)

sheckandar commented 2 months ago

After further testing, it looks like -find and -replace don't do anything when used with the backup command.

But the -test switch should work either way, IMHO.

fcorbelli commented 2 months ago

-find/-replace seem to work correctly:

zpaqfranz backup z:\prova *.cpp -find ".cpp" -replace ".pip"

then

C:\zpaqfranz>zpaqfranz l z:\prova_????????
zpaqfranz v60.1b-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-06-25)
franz:-hw

z:/prova_????????.zpaq:
1 versions, 25 files, 1.967.036 bytes (1.88 MB)

   Date      Time         Size Ratio Name
---------- -------- ---------- ----- -----
2024-06-25 20:12:48  3.643.997   10% + 01.pip
2024-06-21 19:17:45  3.605.270   13% + 02.pip
2024-01-04 14:54:58        249   17% + 02_quest.pip
2024-06-15 11:47:45  3.617.549   11% + 08_1.pip
2024-06-21 18:59:06  3.629.794   10% + 59_9b.pip
2024-06-15 19:51:32  3.621.407   10% + a1.pip
2024-05-27 19:28:16  3.612.550   13% + andiamo.pip
2024-06-21 18:39:27  3.629.794   10% + datestare.pip
2024-05-27 18:11:49  3.612.549   13% + i_01.pip
fcorbelli commented 2 months ago

If you want to change the stored path, you should use -to

-find and -replace are for restoring (or checking) files

C:\zpaqfranz>zpaqfranz backup z:\thebackup c:\zpaqfranz -to k:\newfolder
zpaqfranz v60.1b-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-06-25)
franz:-to                   <<k:/newfolder>>
franz:-hw
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
part0 z:/thebackup_00000000.zpaq i_filename z:/thebackup_????????.zpaq
Multipart backup seems OK
part0 z:/thebackup_00000000.zpaq i_filename z:/thebackup_????????.zpaq
Creating z:/thebackup_00000001.zpaq at offset 0 + 0
(...)

C:\zpaqfranz>zpaqfranz l z:\thebackup_????????
zpaqfranz v60.1b-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-06-25)
franz:-hw

z:/thebackup_????????.zpaq:
1 versions, 5.101 files, 1.662.998.916 bytes (1.55 GB)

   Date      Time           Size Ratio Name
---------- -------- ------------ ----- -----
2024-06-25 21:08:48            0   dir # k:/newfolder/
2024-04-30 18:45:59            6    0% + k:/newfolder/._ciao.txt
2023-10-26 20:00:35            0   dir # k:/newfolder/.github/
2023-10-26 20:00:35            0   dir # k:/newfolder/.github/workflows/
2023-03-04 23:34:50          844   29% + k:/newfolder/.github/workflows/github_actions_build.yml
2023-03-04 23:34:50          427   33% + k:/newfolder/.gitignore
2023-03-04 23:34:50       12.618   29% + k:/newfolder/.travis.yml
2024-06-25 14:43:13           52   50% + k:/newfolder/0.bat
2023-10-26 20:00:35            0   dir # k:/newfolder/00000001/
2022-09-08 11:25:30          798   29% + k:/newfolder/00000001/1.txt
2022-08-25 16:50:14       29.581   29% + k:/newfolder/00000001/3.txt
2022-08-25 16:50:23       22.307   29% + k:/newfolder/00000001/4.txt
2022-08-12 16:33:28       13.050   29% + k:/newfolder/00000001/cpuz.txt
(...)
sheckandar commented 2 months ago

I would still consider this a bug, because it exits with an error code where it shouldn't. The backup is actually fine and all the tests return OK.

(screenshots: test runs returning all OK)

fcorbelli commented 2 months ago

These are three completely different things.

The backup command (or add): -find/-replace (or -to) change the stored source path to something else, i.e. the file c:\pippo\something will become d:\backup\something, or whatever.

The testbackup command, without the -paranoid switch, operates not on the contents of the archives but on the archive files themselves. It checks for "holes" (i.e., missing archive parts), in other words that there has been no subsequent corruption, e.g. by someone replacing the file foo_00000001.zpaq with something else. With -verify, the -find/-replace (and -to) switches will change the stored paths to something else, i.e. to something that exists on the filesystem WHERE THE BACKUP (the backup_00000001.zpaq, backup_00000002.zpaq files) is. Almost never useful.

The test command, with a parameter, compares the SHA-1 fragments (of the archived files) with the SHA-1 "pieces" (of the filesystem). In this case -find/-replace (and -to) are used to "translate" the stored paths into the filesystem paths WHERE THE SOURCE files are, turning d:\backup\something into z:\where_are_my_files_now\.

=> you should use THREE different commands, each with its own -find/-replace (or -to), as sketched below:
1) zpaqfranz backup -find/-replace (or -to)
2) zpaqfranz testbackup -find/-replace
3) zpaqfranz t (...) -find/-replace/-to
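A minimal sketch of the three invocations (archive name and paths are hypothetical, following the examples above):

zpaqfranz backup z:\thebackup c:\data -to k:\newfolder
zpaqfranz testbackup z:\thebackup
zpaqfranz t z:\thebackup_????????.zpaq c:\data -find "k:/newfolder" -replace "c:/data"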

Then there is the parameterless t (test) command, which does yet other things. Then there is t (test) with -paranoid (which does other things again). And then there is the v (verify) command, and then w.

=> The -test switch operates "in a chain", that is, it runs a test after finishing the backup, so all the other parameters remain in effect.

Normally, if paths are not "treacherously" changed, there is no need for any -find, -replace, or -to.

Normally they are used instead in the case of zfs snapshots, to unify the paths

fcorbelli commented 2 months ago

The -paranoid switch (in testbackup) will check that the file number and file size (inside the zpaq's index archive) match the file number and file size (from the chunked pieces). This requires the password (if any) and tests for wrong index files.

Example: you make a backup named pippo; you make another backup named pluto; then you substitute pippo's index file with pluto's index.
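A minimal sketch of that tamper scenario (hypothetical paths; part 0 of a backup is its index, as in the part0 lines above):

zpaqfranz backup /tmp/pippo /data1
zpaqfranz backup /tmp/pluto /data2
cp /tmp/pluto_00000000.zpaq /tmp/pippo_00000000.zpaq
zpaqfranz testbackup /tmp/pippo -paranoid

The last command should flag the mismatch between pippo's parts and the swapped-in index.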

When archive files are separated they can be "shuffled", corrupted, deleted, etc. This generates a whole series of problems.

When the file is a single one (as by default), this does not happen. So, in general, the multipart format is preferred if you know what you are doing: you want to use a remote storage system (e.g., copying with rsync), and you want a quick way to check the integrity of the upload, done day by day (i.e., on the last part, which is normally much smaller than the total).

Translation: suppose you want to back up a 100GB fileserver and send it remotely with rsync, for protection from ransomware etc. During the first run you will generate a local file, say 100GB in size. Suppose that, every day, you add 1GB of data (this is just an example), and that you make a daily backup.

With a monolithic file you will have a 100GB .zpaq today. You will send it remotely with rsync over a couple of days.

Tomorrow you will have a 101GB local zpaq file. You will send it remotely with the rsync command, but with the --append switch, effectively sending 1GB in a few minutes (BTW this is anti-ransomware, because --append and NO --delete).

The day after, you will get a 102GB local zpaq file, and with rsync --append you'll still send only 1GB, and so on.
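A minimal sketch of this monolithic workflow (hostname and paths are hypothetical):

zpaqfranz a /backup/fileserver.zpaq /data
rsync -av --append /backup/fileserver.zpaq remote:/safe/

Run both daily: the first transfer ships the full ~100GB, every later one only the appended ~1GB.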

fcorbelli commented 2 months ago

The problem arises when you want to check that your local 102GB file is identical to the remote 102GB file. You can do this in several ways. One is rsync without --append: this (behind the scenes) will calculate the MD5 of local and remote chunks and compare them. That's ~204GB of reads, though, and it loads the remote system quite a bit.

If you use a multipart archive instead, you will have (in our example) a first 100GB file, then a 1GB file, and another 1GB file (one per day). You can check that the 1GB files are not corrupted (i.e., local == remote) by comparing their MD5 hashes, much faster than in the single-archive situation: ~2GB instead of ~204GB, with minimal load.

So you will run a loop like this meta-script:
- prepare the local update => ship it to the remote
- calculate the remote MD5, fetch it, and compare it with the local one (if you use the backup command it is already stored inside the index file)
- run a "heavy" local test

BTW, there are the last and last2 commands in zpaqfranz to do... exactly this duty.
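A minimal sketch of that loop (hypothetical paths and part number; the last/last2 commands can replace the manual hashing):

zpaqfranz backup /backup/fileserver /data
rsync -av /backup/fileserver_*.zpaq remote:/safe/
ssh remote md5sum /safe/fileserver_00000002.zpaq
md5sum /backup/fileserver_00000002.zpaq
zpaqfranz t /backup/fileserver_????????.zpaq

(prepare the update, ship it, compare the remote MD5 of the newest part with the local one, then run the "heavy" local test)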

Basically, for checking the REMOTE copy you will use a hash comparison (against the local one), while for LOCAL checking you will use much more computationally "heavy" unpacking.

If the LOCAL stored archive is correct (i.e., unpackable), and the REMOTE's hash matches the local one, then by the transitive property the REMOTE copy is correct.

In short, these are mechanisms for handling the verification of large-scale (hundreds of GB) remote copies, even on anemic systems (Atom, VPS, etc.).

fcorbelli commented 2 months ago

The problem that needs to be addressed is the possibility that the remote copy was not properly uploaded (due to a problem during the upload), or that it was modified later, without, of course, being able to compare local and remote directly. In the case of local backups, i.e., on a LAN, zpaqfranz has the relevant r (robocopy) command and cp with the -append and -verify switches. They are used, typically, for ESXi servers => mount NFS or NAS.
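For example (a hypothetical invocation; check the built-in help for the exact argument order):

zpaqfranz cp /mnt/nas/thebackup.zpaq /volume1/copies -append -verify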

sheckandar commented 2 months ago

Thank you for all the information. It was very helpful. Some of it I was aware of and some is new to me.

If I understand you correctly, you are suggesting that the error may be related to the fact that the generated archive is somehow corrupt. However, after running all the test commands you listed, with all available switches, the software always returns (all OK). Also, restoring the files and then comparing hashes shows that the archive was never corrupt; rather, the error code displayed when using the -test switch is erroneous.

Basically, this error prevents me from using the -test switch in a script. What happens is that the script detects a non-zero exit code, stops processing further commands, and sends me a failure notification, which then forces me to spend time troubleshooting. But since the archive is actually fine, it ends up being a total waste of time.

zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -to "/volume1/print$" -index "/volume1/NetBackup" -test Fails with non 0 exit code

zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -to "/volume1/print$" -index "/volume1/NetBackup" (all OK)

zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -index "/volume1/NetBackup" -test (all OK)

fcorbelli commented 2 months ago

If I understand correctly, you want string manipulation in the test function as well. I can do that, although it slows things down a bit.

fcorbelli commented 2 months ago

60_1g.zip

You can try the attached pre-release. NOTE: 60 uses a different default archive format; testing is underway, not completed.

Let's suppose we do

zpaqfranz a z:\1.zpaq c:\ut -to k:\fake\ut

Then this, of course, is NOT good:

zpaqfranz t z:\1.zpaq c:\ut

BUT now you can use -to:

zpaqfranz t z:\1.zpaq c:\ut -to k:\fake\ut

or "low level" find/replace

zpaqfranz t z:\1.zpaq c:\ut -find "k:/fake/ut" -replace "c:/ut" -verify
fcorbelli commented 2 months ago

I think it is good to re-explain that the t command with one (or more) paths does NOT perform a decompression test of the archive, but a comparison of the tree on the filesystem with the files inside the archive.

If there is a PLUTO folder in the c:\nz path, even an empty one, and it is not in the .zpaq archive, the t command will fail. Similarly, if there is an empty PAPERINO folder in the zpaq archive, and c:\nz\PAPERINO does not exist, the test will fail.

So the first step of the test is lexicographic, that is, the file names (from the filesystem) must be == to those in the archive. In other words, there must be no files (in the filesystem) that are not in the archive, and vice versa (less important).

The second step checks the SHA-1 hash blocks of the files (in the filesystem) against those in the archive. In this step the .zpaq file is NOT expanded or processed in any way. There is NO guarantee that the files are actually "decompressable". Processing speed can be high or low, depending on various circumstances (CPU speed and mass storage).

The third (optional) step occurs if the -checksum (or -verify or -paranoid) switch is used. In this case, the hash (and CRC-32) of the files in the filesystem is calculated and compared with those stored in the archive. Again, there is no guarantee that the files in the archive are "extractable", because this processing is simply NOT done. The data is NOT decompressed.
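To recap on the earlier example (a sketch; -verify or -paranoid trigger the same third step as -checksum):

zpaqfranz t z:\1.zpaq c:\ut -to k:\fake\ut
zpaqfranz t z:\1.zpaq c:\ut -to k:\fake\ut -checksum

The first command runs steps 1 and 2 (name comparison, then SHA-1 chunk comparison); the second adds step 3 (full hash and CRC-32 of the filesystem files).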

fcorbelli commented 2 months ago

The command that actually decompresses the data, simulating an extraction, is t WITHOUT one or more folders:

zpaqfranz t z:\1.zpaq 

Now you know that you can decompress the data (or at least that no errors are detected). You can add an additional layer, that is, a comparison with the contents of the filesystem, with the switch -verify:

zpaqfranz t z:\1.zpaq -verify -find "k:/fake/ut" -replace "c:/ut" -ssd

In this scenario you first get the decompressibility check (with recalculated CRC-32s) and then the filesystem match check.

If you want an even higher level of security (!) you will have to do an actual data extraction. However, this can reduce the life of the mass storage (normally I use a ramdisk or deduplicated zfs filesystems; ask if you want details) and takes longer:

zpaqfranz t z:\1.zpaq -to z:\temporaneo -paranoid

If you do not want to (or cannot) write, but you have plenty of RAM (i.e., the largest decompressed file in the archive must be smaller than available RAM minus 10%), there is the w command.

Finally, there is the paranoid command (!)

fcorbelli commented 2 months ago
zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -index "/volume1/NetBackup" -test

Here you have a path ("/volume1/snapshots/print$"), so the t-with-parameter (SHA-1 chunked) test will be called. This is why I generally recommend using TWO separate commands: one to do the archiving, and a second for verification, with potentially different switches.
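For example, with the paths from this thread (a sketch, to be adapted as needed):

zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -index "/volume1/NetBackup"
zpaqfranz t "/mnt/b2/print$_????????.zpaq" "/volume1/snapshots/print$" -verify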

sheckandar commented 2 months ago

Version 60 is not working on the Synology where I need to test it. It looks like it now requires libc.so, and the versions do not match between my development machine and the Synology.

./zpaqfranz: /lib64/libc.so.6: version 'GLIBC_2.28' not found (required by ./zpaqfranz)

This is going to break portability for Linux users unless you force GLIBC 2.2.5.

Basically, reliance on glibc means that a development machine and a production machine either have to be on the same glibc version, or the production machine has to be newer than the development machine.

fcorbelli commented 2 months ago

??? There is the source code

sheckandar commented 2 months ago

??? There is the source code

Sorry, what do you mean?

fcorbelli commented 2 months ago

You can compile the program yourself.

sheckandar commented 2 months ago

I did. I compiled on a RedHat machine and ran it on a Synology machine, and that was the error.

In the latest version you are using functions that rely on the GLIBC library. My RedHat machine has GLIBC 2.28, which becomes the required version for zpaqfranz after it is compiled; however, the Synology machine has GLIBC 2.26, and that's what causes the error.

fcorbelli commented 2 months ago

Sorry, I do not have "real" Synology I tried to find some, without luck, on Synology forum

sheckandar commented 2 months ago

It doesn't have to be a Synology. It just has a Linux OS under the hood.

To test it, you need two Linux OSes, one newer, one older. Two versions of Ubuntu will do (or any Linux distro you prefer).

In a command prompt, run /lib64/libc.so.6 and it will print the version of GLIBC you have on each OS.
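For example (illustrative output; the exact banner varies by distro):

/lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.26.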

Compile zpaqfranz on the newer version, then run it on the older one, and you will get that error.

sheckandar commented 2 months ago

As for Synology, you can run one in a container for testing purposes.

https://github.com/vdsm/virtual-dsm

fcorbelli commented 2 months ago

What is the latest version you can run on the Synology?

sheckandar commented 2 months ago

Of zpaqfranz?

59.9 was running fine.

Whatever changes you made in version 60 now require a newer GLIBC.

fcorbelli commented 2 months ago

I struggle to understand what the problem is. On Synology (arm) I use a statically compiled version. There is no dynamic library involved (except for downloading from the internet, but no one cares about that). If you run ldd you'll see exactly the same libraries.

sheckandar commented 2 months ago

Our Synology is Intel.

I used the Makefile you provided. I can try compiling statically if you provide the command I need to run.

sheckandar commented 2 months ago

I ran LD_DEBUG=bindings ./zpaqfranz and couldn't find anything odd.

I will post if I find anything linked to the GLIBC version. Debugging this may take a while.

fcorbelli commented 2 months ago

This is the very latest zpaqfranz, cross-compiled (on Debian 11) for arm and statically linked, running on a physical Annapurna-powered (arm) QNAP NAS.

Please try the attached (just a quick workaround) test_ancient.zip

with

g++ -O3 -DANCIENT zpaqfranz.cpp -o zpaqfranz -static -pthread -s
fcorbelli commented 2 months ago

This is an Intel Synology DSM 7.2 running the latest executable, statically compiled on Debian 11 with nothing odd:

g++ -O3 -Dunix zpaqfranz.cpp -pthread -lstdc++ -lm -o zpaqfranz_linux64 -t

zpaqfranz_linux64.zip

Please let me know if this runs on your physical Synology.

sheckandar commented 2 months ago

DSM 7.2 has GLIBC 2.36, which will work fine. I was using it on DSM 7.1, which has GLIBC 2.26.

After running objdump -T zpaqfranz | grep GLIBC, I found which function is causing the issue: statx()

Version 59.9 doesn't have that function.

Edit:

From further tests, it looks like statx() is not present in GLIBC versions lower than 2.28. This means version 60 won't run on older systems.

This command will list all the GLIBC versions required by an ELF executable:

nm --dynamic --undefined-only --with-symbol-versions zpaqfranz | grep GLIBC_2
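On the version-60 binary, the output includes a line like the following (illustrative; statx is the offending symbol here):

U statx@GLIBC_2.28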

sheckandar commented 2 months ago

I will try to explain what happened here.

Basically, GLIBC is part of the OS, and each distribution release ships with a version of GLIBC that doesn't change.

See here: https://gist.github.com/richardlau/6a01d7829cc33ddab35269dacc127680

For example my RedHat 8 has GLIBC 2.28 and Synology DSM 7.1 has GLIBC 2.26

To reproduce that error yourself, you would have to compile on Debian 11 but run the binary on Debian 9.

So GLIBC compatibility only goes one way: you cannot run an app compiled against GLIBC 2.28 on any distro where GLIBC < 2.28; otherwise you get an error like the one I posted above. One note, though: this rule only applies if an app actually uses GLIBC functions from the newer version; otherwise it makes no difference, as with version 59.9 of zpaqfranz, which mostly uses GLIBC 2.2.5 and 2.3 symbols.

So this issue will come up for any user whose production system has a GLIBC version lower than the development system's. I understand that a lot of users will run zpaqfranz on the same machine where it was compiled, but that is not the case for companies with IT departments, or even for devs juggling a lot of different code.

In any case, the GLIBC issue is not a bug, but by design. I will try to see if I can find a workaround. I think I have code with examples of how to force GLIBC symbols to version 2.2.5 (universally compatible with all modern Linux systems, old and new).

Edit: I found the code I needed: __asm__(".symver statx, statx@GLIBC_2.2.5");

However, as stated above, this won't work here, because statx() was only introduced in GLIBC 2.28.
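For reference, a minimal sketch of the .symver technique with a symbol that does exist at the old version (memcpy is purely illustrative here; since there is no statx@GLIBC_2.2.5, the same trick cannot help with statx):

// Pin memcpy to the old GLIBC_2.2.5 symbol version (present in x86-64 glibc).
// statx has no pre-2.28 version, so this technique cannot be applied to it.
#include <cstring>
__asm__(".symver memcpy, memcpy@GLIBC_2.2.5");

int main(int argc, char** argv) {
    char dst[8] = {0};
    // length depends on argc so the compiler emits a real libc call
    memcpy(dst, "hello", (argc % 6) + 1); // binds to memcpy@GLIBC_2.2.5
    return dst[0] != 0 ? 0 : 1;
}

This only works for symbols that the old glibc actually exports, which is exactly why it cannot rescue statx.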

sheckandar commented 2 months ago

This is an Intel Synology DSM 7.2 running the latest executable, statically compiled on Debian 11 with nothing odd:

g++ -O3 -Dunix zpaqfranz.cpp -pthread -lstdc++ -lm -o zpaqfranz_linux64 -t

Please let me know if this runs on your physical Synology.

It doesn't work, for the reasons explained above. I get the same error.

sheckandar commented 2 months ago

NOTE: 60 uses a different default archive format; testing is underway, not completed

What does that mean? Is it still the zpaq archive format?

sheckandar commented 2 months ago

Would you be able to update version 59.9 to fix the -test switch?

fcorbelli commented 2 months ago

NOTE: 60 uses a different default archive format; testing is underway, not completed

What does that mean? Is it still the zpaq archive format?

It is the newest zpaqfranz archive format. zpaqfranz versions before 60 cannot test archives made with 60+, unless an older switch is used (for example -xxhash or -blake3).

fcorbelli commented 2 months ago

Would you be able to update version 59.9 to fix the -test switch?

I would say no, but I can make 60 run even on old systems. I reiterate that the -test switch does not, I think, do what you want it to do. You should use the t command.

sheckandar commented 2 months ago

Would you be able to update version 59.9 to fix the -test switch?

I would say no, but I can make 60 run even on old systems. I reiterate that the -test switch does not, I think, do what you want it to do. You should use the t command.

Well, it doesn't even matter what command I use or whether I understand how it works. I simply reported a bug.

A function should only throw an error if an error was detected. But here, an error is thrown where there is no error.

It is up to you if you want to fix the reported issue or not.

fcorbelli commented 2 months ago

Well, it doesn't even matter what command I use or whether I understand how it works. I simply reported a bug.

A function should only throw an error if an error was detected. But here, an error is thrown where there is no error.

It is up to you if you want to fix the reported issue or not.

I explained why altering path storage generates an error (before version 60). If you do not alter the paths, no error is reported.

Since the alteration is completely up to the user, it is up to you to figure out what you are doing and why. If you are troubled by this you can alternatively

sheckandar commented 2 months ago

Ok.

I appreciate the time you spent on this issue.

I will wait for a version 60 that works on older systems and is production-ready. I can't use alpha releases for anything other than testing.

fcorbelli commented 2 months ago

60_1k.zip

You can try this pre-release, compiled with:

g++ -O3 -Dunix -DNAS zpaqfranz.cpp -o zpaqfranz_nas -pthread -static -s
sheckandar commented 2 months ago

Seems to work. I did an extraction and all the files were extracted without error.

I did get a warning when compiling that relates to the update function, but that's very minor.

When compiling:

/tmp/ccl2VyEw.o: In function `downloadfile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) [clone .part.568]':
zpaqfranz.cpp:(.text+0x3a95e): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

When running zpaqfranz update:

zpaqfranz v60.1k-NAS-L(2024-07-01)
Checking internet update (-verbose for details)
zpaqfranz: dl-call-libc-early-init.c:37: _dl_call_libc_early_init: Assertion `sym != NULL' failed.
Aborted (core dumped)
sheckandar commented 2 months ago

I backed up with the -test switch, then verified with the t command using different switches. All commands returned (all OK).

The issue is resolved.

Let me know if you want me to test the update function later.

sheckandar commented 2 months ago

I think I found another bug related to backup command and -to switch.

Basically, if I run this command multiple times, it duplicates the number of files in the index file:

./zpaqfranz backup "/volume1/NetBackup/test" "/volume1/NetBackup/test" -to "/volume1/print\$" -index "/volume1/NetBackup/Scripts"

Result: 188 +added, 0 -removed.

Then I ran the same command again, and it showed that another 188 files were added, with the same output, but the actual archive size suggests that wasn't the case.

The first archive, test_00000001.zpaq, is 39.1 MB in size; that's correct. The second archive, test_00000002.zpaq, is 9.4 KB, which looks right, as no new files were actually added.

Then I ran ./zpaqfranz l "/volume1/NetBackup/test_0000000?.zpaq"

Result:

/volume1/NetBackup/test_0000000?.zpaq:
2 versions, 376 files, 40.985.571 bytes (39.09 MB) <-- wrong file count

This doesn't happen when I omit the -to switch.

Ran the following command 2 times:

./zpaqfranz backup "/volume1/NetBackup/test" "/volume1/NetBackup/test" -index "/volume1/NetBackup/Scripts"

Then

./zpaqfranz l "/volume1/NetBackup/test_0000000?.zpaq"

Result:

/volume1/NetBackup/test_0000000?.zpaq:
2 versions, 188 files, 40.976.212 bytes (39.08 MB) <-- correct number of files
fcorbelli commented 2 months ago

Do you have any .xls or .ppt files? PS: it is test_???????? (eight ?s).

fcorbelli commented 2 months ago

Found it. It is indeed a bug, albeit a modest one (the file gets bigger). You need to change this line to become like this (appending || command=='Z'):

const bool i_renamed=command=='l' || command=='a' || command=='5' || command=='Z'; ///5 for dirsize arrggghh hidden parameter!
sheckandar commented 2 months ago

- testbackup command: (all OK)
- t -verify with the -to switch: (all OK)
- t with the -to switch, but without -verify: (all OK)
- the list command shows proper sizes and file lists

Everything is working in relation to this bug.

Much appreciated. It is now fully resolved.

I will wait for a prod release of version 60.