After further testing, it looks like -find and -replace do not do anything when used with the backup command.
But the -test switch should work either way, IMHO.
find/replace seems to work correctly:
zpaqfranz backup z:\prova *.cpp -find ".cpp" -replace ".pip"
then
C:\zpaqfranz>zpaqfranz l z:\prova_????????
zpaqfranz v60.1b-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-06-25)
franz:-hw
z:/prova_????????.zpaq:
1 versions, 25 files, 1.967.036 bytes (1.88 MB)
Date Time Size Ratio Name
---------- -------- ---------- ----- -----
2024-06-25 20:12:48 3.643.997 10% + 01.pip
2024-06-21 19:17:45 3.605.270 13% + 02.pip
2024-01-04 14:54:58 249 17% + 02_quest.pip
2024-06-15 11:47:45 3.617.549 11% + 08_1.pip
2024-06-21 18:59:06 3.629.794 10% + 59_9b.pip
2024-06-15 19:51:32 3.621.407 10% + a1.pip
2024-05-27 19:28:16 3.612.550 13% + andiamo.pip
2024-06-21 18:39:27 3.629.794 10% + datestare.pip
2024-05-27 18:11:49 3.612.549 13% + i_01.pip
If you want to change the stored path, you should use -to.
-find and -replace are for restoring (or checking) files.
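For example, to restore the files above with their original .cpp extensions (a hedged sketch; the x extract command usage and the output folder are illustrative):
zpaqfranz x z:\prova_???????? -find ".pip" -replace ".cpp" -to c:\restored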
C:\zpaqfranz>zpaqfranz backup z:\thebackup c:\zpaqfranz -to k:\newfolder
zpaqfranz v60.1b-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-06-25)
franz:-to <<k:/newfolder>>
franz:-hw
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
part0 z:/thebackup_00000000.zpaq i_filename z:/thebackup_????????.zpaq
Multipart backup seems OK
part0 z:/thebackup_00000000.zpaq i_filename z:/thebackup_????????.zpaq
Creating z:/thebackup_00000001.zpaq at offset 0 + 0
(...)
C:\zpaqfranz>zpaqfranz l z:\thebackup_????????
zpaqfranz v60.1b-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-06-25)
franz:-hw
z:/thebackup_????????.zpaq:
1 versions, 5.101 files, 1.662.998.916 bytes (1.55 GB)
Date Time Size Ratio Name
---------- -------- ------------ ----- -----
2024-06-25 21:08:48 0 dir # k:/newfolder/
2024-04-30 18:45:59 6 0% + k:/newfolder/._ciao.txt
2023-10-26 20:00:35 0 dir # k:/newfolder/.github/
2023-10-26 20:00:35 0 dir # k:/newfolder/.github/workflows/
2023-03-04 23:34:50 844 29% + k:/newfolder/.github/workflows/github_actions_build.yml
2023-03-04 23:34:50 427 33% + k:/newfolder/.gitignore
2023-03-04 23:34:50 12.618 29% + k:/newfolder/.travis.yml
2024-06-25 14:43:13 52 50% + k:/newfolder/0.bat
2023-10-26 20:00:35 0 dir # k:/newfolder/00000001/
2022-09-08 11:25:30 798 29% + k:/newfolder/00000001/1.txt
2022-08-25 16:50:14 29.581 29% + k:/newfolder/00000001/3.txt
2022-08-25 16:50:23 22.307 29% + k:/newfolder/00000001/4.txt
2022-08-12 16:33:28 13.050 29% + k:/newfolder/00000001/cpuz.txt
(...)
I would still consider this a bug, because it exits with an error code where it shouldn't. The backup is actually fine and all the tests return OK.
Three completely different things
The backup command (or add): -find/-replace (and -to) change the source path to something else, i.e. the file c:\pippo\something will become d:\backup\something, or whatever.
The testbackup command, without the -paranoid switch, operates not on the contents of the archives, but on the files themselves. So it checks for "holes" (i.e., missing archive files); in other words, that there has been no subsequent corruption, e.g. by replacing the file foo_00000001.zpaq with something else. With -verify, the -find/-replace (and -to) will change the stored paths to something else, i.e. something that exists on the filesystem WHERE THE BACKUP FILES (the backup_00000001.zpaq, backup_00000002.zpaq files) are. Almost never useful.
The test command, with a parameter, compares SHA-1 fragments (of the archived files) with the SHA-1 "pieces" (of the filesystem). In this case -find/-replace (and -to) are used to "translate" the stored paths into the filesystem paths WHERE THE SOURCE files are, turning d:\backup\something into z:\where_are_my_files_now\
=> you should use THREE different commands, each with its own -find/-replace (or -to), as sketched below:
1) zpaqfranz backup -find/-replace (or -to)
2) zpaqfranz testbackup -find/-replace
3) zpaqfranz t (...) -find/-replace/-to
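A hedged sketch of the three together (hypothetical paths; it assumes testbackup takes the archive base name, and that t accepts the ???????? multipart wildcard like l does):
zpaqfranz backup d:\backup\myarch c:\pippo -to x:\fakestore
zpaqfranz testbackup d:\backup\myarch
zpaqfranz t d:\backup\myarch_???????? c:\pippo -find "x:/fakestore" -replace "c:/pippo"
The third command translates the stored x:/fakestore prefix back to the real c:\pippo source before comparing.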
Then there is the parameterless t (test) command, which does yet other things. Then there is t (test) with -paranoid (which does other things still). And then the v (verify) command, and then w.
=> The -test switch operates "in a chain," that is, it runs a test after finishing the backup. So all the other parameters remain.
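A hedged example of the chaining (hypothetical paths):
zpaqfranz backup z:\mybak c:\data -to x:\fake -test
behaves roughly like the pair
zpaqfranz backup z:\mybak c:\data -to x:\fake
zpaqfranz t z:\mybak_???????? c:\data -to x:\fake
so the chained test inherits -to and compares against the altered stored paths, which is exactly where the false failures reported above come from.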
Normally, if paths are not "treacherously" changed, there is no need for any -find, -replace, or -to.
Normally they are used instead in the case of zfs snapshots, to unify the paths.
The -paranoid switch (in testbackup) will check that the filenumber and filesize (inside the zpaq's index archive) are == the filenumber and filesize (from the chunked pieces). This requires the password (if any) and tests for a wrong index file.
Example: you make a backup named pippo; you make another backup named pluto; then you substitute pippo's index file with pluto's index.
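A hedged sketch of that scenario (hypothetical paths; per the output above, the _00000000 part is the index):
zpaqfranz backup z:\pippo c:\data
zpaqfranz backup z:\pluto c:\other
copy z:\pluto_00000000.zpaq z:\pippo_00000000.zpaq
zpaqfranz testbackup z:\pippo -paranoid
The last command should now detect that the index does not match the chunked pieces.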
When archive files are separated they can be "shuffled," corrupted, deleted, etc. This generates a whole series of problems.
When the file (as by default) is one, this does not happen. So, in general, the multipart format is preferred if you know what you are doing; you want to use a remote storage system (e.g., copying with rsync); and you want a quick way to check the integrity of the upload, done day by day (i.e., on the last part, which is normally much smaller than the total).
Translation: suppose you want to back up a 100GB fileserver, and send it remotely with rsync, for protection from ransomware etc. During the first run you will generate a local file, say 100GB in size. Suppose that, every day, you will add 1GB (this is just an example) of data, and that you will make a daily backup.
With a monolithic file you will have a 100GB .zpaq today, which you will send remotely with rsync over a couple of days.
Tomorrow you will have a 101GB local zpaq file. You will send it remotely with the rsync command, but with the switch --append, effectively sending 1GB in a few minutes (BTW this is anti-ransomware, because --append and NO --delete).
The day after that you will get a 102GB local zpaq file, and with rsync --append you'll still send 1GB, etc.
The problem arises when you want to check that your local 102GB file is identical to the remote 102GB file. You can do this in several ways. One is rsync without --append: this (behind the scenes) will calculate the MD5 of local and remote chunks, and compare them. That's ~204GB of reading, though, and it loads the remote system quite a bit.
If you use a multipart archive instead, you will have (in our example) a first 100GB file, then a 1GB file, then another 1GB file (one per day). You can check that the 1GB files are not corrupted (i.e. local == remote) by comparing their MD5 hashes, much faster than in the single-archive situation: ~2GB instead of ~204GB, with minimal load.
So you will do a loop like this meta-script (sketched below):
1) prepare the local update
2) ship it to remote
3) calculate the remote MD5 code, get it, compare it with the local one (if you use the backup command it is already inside the index file)
4) run a "heavy" local test
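A minimal bash sketch of that loop (the host and paths are hypothetical; it assumes only the newest part changes each day, and that the parameterless t accepts the multipart wildcard like l does):
zpaqfranz backup /tank/fileserver /data/fileserver             # 1) prepare the local update
rsync --append /tank/fileserver_*.zpaq user@remote:/backups/   # 2) ship to remote (--append, NO --delete)
last=$(ls /tank/fileserver_????????.zpaq | tail -1)            # the newest (smallest) part
rmd5=$(ssh user@remote md5sum "/backups/$(basename "$last")" | cut -d' ' -f1)
lmd5=$(md5sum "$last" | cut -d' ' -f1)                         # 3) remote MD5 == local MD5 ?
[ "$rmd5" = "$lmd5" ] || echo "UPLOAD MISMATCH on $last"
zpaqfranz t "/tank/fileserver_????????.zpaq"                   # 4) "heavy" local decompression test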
BTW there are last and last2 commands, in zpaqfranz, to do... exactly this duty
Basically, for checking the REMOTE copy you will use a hash comparison (with the local one), while for LOCAL checking you will use the much more computationally "heavy" "unpacking".
If the LOCAL stored archive is correct (i.e. unpackable), and the REMOTE's hash matches the local one, then by the transitive property the REMOTE copy is correct.
In short, these are mechanisms for handling verification of large-scale (hundreds of GB) remote copies, even on anemic systems (Atom, VPS etc)
The problem that needs to be addressed is the possibility that the remote copy was not properly uploaded (due to a problem during upload), or that it was modified later, without, of course, being able to compare local and remote directly. In the case of local backups, i.e., on a LAN, zpaqfranz has the relevant r (robocopy) command, and cp with the -append and -verify switches. They are used, typically, for ESXi servers => mount NFS or NAS.
Thank you for all the information. It was very helpful. Some of it I was aware of, and some is new to me.
If I understand you correctly, you are suggesting that the error may be related to the fact that the generated archive is somehow corrupt. However, after running all the test commands you listed, with all available switches, the software always returns (all OK). Also, restoring the files and then comparing hashes shows that the archive was never corrupt; rather, the error code displayed when using the -test switch is erroneous.
Basically, this error prevents me from using the -test switch in a script, because the script detects the non-zero exit code, stops processing further commands, and sends me a failure notification, which then forces me to spend time troubleshooting. Since the archive is actually fine, it ends up being a total waste of time.
zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -to "/volume1/print$" -index "/volume1/NetBackup" -test
Fails with a non-zero exit code.
zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -to "/volume1/print$" -index "/volume1/NetBackup"
(all OK)
zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -index "/volume1/NetBackup" -test
(all OK)
If I understand correctly, you want string manipulation in the test function as well. I can do that, although it slows things down a bit.
60_1g.zip: you can try the attached pre-release. NOTE: 60 uses a different default archive format; testing is underway, not completed. Let's suppose you do
zpaqfranz a z:\1.zpaq c:\ut -to k:\fake\ut
Then this is NOT good, of course
zpaqfranz t z:\1.zpaq c:\ut
BUT now you can use -to
zpaqfranz t z:\1.zpaq c:\ut -to k:\fake\ut
or "low level" find/replace
zpaqfranz t z:\1.zpaq c:\ut -find "k:/fake/ut" -replace "c:/ut" -verify
I think it is good to re-explain that the t command with one (or more) paths does NOT perform a decompression test of the archive,
but a comparison of the tree on the filesystem with the files inside the archive. If there is a PLUTO folder in the c:\nz path, even an empty one, and in the .zpaq archive there is not, the t command will fail. Similarly, if there is an empty PAPERINO folder in the zpaq archive, and c:\nz\PAPERINO does not exist, the test will fail.
So the first step of the test is lexicographic, that is, the file names (from the filesystem) must be == to those in the archive. In other words, there must be no files (in the filesystem) that are not in the archive, and vice versa (less important).
The second step is to check the SHA-1 hash blocks between the files (in the filesystem) and those in the archive. In this step the .zpaq file is NOT expanded or processed in any way: there is NO guarantee that the files are actually "decompressable". Processing speed can be high or low, depending on various circumstances (CPU and mass-storage speed).
The third (optional) step occurs if the -checksum (or -verify or -paranoid) switch is used. In this case, the hash (and CRC-32) of the files in the filesystem is calculated and compared with those stored in the archive. Again there is no guarantee that the files in the archive are "extractable," because this processing is simply NOT done; the data is NOT decompressed.
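For instance, continuing the example above, the third step would look like this (same hypothetical paths):
zpaqfranz t z:\1.zpaq c:\ut -find "k:/fake/ut" -replace "c:/ut" -checksum
Still nothing is decompressed: the filesystem files are hashed and compared against the hashes stored in the archive.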
The command that actually decompresses the data, simulating an extraction, is t WITHOUT one or more folders.
zpaqfranz t z:\1.zpaq
Now you know that you can decompress the data (or at least no errors are detected). You can add an additional layer, that is, a comparison with the contents of the filesystem, with the switch -verify:
zpaqfranz t z:\1.zpaq -verify -find "k:/fake/ut" -replace "c:/ut" -ssd
In this scenario you first have the decompressibility check (and recalculated CRC-32s), and then the filesystem match check.
If you want an even higher level of security (!) you will have to actually write out a data extraction. However, this can reduce the life of the mass medium (normally I use a ramdisk or deduplicated zfs filesystems; ask if you want details) and it takes longer.
zpaqfranz t z:\1.zpaq -to z:\temporaneo -paranoid
If you do not want to (or cannot) write, but you have plenty of RAM (i.e. the largest decompressed file in the archive must be smaller than available RAM -10%), there is the w command.
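A hedged example with the same hypothetical archive as above:
zpaqfranz w z:\1.zpaq
which decompresses into RAM instead of writing to the filesystem.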
Finally there is the paranoid command (!)
> zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -index "/volume1/NetBackup" -test
Here you have a FILE ("/volume1/snapshots/print$"), so the t with a parameter (SHA-1 chunked) will be called. This is why I generally recommend using TWO separate commands: one to do the archiving, and a second for verification, with potentially different switches.
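A hedged sketch of that recommendation, reusing this thread's own paths (the ???????? wildcard for t is an assumption, as used with l above):
zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -index "/volume1/NetBackup"
zpaqfranz t "/mnt/b2/print\$_????????" "/volume1/snapshots/print$"
If the second command fails, the non-zero exit code points at a real mismatch, and the backup itself has already completed.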
The 60 version is not working on the Synology where I need to test it. It looks like it now requires libc.so, and the versions do not match between my development machine and the Synology.
./zpaqfranz: /lib64/libc.so.6: version 'GLIBC_2.28' not found (required by ./zpaqfranz)
This is going to break portability for Linux users, unless you force GLIBC 2.2.5.
Basically, reliance on glibc means that a development machine and a production machine either have to be on the same glibc version, or the production machine has to be newer than the development machine.
??? There is the source code
> ??? There is the source code
Sorry, what do you mean ?
You can compile the program yourself.
I did. I compiled on a RedHat machine and ran on a Synology machine and that was the error.
In the latest version you are using functions that rely on the GLIBC library. My RedHat machine has GLIBC 2.28, which becomes the required version for zpaqfranz after it is compiled; however, the Synology machine has GLIBC 2.26, and that's what causes the error.
Sorry, I do not have a "real" Synology. I tried to find one, without luck, on the Synology forum.
It doesn't have to be Synology. It just has a Linux OS under the hood.
To test it, you need 2 Linux OSes, one newer, one older. Two versions of Ubuntu will do (or any Linux distro you prefer).
In a command prompt type in /lib64/libc.so.6
and that will give you the version of GLIBC you have on each OS.
Compile zpaqfranz on a newer version and then run it on the older one and you will get that error.
As for Synology, you can run one in a container for testing purposes.
What is the latest version you can run on Synology?
Of zpaqfranz ?
59.9 was running fine.
Whatever changes you made in version 60 now require a newer GLIBC library.
I struggle to understand what the problem is. On Synology (arm) I use a statically compiled version. There is no dynamic library (except for downloading from the internet, but no one cares). If you run ldd you'll see exactly the same libraries.
Our Synology is Intel.
I used the Makefile you provided. I can try compiling statically if you could provide the command I need to run.
I ran LD_DEBUG=bindings ./zpaqfranz
and couldn't find anything odd.
I will post if I find anything linked to GLIBC version. Debugging this may take a while.
This is the very last zpaqfranz cross-compiled (on Debian 11) for arm, statically linked running on a physical Annapurna-powered (arm) QNAP NAS
Please try the attached (just a quick workaround) test_ancient.zip
with
g++ -O3 -DANCIENT zpaqfranz.cpp -o zpaqfranz -static -pthread -s
This is an Intel Synology DSM 7.2 running the latest executable, (statically) compiled on Debian 11. Compiled with nothing odd:
g++ -O3 -Dunix zpaqfranz.cpp -pthread -lstdc++ -lm -o zpaqfranz_linux64 -t
Please let me know if this runs on your physical Synology.
DSM 7.2 has GLIBC 2.36 which will work fine. I was using it on DSM 7.1 which has GLIBC 2.26.
After running objdump -T zpaqfranz | grep GLIBC
I found what function is causing the issue:
statx()
Version 59.9 doesn't have that function.
Edit:
From further tests that I ran, it looks like statx() is not present in GLIBC versions lower than 2.28. This means version 60 won't run on older systems.
This command will give you all GLIBC versions required for an elf executable:
nm --dynamic --undefined-only --with-symbol-versions zpaqfranz | grep GLIBC_2
I will try to explain what happened here.
Basically GLIBC is part of the OS and each distribution version comes with a version of GLIBC which doesn't change.
See here: https://gist.github.com/richardlau/6a01d7829cc33ddab35269dacc127680
For example my RedHat 8 has GLIBC 2.28 and Synology DSM 7.1 has GLIBC 2.26
In order for you to get that error, you would have to compile on Debian 11, but run the file on Debian 9.
So GLIBC compatibility only goes one way, meaning you cannot run an app that was compiled against GLIBC 2.28 on any distro where GLIBC < 2.28; otherwise you get an error, as I posted above. One note, though: this rule only applies if an app uses GLIBC functions from a newer version; otherwise it makes no difference, as with version 59.9 of zpaqfranz, which mostly uses GLIBC 2.2.5 and 2.3.
So this issue will come up for any user whose production system has a GLIBC version lower than the development system's. I understand that a lot of users will run zpaqfranz on the same machine where it was compiled, but that is not the case for companies with IT departments, or even devs with a lot of different code.
In any case, the GLIBC issue is not a bug, but by design. I will try to see if I can find a workaround. I think I have code with examples of how to force GLIBC to version 2.2.5 (universally compatible with all modern Linux systems, old and new).
Edit:
So I found the code I needed:
__asm__(".symver statx, statx@GLIBC_2.2.5");
However, as stated above, this won't work, as statx() was introduced in GLIBC 2.28.
> This is an Intel Synology DSM 7.2 running the latest executable, (statically) compiled on Debian 11. Compiled with nothing odd:
> g++ -O3 -Dunix zpaqfranz.cpp -pthread -lstdc++ -lm -o zpaqfranz_linux64 -t
> Please let me know if this runs on your physical Synology.
It doesn't work for the reasons explained above. I get the same error.
> NOTE: 60 uses a different default archive format; testing is underway, not completed
What does that mean? Is it still the zpaq archive format?
Would you be able to update version 59.9 to resolve the -test switch?
> NOTE: 60 uses a different default archive format; testing is underway, not completed
> What does that mean? Is it still the zpaq archive format?
It is the newest zpaqfranz archive format. zpaqfranz before 60 cannot test archives made with 60+, unless an older switch is used (for example -xxhash or -blake3).
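A hedged sketch (hypothetical paths; assuming the switch is given at creation time): an archive created with one of those switches should remain testable by pre-60 versions:
zpaqfranz a z:\compat.zpaq c:\data -blake3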
> Would you be able to update version 59.9 to resolve the -test switch?
I would say no, but I can make 60 executable even for old systems. I reiterate: I don't think the -test switch does what you want it to do. You should use the t command.
> Would you be able to update version 59.9 to resolve the -test switch?
> I would say no, but I can make 60 executable even for old systems. I reiterate: I don't think the -test switch does what you want it to do. You should use the t command.
Well, it doesn't even matter what command I use or if I understand how it works. I simply reported a bug.
A function should only throw an error if an error was detected. But an error is thrown where there is no error.
It is up to you if you want to fix the reported issue or not.
> Well, it doesn't even matter what command I use or if I understand how it works. I simply reported a bug.
> A function should only throw an error if an error was detected. But an error is thrown where there is no error.
> It is up to you if you want to fix the reported issue or not.
I explained why altering the stored paths generates an error (before version 60). If you do not alter them, no error is reported.
Since the alteration is completely up to the user, it is up to you to figure out what you are doing and why. If you are troubled by this you can, alternatively, run the backup and the verification as two separate commands.
Ok.
I appreciate the time you spent on this issue.
I will wait for version 60 that works on older systems and is production ready. I can't use alpha releases for anything other than testing.
60_1k.zip: you can try this pre-release.
g++ -O3 -Dunix -DNAS zpaqfranz.cpp -o zpaqfranz_nas -pthread -static -s
Seems to work. I did an extraction and all the files were extracted without error.
I did get a warning when compiling that relates to the update function, but that's very minor.
when compiling:
/tmp/ccl2VyEw.o: In function `downloadfile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) [clone .part.568]':
zpaqfranz.cpp:(.text+0x3a95e): warning: Using 'gethostbyname' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
When running zpaqfranz update:
zpaqfranz v60.1k-NAS-L(2024-07-01)
Checking internet update (-verbose for details)
zpaqfranz: dl-call-libc-early-init.c:37: _dl_call_libc_early_init: Assertion `sym != NULL' failed.
Aborted (core dumped)
Backed up with -test switch then verified with t command using different switches. All commands returned (all OK).
The issue is resolved.
Let me know if you want me to test the update function later.
I think I found another bug related to backup command and -to switch.
Basically if I run this command multiple times, it duplicates the number of files in the index file:
./zpaqfranz backup "/volume1/NetBackup/test" "/volume1/NetBackup/test" -to "/volume1/print\$" -index "/volume1/NetBackup/Scripts"
Result:
188 +added, 0 -removed.
Then I ran the same command again and it shows that another 188 files were added with the same output, but the actual archive size seems to suggest that wasn't the case.
The first archive test_00000001.zpaq is 39.1 MB in size - that's correct. The second archive test_00000002.zpaq is 9.4 KB in size - looks right as no new files were actually added
Then I ran ./zpaqfranz l "/volume1/NetBackup/test_0000000?.zpaq"
Result:
/volume1/NetBackup/test_0000000?.zpaq:
2 versions, 376 files, 40.985.571 bytes (39.09 MB) <-- wrong file count
This doesn't happen when I omit the -to switch.
Ran the following command 2 times:
./zpaqfranz backup "/volume1/NetBackup/test" "/volume1/NetBackup/test" -index "/volume1/NetBackup/Scripts"
Then
./zpaqfranz l "/volume1/NetBackup/test_0000000?.zpaq"
Result:
/volume1/NetBackup/test_0000000?.zpaq:
2 versions, 188 files, 40.976.212 bytes (39.08 MB) <-- correct number of files
Do you have any .xls or .ppt file? PS it is test_???????? (eight ?)
Found. It is indeed a bug, albeit a modest one (the file gets bigger). You need to change this line to become like this (appending || command=='Z')
const bool i_renamed=command=='l' || command=='a' || command=='5' || command=='Z'; ///5 for dirsize arrggghh hidden parameter!
testbackup command (all OK)
t -verify with -to switch (all OK)
t with -to switch, but without -verify (all OK)
list command has proper sizes and file lists
Everything is working in relation to this bug.
Much appreciated. It is now fully resolved.
I will wait for a prod release of version 60.
Hi again.
I've run into another bug. When using the backup command together with -find "string1" -replace "string2", the verification at the end fails due to the fact that the archive contains paths modified with -replace.
Example:
sudo zpaqfranz backup "/mnt/b2/print$" "/volume1/snapshots/print$" -find "/volume1/snapshots" -replace "/volume1" -verbose -index "/volume1/NetBackup" -test
In this case everything goes well, but then verification fails at the end.
00000193 +external (file missing in ZPAQ)