fcorbelli / zpaqfranz

Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
MIT License

-verify commandline argument for command a (add) overlaps with -test -> both trigger a filesystem based hash check #118

Closed Lennart00 closed 3 months ago

Lennart00 commented 3 months ago

Hello, the -verify command-line argument seems to trigger a re-read of the files on the filesystem and a check of their hashes, in addition to the collision detection mechanism.

From my point of view, the expected behavior is that a check like this, which reads the files in again to verify the archive, happens only with the -test command-line argument and not with -verify.


PS C:\Users\Lennart\Downloads\temp> zpaqfranzhw.exe -verbose -threads 1 -sha1 -hw  -verify -collision -m56 -fragment 6 a ".\test2.zpaq" zpaq*
zpaqfranz v60.4c-JIT-GUI-L,HW BLAKE3,SHA1,SFX64 v55.1,(2024-07-13)
franz:-threads                                  1
franz:-method                                  56
franz:-fragment                                 6
franz:-sha1 -collision -hw -verbose -verify
Integrity check type: SHA-1+CRC-32 + CRC-32 by fragments
Creating ./test2.zpaq at offset 0 + 0
Add 2024-07-16 19:08:42         9         32.662.456 (  31.15 MB) 1T (0 dirs): -m56
monothread compress
MAX_FRAGMENT 520.192 (507.94 KB)
9 +added, 0 -removed.
                    0 starting size
           32.662.456 data to be added
           31.800.401 after deduplication
            7.728.803 after compression
            7.728.803 total size
Total speed 709.80 KB/s
IO buffer 1.048.576
========================================================================================================================================================================================================================================
Do a verify()

./test2.zpaq:
1 vers, 9 files, 482 frags, 3 blks, 7.728.803 bytes (7.37 MB)

Verify hashes of one version vs filesystem (monothread)
Scan done, preparing report...
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
OK          SHA-1 : 00000009 of 00000009 (    31.15 MB hash check against file on disk)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total hashed bytes 32.662.456 @ 1.020.701.750 B/s
no file errors tracked
SHA-1 collision detection time 0 ms
Files  added +9
44.985 seconds (00:00:44) (all OK)

fcorbelli commented 3 months ago

-verify, just like the v (verify) command, recomputes the hashes of the files after the add, to catch any files that changed between the start of the archiving and now.

Therefore I think it should remain as it is today, for consistency with the v (verify) command.
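
As a rough illustration of the idea described above (not zpaqfranz's actual code), a post-add verify can be sketched as re-hashing each file on disk and comparing the result with the hash recorded when the file was added. The names `sha1_of` and `verify_against_filesystem`, and the `recorded` dictionary, are hypothetical and exist only for this sketch.

```python
import hashlib
import os


def sha1_of(path, bufsize=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def verify_against_filesystem(recorded):
    """Re-read every file from disk and compare it with the recorded hash.

    `recorded` is a hypothetical mapping {path: hex digest stored at add time}.
    Returns the paths that are missing or whose content has changed.
    """
    return [
        path
        for path, digest in recorded.items()
        if not os.path.isfile(path) or sha1_of(path) != digest
    ]
```

An empty result means every file still matches what was hashed during the add. A non-empty result lists files that were modified or removed in the meantime.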

BTW I do not suggest:
- -threads 1 (will slow down A LOT with a marginal, if any, gain in compression)
- -sha1 (use -blake3b, -xxh3b or the default xxhash64b)
- altering the default fragment size (it's up to you)
- using the placebo-always-compress m5

As you know, up to m4 zpaq DOES NOT try to compress files that it "thinks" are already compressed (mp4, jpg, etc.).

Total speed 709.80 KB/s

Veeerrryyyy sloooooowww 😄

Lennart00 commented 3 months ago

Thank you for your recommendation. To be honest, my main point was to clarify the difference between -verify and -test, because at the moment the documentation for these two is a little fuzzy and overlapping.

From my point of view, -verify and -test cause similar or almost identical behavior, and I would like the differences to be noted somewhere.

quote from wiki (https://github.com/fcorbelli/zpaqfranz/wiki/Command-a-add-(Add-or-append-files-to-archive)) regarding command a add:

-verify
Do an early check for SHA-1 collisions during add(), fail if detected (slows ~10%)

-test
Do a post-add test (doveryay, no proveryay).

Output from zpaqfranzhw.exe h all:

zpaqfranz v60.4c-JIT-GUI-L,HW BLAKE3,SHA1,SFX64 v55.1,(2024-07-13)

+ : -test         Do a post-add test (doveryay, no proveryay).
+ : -verify       Verify hashes against filesystem

FYI I am experimenting with various fragment sizes and -mXX values for my own use cases; this is the reason for the unrelated command-line arguments. (The files used in the compression test above were actually the binary artifacts from zpaqfranz, all combined together :D )

Thank you very much for your continued work on zpaqfranz.

fcorbelli commented 3 months ago

Essentially, (almost) everywhere you see verify there is a re-read from the filesystem; where there is test, there is archive processing. I've already added a thousand switches, which is why I try (sometimes failing) to re-use them. It is not easy to write help text that fully explains what happens in about 50 characters. For example, the test command with or without a path totally changes its operation and meaning.
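
The distinction can be sketched with a toy model. This is only an illustration under stated assumptions: the "archive" below is a plain dictionary of zlib-compressed blobs with SHA-1 digests, not the zpaq format, and `add`, `check_archive` and `verify_sources` are hypothetical names. The test-style check processes only the archive; the verify-style check re-reads only the source data.

```python
import hashlib
import zlib


def add(files):
    """Build a toy archive: {name: (compressed bytes, SHA-1 of the original)}."""
    return {
        name: (zlib.compress(data), hashlib.sha1(data).hexdigest())
        for name, data in files.items()
    }


def check_archive(archive):
    """'test' style: decompress the stored data and compare it with the
    stored hash. The source files are never read."""
    bad = []
    for name, (blob, digest) in archive.items():
        try:
            ok = hashlib.sha1(zlib.decompress(blob)).hexdigest() == digest
        except zlib.error:
            ok = False
        if not ok:
            bad.append(name)
    return bad


def verify_sources(archive, files):
    """'verify' style: re-hash the current source data and compare it with
    the stored hash. The archived data is never decompressed."""
    return [
        name
        for name, (_, digest) in archive.items()
        if name not in files or hashlib.sha1(files[name]).hexdigest() != digest
    ]
```

In this model, a source file edited after the add is reported by `verify_sources` but not by `check_archive`, while a damaged blob inside the archive is reported by `check_archive` but not by `verify_sources`.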

Lennart00 commented 3 months ago

This has cleared up any confusion on my side. I'm closing the issue.

fcorbelli commented 3 months ago

Sometimes I have written “explanations” of the different modes of verification and testing in zpaq. It depends, essentially, on “what” you want to do, and “where”. “Where” means whether the archive is kept away (i.e., on another server) from the data from which it is composed. “What” can have different meanings.

Lennart00 commented 3 months ago

I guess I'm also a little bit paranoid (I would not have come here with questions about file verification otherwise, I guess :D).

For me personally, this one, "That the files in the archive are the same as those in the source folder?", is the most important check of all when you are just starting to use zpaqfranz. Other damage might occur later, but the initial state is something I feel I need to check.

I came to zpaq(franz) after reading and learning about btrfs's deduplication and compression mechanisms. One maybe unexpected use case for zpaqfranz is to store various differently configured and modded versions of the same video game into one archive for deduplication and the results are impressive.

fcorbelli commented 3 months ago

For me personally, this one, "That the files in the archive are the same as those in the source folder?", is the most important check of all when you are just starting to use zpaqfranz. Other damage might occur later, but the initial state is something I feel I need to check.

Then you should go to t (test) WITH a path

fcorbelli commented 3 months ago

I came to zpaq(franz) after reading and learning about btrfs's deduplication and compression mechanisms. One maybe unexpected use case for zpaqfranz is to store various differently configured and modded versions of the same video game into one archive for deduplication and the results are impressive.

zpaq is really a masterpiece from Dr. Mahoney. Sadly, he no longer supports zpaq (AFAIK). With not-so-big changes it could have been even better.