Closed sergeevabc closed 3 months ago
Windows 7 x64, Zpaqfranz 59.4
$ zpaqfranz sum *.wav -pakka -sha256
No multithread: Found (2.85 MB) => 2.988.280 bytes (2.85 MB) / 2 files in 0.015000
60f2791266401076a68c3311d2fa089657cfb1116048a614cc4b841e63ffb187 original02.wav
7cb7ca1fadbe690d153e8b7e5598c6a43dc8ed6e8c68854458e3f70fe2172dbe original01.wav
0.265 seconds (000:00:00) (all OK)
a) Checksums are output only after all files have been processed, which is a problem when files are large. I would like to be able to see the results as soon as they are ready.
b) Results are displayed without taking into account alphabetical sorting.
thanks for the report
1) True.
2) You can use the -nosort switch. By default the sort is made by hash, to quickly find duplicated files
|XXH3: 1CE69346A95DAB99DAD33D49FBFB6431 [ 130] |release/zpaqfranz_old/12/fai.bat
|XXH3: 1CE69346A95DAB99DAD33D49FBFB6431 [ 130] === |release/zpaqfranz_old/13/fai.bat
|XXH3: 1CE69346A95DAB99DAD33D49FBFB6431 [ 130] === |release/zpaqfranz_old/15/fai.bat
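The hash-sorted listing above can be approximated with a short Python sketch. This is only an illustration of the idea (sort by hash, flag repeats with ===), not zpaqfranz's actual code. The names file_digest and duplicate_report are hypothetical, and SHA-256 from the standard library stands in for XXH3, which hashlib does not provide.

```python
import hashlib


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit fully in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_report(paths, algo="sha256"):
    """Return lines sorted by (hash, path); a repeated hash is flagged with '==='."""
    rows = sorted((file_digest(p, algo), str(p)) for p in paths)
    lines = []
    previous = None
    for digest, name in rows:
        # Because rows are sorted by hash, duplicates are always adjacent.
        marker = "===" if digest == previous else "   "
        lines.append(f"{digest} {marker} {name}")
        previous = digest
    return lines
```

Sorting by hash is what makes the duplicates line up next to each other, which is the reason given above for the default order.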
Please try the attached pre-release, with the -nosort switch
59_5a.zip This will show the computed hash immediately (hopefully 😄)
PS do not forget -ssd if you have solid state drives
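For reference, the behaviour requested in a) and b) (print each checksum as soon as it is ready, in alphabetical order) can be sketched in Python as follows. This is a minimal single-threaded illustration under assumed names (print_hashes_as_ready is hypothetical); it says nothing about how -nosort or -ssd are implemented in zpaqfranz.

```python
import hashlib
import sys


def print_hashes_as_ready(paths, algo="sha256", out=None):
    """Hash files in alphabetical order, writing each line as soon as it is done."""
    out = sys.stdout if out is None else out
    count = 0
    for path in sorted(str(p) for p in paths):
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        # flush=True pushes the line out immediately instead of waiting
        # for the remaining (possibly large) files.
        print(f"{h.hexdigest()}  {path}", file=out, flush=True)
        count += 1
    return count
```

Because nothing is collected before printing, the first result appears after the first file is hashed, regardless of how many files follow.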
b) Results are displayed without taking into account alphabetical sorting.
59_5b.zip Please try this one: you can use -orderby and even -desc
zpaqfranz sum k:\vm -xxh3 -ssd -orderby size -only *.vmdk
@fcorbelli, thanks for the quick edits, but I think you're trying to make it more comfortable to pedal the bike backwards when people want to pedal forward.
By default the sort is made by hash, to quickly find duplicated files
Why is this the default behavior? Calculating a checksum and using a checksum to identify duplicates are two different tasks. The former task is regular, serving to detect accidental or intentional file changes. And the latter task is a derivative and less in demand. Not to mention, comparing several hashes by eye is a questionable practice, instead it seems more appropriate to give a hint from the app, but I'm not sure that a compression app should be extended in this way (there are jdupes-like apps for that).
For example, before I send photos to the remote facility, I calculate the checksums of the monthly archives then save them as checksums.sha256. This is what the output of checksum calculators looks like by default. And this is exactly the look I expected to get from Zpaqfranz without having to remember a thousand and one flags.
$ rhash --sha256 img202405*.jpg
833410eb6106a8865d21efbaa88250a7f20361b79d2a35ae541d4726a36c128e img202405_6894.jpg
b5045fff05433b10248d3b138bcd83a2e1322b4807710794ebc92ed45476a9f0 img202405_6895.jpg
5dd52b378bb927d83b3e0a4755a89ccd2e425bd076f60df3428fab96ad4a6300 img202405_6896.jpg
$ b3sum img202405*.jpg
96ac12a2716c7c25b00380017aebc56b78602e484a2ecb2b80c1961bcfdc0598 img202405_6894.jpg
c4b824e925f627d3afe34c3a04ed769b0ae73ab0ac08c8312f161c2eda38cbcb img202405_6895.jpg
3614a0d0afa46fb6eeb921a7eded0d602a433e898baf710c09740869ed870633 img202405_6896.jpg
$ xxhsum img202405*.jpg
46cb55b63ff71972 img202405_6894.jpg
24fa50966a7b913c img202405_6895.jpg
8380dc1a51b5972a img202405_6896.jpg
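The workflow described above (compute checksums, save them as checksums.sha256, check them again later) can be sketched in Python. This is an illustration only: write_checksums and verify_checksums are hypothetical helper names, and the "digest, two spaces, file name" layout simply copies the listings shown above.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_checksums(folder, pattern, listing="checksums.sha256"):
    """Write 'digest  name' lines, alphabetically, for files matching pattern."""
    folder = Path(folder)
    files = sorted(p for p in folder.glob(pattern)
                   if p.is_file() and p.name != listing)
    lines = [f"{sha256_of(p)}  {p.name}" for p in files]
    (folder / listing).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def verify_checksums(folder, listing="checksums.sha256"):
    """Return the names whose current hash differs from the listing (or are missing)."""
    folder = Path(folder)
    failed = []
    for line in (folder / listing).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        target = folder / name
        if not target.is_file() or sha256_of(target) != digest:
            failed.append(name)
    return failed
```

An empty list from verify_checksums means every listed file still matches its recorded hash.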
@fcorbelli, thanks for the quick edits, but I think you're trying to make it more comfortable to pedal the bike backwards when people want to pedal forward.
I'm actually more interested in how I pedal.
By default the sort is made by hash, to quickly find duplicated files Why is this the default behavior? Calculating a checksum and using a checksum to identify duplicates are two different tasks.
Because, as I explained, having a quick indication of duplicate files is what I do every day
The former task is regular, serving to detect accidental or intentional file changes. And the latter task is a derivative and less in demand.
It is just the opposite. zpaqfranz is my tool for making backups, not for calculating hashes. There are already many of those, maybe even better ones
Not to mention, comparing several hashes by eye is a questionable practice, instead it seems more appropriate to give a hint from the app,
You don't have to do it "by eye": three = signs appear next to each duplicate, so identifying them is immediate. At least for me
but I'm not sure that a compression app should be extended in this way (there are jdupes-like apps for that).
It's a program that I don't know. In zpaqfranz there are d (duplication) and 1on1
For example, before I send photos to the remote facility, I calculate the checksums of the monthly archives then save them as checksums.sha256. This is what the output of checksum calculators looks like. And this is exactly the look I expected to get from Zpaqfranz without having to remember a thousand and one flags.
With zpaqfranz it is wasted time. The hashes are (can be) stored within the archive, along with their CRC-32 (global) and SHA-1 (block). You can store SHA256 like this
zpaqfranz a z:\1.zpaq *.cpp -sha256
That's all. To read back
zpaqfranz l z:\1.zpaq -checksum
BTW you gave me an idea, I will make a parser that shows checksums even if specific algorithms are given (i.e. if you type l -sha256 it will show you the list, even if they are BLAKE3)
If you have a level of "paranoia" similar to mine, you can use hashdeep to create a list of hashes, add it to the archive, and then use zpaqfranz to compare the hashes of the extracted files. I do this when storing zfs streams, just to have a check from external software (hashdeep), especially on name restorability
Short version: the purpose of the sum command is not to create a list of hashes, like md5 or hashdeep. Of course you can do that. But here we are talking about (deduplicated) backups
|XXH3: 0000F21922075EC1E0BEBE3D781A4FDB [ 5.702] |c:/zpaqfranz/pakka/__astcache/@c@@zpaqfranz@pakka/c@@program files (x86)@embarcadero@studio@22.0@include@windows@sdk@cguid.h
|XXH3: 00027C61BBC801068CD2D7B5F82D8228 [ 6.393] |c:/zpaqfranz/pakka/__astcache/@c@@zpaqfranz@pakka/c@@program files (x86)@embarcadero@studio@22.0@include@windows@rtl@systemrtti.h
|XXH3: 002134602BBAEC53BCDB1D8936B3D3A7 [ 2.457.654] |c:/zpaqfranz/pakka/spaz/button-24822_1280.bmp
|XXH3: 003E1B7B293C53D820C81607A378079E [ 9.396] |c:/zpaqfranz/PDCursesMod-master/psffonts/mappings/CP862.TXT
|XXH3: 003FF08403A2138C14E97655634C316D [ 3.616] |c:/zpaqfranz/PDCursesMod-master/psffonts/fntcol16/bigsf-14.psf
|XXH3: 00486627A24AB2CD3849174B9E6B55D7 [ 812] |c:/zpaqfranz/PDCursesMod-master/demos/README.md
|XXH3: 00486627A24AB2CD3849174B9E6B55D7 [ 812] === |c:/zpaqfranz/demos/README.md
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] |c:/zpaqfranz/715/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/715/mono/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/715/ok/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/715/zpaq715/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/715d/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/717/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/717/ok/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/718/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/bsd/spaz/zpaq/zpaq/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/dataman/src/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/release/test/715/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/release/zpaqfranz_old/11/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/release/zpaqfranz_old/12/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/release/zpaqfranz_old/13/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/release/zpaqfranz_old/15/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/release/zpaqfranz_old/16/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/release/zpaqfranz_old/16beta/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/release/zpaqfranz_old/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/zpaqd/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/zpipe/libzpaq.h
|XXH3: 004F9A3DADBA3AD42EF34599462CFC92 [ 61.679] === |c:/zpaqfranz/zpipe/ok/libzpaq.h
I think this example is quite clear. Incidentally, having an alphabetical order doesn't really add any particular information
You can also use a "smart" version (especially on *nix), namely the dir command (or call the zpaqfranz executable of, or a symbolic link to). In that case you can do
zpaqfranz dir -checksum -xxh3
zpaqfranz dir /os -checksum
I'll put an "autochecksum" at this point as well, to save a switch
Short version: I don't think it makes a lot of sense to have a program that, by default, does exactly what a thousand other pieces of software do. I don't want to remember a thousand switches, pipes and concatenations to find duplicate files using other software. I don't want to "pedal backwards," I want to "pedal to my goal"
BTW If you are wondering why the dir command exists, the answer is trivial for a storage manager: it finds the largest files, recursively, within a folder and checks whether they are duplicates
How do you do this? With a thousand switches and many pipes
With zpaqfranz?
zpaqfranz dir /s /os -blake3
Just like standard Windows: /s (recursive), /os (order by size)
And you'll get something like
11/05/2024 15:52 3.588.212 release/59_4/zpaqfranz.cpp
================= 3.588.212 va.cpp
02/09/2022 10:23 5.812.273 zpipe/ok/pippo.zpaq
================= 5.812.273 zpipe/ok2/pippo.zpaq
================= 5.812.273 zpipe/pippo.zpaq
02/09/2022 10:23 6.044.170 zpipe/ok/pippero.zpaq
================= 6.044.170 zpipe/ok2/pippero.zpaq
================= 6.044.170 zpipe/pippero.zpaq
24/07/2022 12:03 7.494.122 windows-via-c-c_5th-edition.pdf
================= 7.494.122 zpipe/decomp.pdf
================= 7.494.122 zpipe/ok/decomp.pdf
================= 7.494.122 zpipe/ok2/decomp.pdf
07/09/2023 11:00 19.012.145 zpaqlist/2.txt
================= 19.012.145 zpaqlist/uno/2.txt
Which, again, I think is really easy to interpret.
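As a rough illustration of what such a listing involves (recursive walk, order by size, flag duplicates), here is a Python sketch. It is not how zpaqfranz implements dir: dir_by_size is a hypothetical name, SHA-256 stands in for BLAKE3 (which is not in the Python standard library), and hashing only files that share a size is an optimisation assumed here, not taken from the thread.

```python
import hashlib
import os
from collections import Counter


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dir_by_size(root):
    """Return (size, relative_path, is_duplicate) rows ordered by size, then path."""
    entries = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            entries.append((os.path.getsize(full), os.path.relpath(full, root), full))
    entries.sort()

    # Files with a unique size cannot be duplicates, so they are never hashed.
    size_counts = Counter(size for size, _rel, _full in entries)
    seen = set()
    rows = []
    for size, rel, full in entries:
        is_duplicate = False
        if size_counts[size] > 1:
            key = (size, sha256_of(full))
            is_duplicate = key in seen  # an earlier file had the same content
            seen.add(key)
        rows.append((size, rel, is_duplicate))
    return rows
```

Rows with is_duplicate set correspond to the ================= lines in the listing above: same size and same content as a file already shown.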
In the attached pre-release there is a brand new command, hash, that mimics other software's behaviour
useful switches -stdout -ssd
not so useful -noeta -verbose
$ zpaqfranz.exe hash *.zip
zpaqfranz v59.5d-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-13)
franz:hash 9 - command < a lot of spaces
franz:-noconsole
HASHA < HASHA?
Hashing SHA-1 ignoring .zfs and :$DATA
No multithread: Found (40.54 MB) => 42.513.725 bytes (40.54 MB) / 7 files in 0.015000
6eb23ff770ea1d45788bbaad89f4d66f3af303cc sample01.zip
26b545b16ddb7514501bef110abbec9944fb57c8 sample02.zip
7a9871038cb8eb954b4f723c9f86b248f914fc59 sample03.zip
a027992dabb0df0eb20cf3ed08fe6371512d7bbc sample04.zip
8be6297f4132bc3a8936f5199bf9b48928f51d78 sample05.zip
cc873c4bf3875f5bf0884d6d7119b66a216e0f1b sample06.zip
5eb0b36798d4ddc487b7ba69e84addb4ba594fd8 sample07.zip217.270/SeC
2.512 seconds (000:00:02) (all OK) ^ lack of break looks odd, and what is SeC?
^ extra zero looks odd
$ zpaqfranz.exe hash *.zip
zpaqfranz v59.5e-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-13)
franz:hash 9 - command
franz:-noconsole
Hashing SHA-1 ignoring .zfs and :$DATA
6eb23ff770ea1d45788bbaad89f4d66f3af303cc sample01.zip
26b545b16ddb7514501bef110abbec9944fb57c8 sample02.zip
7a9871038cb8eb954b4f723c9f86b248f914fc59 sample03.zip
a027992dabb0df0eb20cf3ed08fe6371512d7bbc sample04.zip
8be6297f4132bc3a8936f5199bf9b48928f51d78 sample05.zip
cc873c4bf3875f5bf0884d6d7119b66a216e0f1b sample06.zip
5eb0b36798d4ddc487b7ba69e84addb4ba594fd8 sample07.zip
0.328 seconds (000:00:00) (all OK)
franz:hash 9 - command < a lot of spaces
Just debug info
HASHA < HASHA?
Just debug info
^ lack of break looks odd, and what is SeC?
bytes for SeConds
^ extra zero looks odd
I like it 😄 Sometimes making backups can last for a veeeeery long time
^ extra zero looks odd I like it 😄
Well, this looks odd not because different representations of time are possible, but because it breaks the consistency of interface elements since you use two zeroes in other places (see below). It's like going outside, buttoning the cuff on one sleeve, but rolling it up to the elbow on the other.
$ zpaqfranz hash -sha256 Ennio.2021.mkv
does not work so far
$ zpaqfranz hash Ennio.2021.mkv
zpaqfranz v59.5e-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-13)
franz:hash 9 - command
franz:-noconsole
Hashing SHA-1 ignoring .zfs and :$DATA
029% 00:02:46 ( 872.00 MB) of ( 2.94 GB) 13.446.445/SeC
^ two zeroes ^ looks very odd, consider 13.44 MB/s which is clearer and more familiar
54760: CONTROL-C detected, try some housekeeping...
Well, this looks odd not because different representations of time are possible, ...
The total running time can be > 99 hours. Hardly 999
but because it breaks the consistency of interface elements since you use two zeroes in other places (see below). It's like going outside, buttoning the cuff on one sleeve, but rolling it up to the elbow on the other.
The sleeves get different lengths 😄
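The consistency point can be made concrete with one elapsed-time formatter shared by every message. The Python sketch below is only an illustration with a hypothetical name (format_hms); the three-digit hour field is taken from the "(000:00:00)" summary line and leaves room for runs longer than 99 hours.

```python
def format_hms(seconds, hour_digits=3):
    """Format elapsed seconds as HHH:MM:SS with a fixed-width hour field."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:0{hour_digits}d}:{minutes:02d}:{secs:02d}"


print(format_hms(2.512))         # 000:00:02
print(format_hms(166))           # 000:02:46
print(format_hms(100 * 3600))    # 100:00:00
```

Routing both the progress line and the final summary through the same function would keep the number of leading zeroes identical everywhere.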
^ two zeroes ^ looks very odd, consider 13.44 MB/s which is clearer and more familiar
It is MUCH harder to find, in the source code, the exact line for an ETA. With different cases, MUCH easier, because there are about 4 different ETAs: for small files, for big ones, without an expected size, updated every 1 second, etc.
/SeC is myavanzamentoby1sec(), so the expected operation is one update per second. /sec is print_progress(), which changes the writing as the ETA changes, so it is NOT the same as the previous one, although apparently it is
Because sometimes the output tells more than expected
Well yes, (almost) everything in zpaqfranz has a reason for being there
The newer pre-release 59.5g is ready. To update:
zpaqfranz update -force
With /s instead of /sec 😄 (much harder debug, but I'll invent something else)
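The "13.44 MB/s" style suggested earlier can be sketched as a small Python helper. This is an illustration with a hypothetical name (format_rate), not the zpaqfranz code. Note that the decimal/binary choice matters: the sizes printed elsewhere in the thread (42.513.725 bytes shown as 40.54 MB) are 1024-based, so a base parameter is assumed here.

```python
def format_rate(bytes_per_second, base=1000):
    """Render a throughput with a unit suffix; base 1000 (MB/s) or 1024 (MiB/s)."""
    if base == 1024:
        units = ["B/s", "KiB/s", "MiB/s", "GiB/s", "TiB/s"]
    else:
        units = ["B/s", "KB/s", "MB/s", "GB/s", "TB/s"]
    value = float(bytes_per_second)
    for unit in units:
        # Stop at the first unit that keeps the number below one "base" step.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


print(format_rate(13446445))             # 13.45 MB/s (rounded, decimal units)
print(format_rate(13446445, base=1024))  # 12.82 MiB/s
```

The value is rounded rather than truncated, which is why the decimal result reads 13.45 and not 13.44.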
BTW there is a new -home switch for the s command
C:\zpaqfranz>zpaqfranz s c:\users -home -ssd -ignore
zpaqfranz v59.5g-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-05-14)
franz:-home -hw -ignore -ssd
homesize
Scanning 5 subfolders...
Creating 5 scan threads
Parallel scan ended in 0.922000 s
----------------------------------------------------------------------------------------------------
2.461.494.221 00005697 c:/users/All Users/
0 00000000 c:/users/Default User/
1.568.227 00000103 c:/users/Default/
929.689.685 00008663 c:/users/Public/
35.141.251.003 00073727 c:/users/utente/
1.047 seconds (00:00:01) (all OK)
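A per-subfolder summary like the one above (total bytes and file count for each first-level folder, one scan thread per folder) could be sketched in Python as follows. The function names are hypothetical and the thread-per-subfolder layout is only inferred from the "Creating 5 scan threads" message; this is not the zpaqfranz implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def folder_stats(path):
    """Return (total_bytes, file_count) for everything below path."""
    total_bytes = 0
    file_count = 0
    # onerror swallows unreadable directories instead of aborting the scan.
    for folder, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total_bytes += os.path.getsize(os.path.join(folder, name))
                file_count += 1
            except OSError:
                continue  # file vanished or is not accessible
    return total_bytes, file_count


def home_sizes(root):
    """Scan each first-level subfolder of root in its own thread."""
    with os.scandir(root) as entries:
        subfolders = sorted(e.path for e in entries
                            if e.is_dir(follow_symlinks=False))
    if not subfolders:
        return []
    with ThreadPoolExecutor(max_workers=len(subfolders)) as pool:
        stats = list(pool.map(folder_stats, subfolders))
    return [(size, count, path) for path, (size, count) in zip(subfolders, stats)]
```

Each returned tuple mirrors one output row: size in bytes, number of files, folder path.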
No more updates, so I'll close for now. Thank you