fcorbelli / zpaqfranz

Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
MIT License

Machine-parseable output (`-terse` ?) and failure to restore a file #120

Open luckman212 opened 1 month ago

luckman212 commented 1 month ago

Related to #63 I am trying to adapt my script to extract a specific version of a file in my archive. I'm using macOS 14.5, zpaqfranz v60.5e-NOJIT-L(2024-07-20) from Homebrew on M1 Mac.

For example, I want to restore version 2178 of IMG-20240516101824054.png below:

/tmp $ zpaqfranz l backup.zpaq -find "20240516101824054.png" -all -terse
2024-05-16 14:18:23  0644             207.434   87% 2178|+  /Users/luke/Sync/Obsidian/Main/attach/zzzzzzzzzzzzzzzzzzzzzzzzzz/IMG-20240516101824054.png
deleted/inacessible                         0   del 2187|-  /Users/luke/Sync/Obsidian/Main/attach/zzzzzzzzzzzzzzzzzzzzzzzzzz/IMG-20240516101824054.png

What is the most efficient command to do this based on the output? I noticed the fields are space-delimited (not tab-delimited), which makes them harder to parse (what if a file name contains spaces?). I believe -terse should use a NUL byte or TAB as the field delimiter.

Also, if I don't specify -all then I get zero output. Is that correct? (Yes, the original file was deleted.) Is there any way to find or list only recoverable files? (This is why I am trying to parse the output of -all, by the way, since I need to filter out anything that says deleted/inacessible.)

luckman212 commented 1 month ago

By the way this is the crazy insane pipeline I am using now to choose the file for restore (I will pass the version and filename to zpaqfranz x ...)

zpaqfranz l backup.zpaq -find "20240516101824054.png" -all -terse |
  grep -v ^deleted |
  cut -c53- |
  awk 'BEGIN { FS="|"; OFS="\t" } { print $1, substr($2,4) }' |
  fzf --exact --multi --no-select-1 --header $'Ver\tFilename'

😐
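
A less position-dependent way to pull the version and path out of the listing can be sketched as follows. It assumes every row looks like the two rows shown in the first comment: the version is the last token before the first `|`, followed by `+` or `-` and then the path. The function name `terse_to_tsv` is made up for this illustration and is not part of zpaqfranz.

```shell
# Hypothetical helper: read `zpaqfranz l ... -all -terse` rows on stdin and
# print "version<TAB>path" for every row that is not marked as deleted.
terse_to_tsv() {
  awk '
    /^deleted/ { next }                  # skip "deleted/inacessible" rows
    {
      bar = index($0, "|")               # first "|" sits right after the version
      if (bar == 0) next                 # not a listing row
      n = split(substr($0, 1, bar - 1), head, " ")
      path = substr($0, bar + 2)         # drop "|" and the +/- flag
      sub(/^ +/, "", path)               # drop the padding before the path
      printf "%d\t%s\n", head[n], path   # %d also strips zero padding
    }'
}

# Usage (flags as used earlier in this thread):
#   zpaqfranz l backup.zpaq -find "20240516101824054.png" -all -terse | terse_to_tsv
```

Because everything after the flag is taken as the path, spaces inside a file name survive; a path that itself begins with spaces would not.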

luckman212 commented 1 month ago

And yet even with all that hand-waving, I still can't figure out how to extract the file I need...

FAIL 1

$ zpaqfranz x backup.zpaq /Users/luke/Sync/Obsidian/Main/attach/zzzzzzzzzzzzzzzzzzzzzzzzzz/IMG-20240516101824054.png -range 2178 -to /tmp/foo.png
zpaqfranz v60.5e-NOJIT-L(2024-07-20)
franz:-range                                 2178
franz:rangefrom (version)                   2.178
franz:rangeto   (version)                   2.178
franz:-to                   <</tmp/foo.png>>

backup.zpaq:
2718 versions, 86.547 files, 1.230.674.231 bytes (1.15 GB)
Extract 0 bytes (0.00  B) in 0 files (0 folders) / 8 T
Path does not exists   /tmp/foo.png
Getting free space for /tmp/

1.493 seconds (00:00:01) (all OK)

FAIL 2

$ zpaqfranz x backup.zpaq /Users/luke/Sync/Obsidian/Main/attach/zzzzzzzzzzzzzzzzzzzzzzzzzz/IMG-20240516101824054.png -range 2178 -to /tmp/
zpaqfranz v60.5e-NOJIT-L(2024-07-20)
franz:-range                                 2178
franz:rangefrom (version)                   2.178
franz:rangeto   (version)                   2.178
franz:-to                   <</tmp/>>
MAGIC: selected 1 file extracting to a folder => merge to /tmp/IMG-20240516101824054.png

backup.zpaq:
2718 versions, 86.547 files, 1.230.674.231 bytes (1.15 GB)
Extract 0 bytes (0.00  B) in 0 files (0 folders) / 8 T
Path does not exists   /tmp/IMG-20240516101824054.png
Getting free space for /tmp/

1.483 seconds (00:00:01) (all OK)

FAIL 3

$ zpaqfranz x backup.zpaq /Users/luke/Sync/Obsidian/Main/attach/zzzzzzzzzzzzzzzzzzzzzzzzzz/IMG-20240516101824054.png -range 2178 -to /tmp
zpaqfranz v60.5e-NOJIT-L(2024-07-20)
franz:-range                                 2178
franz:rangefrom (version)                   2.178
franz:rangeto   (version)                   2.178
franz:-to                   <</tmp>>
Cannot write on <<-to /tmp>>
519910: Aborting. Use -space to bypass and enforcing.
0.001 seconds (00:00:00) (with errors)

The .zpaq archive seems ok...

$ zpaqfranz t backup.zpaq
zpaqfranz v60.5e-NOJIT-L(2024-07-20)

backup.zpaq:
2718 versions, 86.547 files, 1.230.674.231 bytes (1.15 GB)
To be checked 1.352.276.056 in 22.853 files (8 threads)
7.15 stage time      33.19 no error detected (RAM ~128.52 MB), try CRC-32 (if any)
Checking            23.266 blocks with CRC-32 (1.352.276.056 not-0 bytes)
Block 00022K          1.22 GB
CRC-32 time           0.39s
Blocks       1.352.276.056 (      23.266)
Zeros                    0 (           0) 0.000000 s
Total        1.352.276.056 speed 3.494.253.374/s (3.25 GB/s)
GOOD            : 00022853 of 00022853 (stored=decompressed)
VERDICT         : OK                   (CRC-32 stored vs decompressed)
33.572 seconds (00:00:33) (all OK)

fcorbelli commented 1 month ago

I am a bit confused; there are several topics here (some already answered, but I will restate them)

1) Extracting a single file to a SINGLE FILE

Please note: this is A FILE, not A FILE TO A FOLDER

You can do this in more than one way (usually depending on the "complexity" of the filename)

The "standard" way: use the full name and -to A FILE (not a folder)

zpaqfranz x thearchivename THEFULLFILENAME -to THEFULLEXTRACTEDFILENAME -until THEVERSIONYOUWANT

Let's suppose you want to extract the file f:/zarc/inctrl/readme.txt of the version 659

zpaqfranz x copia_zarc.zpaq f:/zarc/inctrl/readme.txt -to z:/the_restored_file.txt -until 659
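
For scripting, the file-to-file form above could be wrapped in a small function. This is only a sketch: the name `restore_file` and the `ZPAQFRANZ` override variable are invented here (the override simply makes a dry run with `echo` possible), and the only zpaqfranz switches used are the ones shown in this comment.

```shell
# Hypothetical wrapper around the file-to-file extraction shown above.
# ZPAQFRANZ is an assumed override for the binary name, e.g. ZPAQFRANZ=echo
# prints the arguments instead of extracting anything.
restore_file() {
  # $1 = archive, $2 = full path as stored in the archive,
  # $3 = version,  $4 = destination FILE (not a folder)
  if [ "$#" -ne 4 ]; then
    echo "usage: restore_file ARCHIVE STORED_PATH VERSION DEST_FILE" >&2
    return 2
  fi
  "${ZPAQFRANZ:-zpaqfranz}" x "$1" "$2" -to "$4" -until "$3"
}

# Usage:
#   restore_file copia_zarc.zpaq f:/zarc/inctrl/readme.txt 659 z:/the_restored_file.txt
```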

fcorbelli commented 1 month ago

2) A file TO A FOLDER

In this example in the z:\ugo folder. Please note the -only

zpaqfranz x copia_zarc.zpaq -only f:/zarc/inctrl/readme.txt -to z:\ugo -until 659

If the filename is unique you can use *filename to extract to A FOLDER (in this example the z:\allread)

zpaqfranz x copia_zarc.zpaq -only *readme_123.txt -to z:\allread -until 659

fcorbelli commented 1 month ago

Also, if I don't specify -all then I get zero output. Is that correct?

Yes, it is. If you list an archive without anything else (aka: zpaqfranz l thearchive.zpaq) you will get the current content.

If you use "something" (-all, -range or whatever) then you switch to "show-everything-in-the-archive"

luckman212 commented 1 month ago

Ok, this worked:

zpaqfranz x backup.zpaq -only /Users/luke/Sync/Obsidian/Main/attach/zzzzzzzzzzzzzzzzzzzzzzzzzz/IMG-20240516101824054.png -to /private/tmp/zzz -until 2178

I feel very dumb when trying to figure out zpaqfranz syntax.

Thank you for the working command. 🙏

fcorbelli commented 1 month ago

When running on *nix, be aware that -space can come in handy when extracting to non-existent paths

TRANSLATION

When you extract something to /my/good/path, zpaqfranz will try to figure out whether /my/good/path exists, is writable, and has enough free space.

This is easy on Windows, virtually impossible on *nix. It can therefore happen that you get a resounding failure not because of some mistake, but because (for some reason, even one of access rights) zpaqfranz cannot figure out whether a certain path is “good”. In that case, with -space you bypass everything: zpaqfranz tries to write, and good night

If you wonder why this check exists: it relates to the risk of filling up a path, that is, running out of free space. This is a very frequent nightmare for people making batch copies. The execution ends, but the written file is incomplete, and therefore unusable

Short version: if in doubt, put -space
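
The three questions described above (does the path exist, is it writable, is there enough free space) can be approximated from a script before calling zpaqfranz. The sketch below is NOT how zpaqfranz implements its own test; `dest_ok` is a made-up name, and it relies on the POSIX `df -Pk` output format, where the fourth column of the second line is the available space in 1024-byte blocks.

```shell
# Hypothetical pre-flight check, independent of zpaqfranz's internal logic.
# $1 = destination folder, $2 = kilobytes needed
dest_ok() {
  dir=$1 need_kb=$2
  [ -d "$dir" ] || { echo "missing: $dir" >&2; return 1; }
  [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }
  # POSIX format: one line per filesystem, column 4 = available KB
  free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$free_kb" ] || { echo "cannot read free space: $dir" >&2; return 1; }
  [ "$free_kb" -ge "$need_kb" ] || {
    echo "need ${need_kb} KB, only ${free_kb} KB free in $dir" >&2
    return 1
  }
}

# Usage:
#   dest_ok /private/tmp/zzz 1024 && echo "looks safe to extract"
```

If a check like this passes but zpaqfranz still refuses to write, that is the situation where -space is meant to help.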

fcorbelli commented 1 month ago

The syntax of zpaqfranz is indeed strange, but it was introduced by its originator (Dr. Mahoney) and I have retained it for backward compatibility, with some mitigation

Remember that you can extract a PATH TO A PATH and a FILE TO A FILE, but you cannot extract a FILE TO A PATH, except by using the “trick” of -only. Remember that you can have multiple -only, and multiple -not as well. And that on *nix machines it is good to use quotes (") if you use wildcards: -only *foo.jpg is no good on Macs; -only “*foo.jpg”, on the other hand, is
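
The quoting advice matters because on *nix the shell, not zpaqfranz, expands an unquoted wildcard before the program ever sees it. A small demonstration, with made-up helper names `count_args` and `glob_demo` and two throwaway files:

```shell
# Print how many arguments a command would receive.
count_args() { echo "$#"; }

# Run in a subshell so the working directory of the caller is untouched.
glob_demo() (
  dir=$(mktemp -d) && cd "$dir" || exit 1
  touch afoo.jpg bfoo.jpg
  count_args -only *foo.jpg      # shell expands the pattern first: prints 3
  count_args -only "*foo.jpg"    # pattern is passed through as-is: prints 2
  cd / && rm -rf "$dir"
)

glob_demo
```

With the unquoted form the program receives `-only afoo.jpg bfoo.jpg` instead of the pattern. In zsh, the default shell on recent macOS, an unquoted pattern that matches nothing in the current directory aborts the command with a "no matches found" error.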

Last but not least (!) zpaq uses -to, and zpaqfranz added -find and -replace to manipulate paths

fcorbelli commented 1 month ago

This should be OK (assuming only a single file with this name)

zpaqfranz x backup.zpaq -only "*IMG-20240516101824054.png" -to /private/tmp/zzz -until 2178 -space
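
A scripted variant of this command, as a sketch only: `restore_matching` and the `ZPAQFRANZ` override are names invented for this example, and the switches are limited to those used in this thread (-only, -to, -until, -space). It accepts several patterns, relying on the statement above that multiple -only are allowed.

```shell
# Hypothetical wrapper: extract one or more patterns to a FOLDER at a version.
# ZPAQFRANZ is an assumed override for the binary (ZPAQFRANZ=echo = dry run).
restore_matching() {
  if [ "$#" -lt 4 ]; then
    echo "usage: restore_matching ARCHIVE VERSION DEST_FOLDER PATTERN..." >&2
    return 2
  fi
  archive=$1 version=$2 dest=$3
  shift 3
  # Rewrite the remaining arguments as: -only P1 -only P2 ...
  n=$#
  while [ "$n" -gt 0 ]; do
    set -- "$@" -only "$1"
    shift
    n=$((n - 1))
  done
  "${ZPAQFRANZ:-zpaqfranz}" x "$archive" "$@" -to "$dest" -until "$version" -space
}

# Usage (keep the wildcard quoted):
#   restore_matching backup.zpaq 2178 /private/tmp/zzz "*IMG-20240516101824054.png"
```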

fcorbelli commented 1 month ago

On -terse

This is a fixed-width output. It should (??) be easy to parse. The latest zpaqfranz output is variable-sized (the size columns grow if needed)

Is there any way to find or list only recoverable files?

Every file with size>0 is recoverable. Yes, ... but... WHEN?

Suppose you have a $$$$$$$$$$$$$$$$$$.cpp file. WHEN was it seen... the last time?

C:\zpaqfranz>zpaqfranz l z:\pippo.zpaq -all -only "*$$$$*"
zpaqfranz v60.6c-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2024-07-24)
franz:-all                                      4
franz:-only                                *$$$$*
----------------------------------------------------------------------------------------------------
franz:-hw

z:/pippo.zpaq:
4 versions, 19 files, 1.365.228 bytes (1.30 MB)

   Date      Time   Size Ratio  Ver Name/Info
---------- -------- ---- ----- ---- ----------
2024-07-24 12:52:31 3.697.163   13% 0001|+ $$$$$$$$$$$$$$$$$$.cpp
deleted/inacessible    0   del 0002|- $$$$$$$$$$$$$$$$$$.cpp
2024-07-24 19:03:24 3.695.705   13% 0003|+ $$$$$$$$$$$$$$$$$$.cpp
deleted/inacessible    0   del 0004|- $$$$$$$$$$$$$$$$$$.cpp

            7.392.868 (7.05 MB) of 7.392.868 (7.05 MB) in 4 files shown
            1.365.228 compressed  Ratio 0.185 <<z:/pippo.zpaq>>
0.015 seconds (00:00:00) (all OK)

Version 3 is the very last recoverable version of this file. Of course you can get version 1 too (if you want)
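
Picking that "last recoverable version" out of such a listing can be automated. The sketch below assumes the listing has already been narrowed to a single file (as with -only above) and that present rows start with a date and carry `|+` after the version, exactly as in the output shown; `last_recoverable` is a made-up name.

```shell
# Hypothetical helper: read a `zpaqfranz l ... -all` listing for ONE file on
# stdin and print the highest version in which the file is present ("+").
last_recoverable() {
  awk '
    !/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { next }  # rows with a date only
    {
      bar = index($0, "|")
      if (bar == 0 || substr($0, bar + 1, 1) != "+") next
      n = split(substr($0, 1, bar - 1), head, " ")
      v = head[n] + 0                  # "0003" becomes 3
      if (v > max) max = v
    }
    END { if (max) print max }'
}

# Usage:
#   zpaqfranz l z:\pippo.zpaq -all -only "*$$$$*" | last_recoverable
```

Header lines, the `franz:` lines and the `deleted/inacessible` rows are ignored because they do not start with a date.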

fcorbelli commented 1 month ago

You want ALL the versions of this file? This will create one folder per version (default padded to 4)

zpaqfranz x z:\pippo.zpaq -only "*$$$.cpp" -all -to z:\allz

This is padded to 8

zpaqfranz x z:\pippo.zpaq -only "*$$$.cpp" -all 8 -to z:\allz8

fcorbelli commented 1 month ago

BTW

zpaqfranz h x

fcorbelli commented 1 month ago

The -find is roughly a |grep for systems where grep does not exist double

luckman212 commented 1 month ago

Thank you for all of this. I didn't mean to accuse you of creating this arcane syntax. I understand it was inherited.

Back to what is actually my original question. About -terse

Yes I see it is fixed-width, hence I am "parsing" it using simple tools like cut and awk. Ok if that is the suggested method.

But, would you consider adding a flag to convert the space-delimited output to tabs instead? It would make the parsing more reliable in my opinion.

fcorbelli commented 1 month ago

But, would you consider adding a flag to convert the space-delimited output to tabs instead? It would make the parsing more reliable in my opinion.

Of course I can. Do you want CSV-delimited output, or simply TABs between columns? BTW you got me thinking about the lack of a switch (one of a thousand!) to enumerate the versions in which a file is present (without those in which it is deleted). Like -enumerate “$$$$”

luckman212 commented 1 month ago

Yes TAB between each column would be ideal. I always prefer TAB because filenames can contain commas.

👍 of course ... -enumerate or -extractable would be a great addition.

fcorbelli commented 1 month ago

Yes TAB between each column would be ideal. I always prefer TAB because filenames can contain commas.

👍 of course ... -enumerate or -extractable would be a great addition.

The very best is | (cannot be in a filename) but it is hard to parse. Added -nodel (do not show deleted files), working on -tab...

luckman212 commented 1 month ago

The very best is | (cannot be in filename)

You sure about that? 😉

image

fcorbelli commented 1 month ago

I do not use Mac 😄

OK, a programmable one...

luckman212 commented 1 month ago

Best is probably the NUL byte (\0) but TAB is a close second I think.

If you would like a Mac to test with I can ship you my old MacBook Air 2015 (only runs up to macOS 12.x Monterey but otherwise works fine) for free.

fcorbelli commented 1 month ago

Houston, we have a problem with \t 😄 No big deal, it just requires a bit of spaghetti code...

PS thank you, but I have a PowerPC Minimac (!!!!!!!) for hard-core tests

fcorbelli commented 1 month ago

60_6e.zip

Please check the attached pre-release

zpaqfranz l z:\pippo.zpaq -terse -csv "\t"
zpaqfranz l z:\pippo.zpaq -terse -csv "|"
zpaqfranz l z:\pippo.zpaq -terse -csv ","
zpaqfranz l z:\pippo.zpaq -terse -csv "\",\""

Do you want a string AFTER the file name?

luckman212 commented 1 month ago

Thank you! Almost, but not quite (spaces should not be there, and delimiter should not be repeated, just 1 per field):

$ zpaqfranz l backup.zpaq -find "20240516101824054.png" -terse -csv "|"
2024-05-16 14:18:23 | 0644 |            207.434 |  87% |+  /Users/luke/Sync/Obsidian/Main/attach/IMG-20240516101824054.png

should be

2024-05-16 14:18:23|0644|207.434|87%|+|/Users/luke/Sync/Obsidian/Main/attach/IMG-20240516101824054.png

Do you want a string AFTER the file name?

No.

fcorbelli commented 1 month ago

mmmhhh... I will finish tomorrow 60_6f.zip

fcorbelli commented 1 month ago

60_6g.zip

luckman212 commented 1 month ago

60_6g looks pretty good! Only thing I see is an extra space before the file mode. Not sure if that's intentional

image

One other thing, the version and +/- column are printed joined together instead of as separate columns when using -csv:

$ zpaqfranz l backup.zpaq -find "20240516101824054.png" -all -nodel -terse -csv '|'
2024-05-16 14:18:23| 0644|207.434|87%|2178+|/Users/luke/Sync/Obsidian/Main/attach/zzzzzzzzzzzzzzzzzzzzzzzzzz/IMG-20240516101824054.png
2024-05-16 14:18:23| 0644|207.434|87%|2719+|/Users/luke/Sync/Obsidian/Main/attach/IMG-20240516101824054.png
$ ./zpaqfranz l backup.zpaq -find "20240516101824054.png" -all -nodel -terse
2024-05-16 14:18:23  0644             207.434   87% 2178|+ /Users/luke/Sync/Obsidian/Main/attach/zzzzzzzzzzzzzzzzzzzzzzzzzz/IMG-20240516101824054.png
2024-05-16 14:18:23  0644             207.434   87% 2719|+ /Users/luke/Sync/Obsidian/Main/attach/IMG-20240516101824054.png

Is that intentional?
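
Until the columns are separated, the joined version and flag can be split on the consuming side. The sketch below assumes the exact row shape printed by this 60_6g pre-release (`date time|attr|size|ratio|VERSION+|path`), which may differ in later builds; `csv_to_tsv` is a made-up name. Since the path is the last variable given to `read`, any further `|` inside a file name stays part of the path.

```shell
# Hypothetical helper: read `-terse -csv '|'` rows (60_6g pre-release layout)
# on stdin and print "version<TAB>flag<TAB>path".
csv_to_tsv() {
  while IFS='|' read -r stamp attr size ratio ver path; do
    [ -n "$path" ] || continue      # skip rows without six fields
    flag=${ver#"${ver%?}"}          # last character: + or -
    ver=${ver%?}                    # everything before it: the version
    printf '%s\t%s\t%s\n' "$ver" "$flag" "$path"
  done
}

# Usage:
#   zpaqfranz l backup.zpaq -find "20240516101824054.png" -all -nodel -terse -csv '|' | csv_to_tsv
```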

fcorbelli commented 1 month ago

1) No, it is a different attr printout (instead of the Windows one)
2) I do not know, I'll dig into it now

fcorbelli commented 1 month ago

60_6g.zip

With -terse -all I cut off the version infos. Much easier parsing

luckman212 commented 1 month ago

60_6g working perfectly! A thing of beauty!

luckman212 commented 1 month ago

Update: still working well. I would consider this solved @fcorbelli

Thank you very much again for the wonderful tool.

Lennart00 commented 1 month ago

Just to chip in - this looks great adding more things to programmatically control and build on top of zpaqfranz :D