fcorbelli / zpaqfranz

Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
MIT License

Linear writing of the extracted files. #135

Open mirogeorg opened 2 days ago

mirogeorg commented 2 days ago

Franco, when extracting, ZPAQFRANZ reads the .ZPAQ archive sequentially but writes and updates the extracted files non-sequentially. This results in a lot of read and write operations, making the disk the bottleneck.

Is it possible to make it write the extracted files sequentially while reading the .ZPAQ archive non-sequentially? It's clear that in this case the same data might be read and extracted multiple times. This is how VEEAM extracts files.

This problem concerns the extraction of large files, where it is particularly pronounced.
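As an aside, the trade-off described above can be shown with a toy model (hypothetical fragment layout, not zpaq's actual on-disk format): reading a deduplicated archive front-to-back scatters the writes, while writing the file front-to-back forces some fragments to be fetched more than once.

```python
# Toy model: a deduplicated archive stores each unique fragment once, in
# first-occurrence order; a file is a list of (offset, fragment_id)
# references. (Illustrative layout, not zpaq's real format.)
fragments = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}   # archive order
refs = [(0, 0), (4, 1), (8, 0), (12, 2)]           # file layout: A B A C

# Strategy 1: read the archive sequentially -> the writes jump around.
write_order = []
for frag_id in fragments:                # one pass over the archive
    for offset, fid in refs:
        if fid == frag_id:
            write_order.append(offset)   # seek + write at this offset
print(write_order)   # [0, 8, 4, 12] -> non-sequential writes

# Strategy 2 (what the issue asks for): walk the file front-to-back,
# fetching each fragment as needed -> sequential writes, but fragment 0
# must be fetched twice (re-read from the archive, or kept in RAM).
fetches = [fid for _, fid in sorted(refs)]
print(fetches)       # [0, 1, 0, 2] -> fragment 0 is needed twice
```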

fcorbelli commented 2 days ago

The short answer is OF COURSE NOT for zpaq. Due to the multiple versions you can't (spiegone already posted on encode.ru)

With zpaqfranz YES, if you have enough free RAM (aka: more than the biggest file to extract), or YES if you almost turn off the deduplicator (the -stdout switch)

Do you really want the spiegones? 😄

fcorbelli commented 2 days ago

PS no, the same data is not read more than once, because there is no duplicated data in zpaq archives. LSS zpaq's extraction is more SSD friendly, even zfs friendly, by skipping over zeros. Because even zero matters
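The zero-skipping mentioned here can be sketched as follows (the general technique, not zpaqfranz's actual code): instead of writing runs of zero bytes, seek past them, so a filesystem such as ZFS can keep those regions as holes.

```python
import os
import tempfile

BLOCK = 4096

def write_skipping_zeros(path, blocks):
    """Write blocks, seeking over all-zero ones to leave filesystem holes."""
    with open(path, "wb") as f:
        for block in blocks:
            if not any(block):                    # all-zero block
                f.seek(len(block), os.SEEK_CUR)   # skip it: leave a hole
            else:
                f.write(block)
        f.truncate()  # fix the final size if the file ends in a hole

# Demo: the middle block is never physically written.
blocks = [b"\x01" * BLOCK, b"\x00" * BLOCK, b"\x02" * BLOCK]
fd, path = tempfile.mkstemp()
os.close(fd)
write_skipping_zeros(path, blocks)
assert os.path.getsize(path) == 3 * BLOCK      # logical size is unchanged
with open(path, "rb") as f:
    assert f.read() == b"".join(blocks)        # holes read back as zeros
os.remove(path)
```

On filesystems without hole support the skipped regions are simply zero-filled by the OS, so the result is still correct.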

mirogeorg commented 1 day ago

I understand. What is LSS extraction?

fcorbelli commented 1 day ago

Long Story Short. You can use -ramdisk (on zpaqfranz) if you have enough RAM. Just now I am refining the output of this switch

fcorbelli commented 1 day ago

Please try 60.8a (via an upgrade) with the -ramdisk switch. If you have enough RAM the output will be sequential.
Incidentally, it also checks the file hashes, making additional tests unnecessary.

It is a switch that already existed, but showed little detailed information.
It needs to be refined, especially moving the warning about the availability of "true" RAM. Let me know what you think (obviously it is slower than normal extraction on a solid-state drive)
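The -ramdisk behaviour described here can be sketched in a few lines (illustrative Python with hypothetical names, not zpaqfranz internals): assemble the whole file in RAM, verify its hash against the stored one, then write it to disk in a single sequential pass.

```python
import hashlib
import os
import tempfile

def extract_via_ram(refs, fragments, stored_sha256, out_path):
    """Reassemble the file in RAM, verify its hash, then write sequentially.

    refs: list of (offset, fragment_id); fragments: {fragment_id: bytes}.
    (Hypothetical structures, for illustration only.)
    """
    buf = bytearray()
    for offset, frag_id in sorted(refs):          # walk the file in order
        if offset > len(buf):                     # pad any gap with zeros
            buf.extend(b"\x00" * (offset - len(buf)))
        frag = fragments[frag_id]
        buf[offset:offset + len(frag)] = frag     # place fragment in RAM
    if hashlib.sha256(buf).hexdigest() != stored_sha256:
        raise ValueError("hash mismatch: extracted data is corrupted")
    with open(out_path, "wb") as f:               # one sequential write
        f.write(buf)

# Demo: two fragments reassembled, verified, written in one pass.
fragments = {0: b"data", 1: b"hole"}
refs = [(0, 0), (4, 1)]                           # file = b"datahole"
stored = hashlib.sha256(b"datahole").hexdigest()
fd, out = tempfile.mkstemp()
os.close(fd)
extract_via_ram(refs, fragments, stored, out)
with open(out, "rb") as f:
    assert f.read() == b"datahole"
os.remove(out)
```

The obvious limit, as noted in the thread, is that the buffer must fit in RAM: if the file is larger than the available memory, this approach cannot work.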

fcorbelli commented 1 day ago

PS this is just a pre-release under development...

mirogeorg commented 1 day ago

Franco, what was the -ramdisk option originally designed for? In my case, it won't be suitable because the VM disks are huge.

In general, it's hard to determine before the actual extraction whether the memory will be enough, and that's why I'm curious about what this option was originally designed for.

fcorbelli commented 1 day ago

"It's used for chunk extraction on a RAMdisk. The w command. I never finished it for the 'normal' x command (extraction), maybe I'll work on it a bit. If you have a machine with a large enough amount of RAM and not huge files, you can perform an extraction with sequential writing, directly checking the hashes. This is something that's not possible to do, except with the p command, which is a merge of unzpaq206. However, it’s single-threaded and therefore incredibly slow. If the file size is larger than the available RAM, there’s nothing you can do. Mount solid-state drives or wait for it to finish.

"Obviously, by RAMdisk I simply mean RAM allocated in the computer, so it also technically includes the swapfile. I had to make a considerable effort to understand what is 'real' RAM (as opposed to virtual memory). Translation: if you want to compare the HASH (not the CRC-32) of the files inside ZPAQ, you have no choice but to extract them. After extracting them, you can calculate the HASH and compare it to the stored one. If you don't want to extract them (and you can do it), you use the 'ramdisk'. The files are extracted into RAM, and from there, you calculate the HASH (and maybe in the future, you write them to disk sequentially)."