Yutaka-Sawada / MultiPar

Parchive tool
977 stars 43 forks

Creating individual pars per file faster #127

Closed IndustrialOne closed 4 months ago

IndustrialOne commented 5 months ago

Normally when I have 100 RAR part files, I make the PAR the regular way, taking into account all the RAR parts.

However, this backup I made does not require me to have every single RAR part. If I lose one file, I don't need to download the entire backup, only whatever part it's contained in. So I decided I would not make one PAR set for the entire chain but one PAR per RAR part, with the following commands:

par2j64.exe c /uo /ss358400 /rn150 /rf1 "C:\backup\secondarchive.part01.rar.par2" secondarchive.part01.rar
par2j64.exe c /uo /ss358400 /rn150 /rf1 "C:\backup\secondarchive.part02.rar.par2" secondarchive.part02.rar
etc...
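For many parts, commands like these can be generated with a small script. A minimal dry-run sketch (POSIX shell; the options and the C:\backup output path are copied from the commands above, the function name is my own invention):

```shell
#!/bin/sh
# Dry run: print one par2j64.exe command per RAR part, mirroring the
# per-file commands above (options and output path copied from them).
per_file_commands() {
    for f in "$@"; do
        echo "par2j64.exe c /uo /ss358400 /rn150 /rf1 \"C:\\backup\\$f.par2\" $f"
    done
}

# Example with two explicit names
# (in practice: per_file_commands secondarchive.part*.rar)
per_file_commands secondarchive.part01.rar secondarchive.part02.rar
```

Piping the output to a file gives a batch script to review before running.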

What surprised me was how quickly this was done, with far less RAM and CPU consumption. Processing 200 GB of RARs into one PAR set would normally take 10 hours and around 20 GB of RAM, but processing the same amount of RARs with this method took a little over an hour.

You should review why two methods that essentially produce the same result, dealing with the same amount of data, take radically different time/resources to accomplish.

Yutaka-Sawada commented 5 months ago

You should review why two methods which essentially produce the same result, dealing with the same amount of data take radically different time/resources to accomplish.

I will explain how they are different. Basically, Parchive restores lost data (damaged or missing files) from the available original data (source files) and recovery data (PAR2 files). Recovery is done over a set of files, by aligning the many blocks in those files. Now look at your two cases as examples.

1) When you set 100 RAR files as source files and create some PAR2 files for them: if you set 10% redundancy, you can restore lost blocks totaling up to 10% of the blocks in the whole set. You will be able to recover 10 missing source files, or you may recover 100 slightly damaged source files. Because PAR2's recovery process works on blocks, it doesn't matter which files contain the lost blocks. When you create 10 PAR2 files for 100 RAR files with 10% redundancy, you will be able to restore any 10 missing RAR files. Though it's slow and requires much memory, the recovery capability is high.

2) When you set 1 RAR file as the source file and create one PAR2 file for it: if you set 10% redundancy, you can restore up to 10% of the blocks in that file. But you cannot restore the entire file when the file itself is missing. While you can recover small damage in each file, this cannot help with a single completely lost file. When you create 1 PAR2 file for 1 RAR file with 10% redundancy, you cannot restore a missing RAR file. While it's fast and requires less memory, there is a risk of losing a file entirely.
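The difference in whole-file protection can be checked with simple arithmetic (a sketch assuming all files contribute equal numbers of equal-size blocks; the figures are the 10% redundancy and 100-file numbers from the two cases above):

```shell
#!/bin/sh
# Case 1: one set covering 100 RAR files at 10% redundancy.
files=100; redundancy=10
big_set_recoverable=$(( files * redundancy / 100 ))
echo "one big set: up to $big_set_recoverable whole files recoverable"    # 10

# Case 2: one set per file (1 file each) at 10% redundancy.
files_per_set=1
per_file_recoverable=$(( files_per_set * redundancy / 100 ))
echo "per-file sets: up to $per_file_recoverable whole files recoverable" # 0
```

The integer result of 0 in the second case is exactly the "cannot restore a missing RAR file" situation described above.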

You can see this possibility in MultiPar's Create window under the item "Number of files that can be fully reconstructed if missing (Min - Max)". Normally it's good for this to be at least 1 file. For example, if you set 10% redundancy, you may want to put 10 source files in each set. When you create 1 PAR2 file for 10 RAR files with 10% redundancy, you will be able to restore any 1 missing RAR file. This would be a good balance of speed and recovery capability. You may find your favorite balance by adjusting the redundancy and the number of files per set.
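The suggested middle ground (one PAR2 set per group of about 10 parts) could be generated with another dry-run sketch. The group size of 10 is taken from the example above; the /rr10 option (which I understand to set par2j's redundancy rate in percent), the output path, and the function name are assumptions for illustration:

```shell
#!/bin/sh
# Dry run: print one par2j64.exe command per group of up to 10 RAR parts.
# Group size (10) and /rr10 (assumed: 10% redundancy rate) are illustrative.
print_group_commands() {
    group=1
    while [ "$#" -gt 0 ]; do
        batch=""; i=0
        while [ "$#" -gt 0 ] && [ "$i" -lt 10 ]; do
            batch="$batch $1"; shift; i=$((i + 1))
        done
        # $batch is deliberately unquoted so each file name stays a
        # separate argument in the printed command
        echo "par2j64.exe c /rr10 \"C:\\backup\\set$group.par2\"$batch"
        group=$((group + 1))
    done
}

# Example with three explicit names
# (in practice: print_group_commands secondarchive.part*.rar)
print_group_commands secondarchive.part01.rar secondarchive.part02.rar secondarchive.part03.rar
```

Each printed command covers one group, so losing any single part in a group stays recoverable while memory and time stay close to the per-file figures.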

IndustrialOne commented 5 months ago

I understand all that, I'm not worried about losing an entire RAR part, only a few usenet articles here and there. The RAR parts don't depend on each other as they are a backup of 5000+ small files.

I just wanted to make sure that this steep increase in resources to create essentially the same amount of PAR2 data was valid and not a bug. If this is expected, so be it.