My general first recommendation with performance concerns is to try a more performance-oriented PAR2 client like MultiPar or ParPar and see how well it works for you.
In your case, a problem with including lots of small files is that PAR2 requires files to be block-aligned. This means that if your block size is set to 750KB, for example, each 17-byte file is effectively expanded to 750KB and processed that way. In other words, your 1744*17 = 28.95KB of data is treated as if it were actually 1.25GB in size (assuming a 750KB block size).
The typical recommendation would be to use a smaller block size, or to merge all the files into one, as you've done via tar.
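For example, something along these lines (the block size, redundancy level, and paths are purely illustrative, and the wildcard assumes the files sit directly under repo/):

```sh
# option 1: keep the individual files, but use a much smaller block size
par2 create -s4096 -r5 backup.par2 repo/*

# option 2: merge everything into one archive first, then protect that single file
tar -cf repo.tar repo/
par2 create -r5 repo.tar
```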
If neither of those works for you, still give the alternative clients a try. They likely have the same issue here as par2cmdline (they're subject to the same PAR2 limitations, after all), but with a faster baseline speed they may fall into acceptable territory for you.
I understand the logic/constraint better now; thanks!
For various reasons this needs to be installable on Linux without having to build, so both of those alternatives are a no-go for me.
I'll work something out with par2. For my immediate need, this kind of tar-ing up works, so I'll just script it properly. For other future needs, I'll keep in mind that files with a wide distribution of sizes, especially with many at the lower end, may need to be handled specially.
Thanks again!
sitaram
FYI, in addition to the block size, there is a lot of per-file overhead in PAR2: for every 17-byte file, there is at least 192 bytes of overhead. So even if you set the block size to 20 bytes, you'll still see more than a tenfold expansion in storage.
Putting everything into a single TAR file will help with that too.
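To put rough numbers on that, using the figures already in this thread (1744 files of 17 bytes each, at least 192 bytes of per-file overhead):

```sh
# back-of-the-envelope arithmetic for the 1744 seventeen-byte files
echo $(( 1744 * 17 ))     # 29648  bytes of actual data
echo $(( 1744 * 192 ))    # 334848 bytes of per-file PAR2 overhead (at >= 192 bytes each)
# 334848 / 29648 is roughly 11.3, i.e. more than a tenfold expansion
# before any recovery blocks are counted at all
```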
Thank you! It's a pretty easy workaround, so I'm fine with doing that long term.
@mdnahas I see you are looking into improvements for PAR3; maybe as part of that spec there is a way to stream small files into larger chunks for better efficiency.
Hi
I was trying to run par2 on one of my "borg" (a backup tool) repositories. The repo is 3646 MB. For whatever reason, it has 1762 files, of which 1744 are exactly 17 bytes each!
Par2's performance in the presence of so many small files is quite sub-optimal. Here are some numbers for the time taken and the space used (total size of all par2 files):
If I run par2 as-is on the repository (default redundancy 5%):
If I tar up the repo and run par2 on the single tar file:
Similarly, if I run it only on the files that are not exactly 17 bytes:
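For reference, the three runs correspond roughly to invocations like the following (the paths and the exclusion step in the third case are illustrative, not the exact commands used):

```sh
# 1) par2 directly on the repository contents (default 5% redundancy)
par2 create repo.par2 repo/*

# 2) tar the repo first, then protect the single archive
tar -cf repo.tar repo/
par2 create repo.tar

# 3) only the files that are not exactly 17 bytes
find repo/ -type f ! -size 17c -print0 | xargs -0 par2 create big-files.par2
```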
Are there any tips for dealing with this and making par2 have the performance characteristics of the second or third example above, but more directly?