Parchive / par3cmdline

Official repo for par3cmdline and par3lib
GNU Lesser General Public License v2.1

status #3

Closed thezoggy closed 12 months ago

thezoggy commented 1 year ago

Curious what the status of this repo is.

I know there were talks of par3 either having backwards support or dropping it, but I never saw in the spec whether it will or won't do par2. So I figure it's easier if I ask: is the plan to have par3cmdline be able to handle par2, or only par3, or is that still TBD?

My hope is that it could be used as a drop-in replacement for par2cmdline and get the benefits of updated code that takes advantage of modern hardware. Then if/when par3 gets used, we would be good to support it in sabnzbd.

Then on the side, has the par2 vs par3 format/tooling been benchmarked recently? I know there is a big push to make it handle things better (security/files/etc), but compared to par2, does that come at a cost in memory or performance? Or vice versa: is it actually faster, even in this non-final/unoptimized state?

animetosho commented 1 year ago

PAR3 isn't backwards compatible with PAR2 as they are two rather different formats.
Of course, a client can choose to support both formats. I wouldn't count on par3cmdline supporting PAR2, as par2cmdline already does that, but maybe it could happen.

I'm not sure what you mean by "take advantage of modern hardware" - if it's using hardware capabilities for performance reasons, it's worth pointing out that par2cmdline isn't particularly aimed at such. par3cmdline is likely going to be a reference implementation initially, so I wouldn't count on performance being a primary goal, but it could eventually happen.

I don't think the reference PAR3 implementation is complete, so there's no actual benchmark possible. In theory, PAR3 should be faster than PAR2 (faster hash, support for sparse matrices etc, though you could also configure PAR3 to be much slower).
In terms of tradeoffs, better performance can be achieved by using a sparse matrix instead of a dense matrix (the only kind PAR2 supports), but you lose some recoverability (e.g. if 10 blocks are damaged, you may need >10 recovery blocks to repair).
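
As a rough illustration of that tradeoff, here is a minimal sketch. It assumes a toy prime field (arithmetic mod 257) rather than the Galois fields PAR2/PAR3 actually use, and the `dense`, `sparse` and `sparse_plus_one` matrices are made up for the example; they are not PAR3's real layout. Each row is one recovery block, each column one input block, and a set of damaged blocks is repairable when the columns for those blocks have full rank.

```python
from itertools import combinations

P = 257  # toy prime field (an assumption; not the field PAR2/PAR3 use)

def rank_mod_p(rows, p=P):
    """Rank of a matrix over the integers mod p, via Gauss-Jordan elimination."""
    rows = [r[:] for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse, Python 3.8+
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def can_repair(matrix, damaged):
    """True if the recovery rows can solve for every damaged input block."""
    sub = [[row[c] for c in damaged] for row in matrix]
    return rank_mod_p(sub) == len(damaged)

def work(matrix):
    """Multiply-adds per recovery pass: one per nonzero coefficient."""
    return sum(1 for row in matrix for v in row if v)

N = 6  # input blocks

# Dense: both recovery blocks cover every input block (Vandermonde-style rows).
dense = [[pow(c + 1, r, P) for c in range(N)] for r in range(2)]

# Sparse (hypothetical): each recovery block covers only half of the inputs.
sparse = [[1, 2, 3, 0, 0, 0],
          [0, 0, 0, 1, 2, 3]]

# The same sparse code with one extra recovery block over the first half.
sparse_plus_one = sparse + [[1, 4, 9, 0, 0, 0]]

print("work dense/sparse:", work(dense), work(sparse))
print("dense repairs any 2 blocks:",
      all(can_repair(dense, list(pair)) for pair in combinations(range(N), 2)))
print("sparse repairs blocks 0,1:", can_repair(sparse, [0, 1]))
print("sparse + 1 extra repairs blocks 0,1:", can_repair(sparse_plus_one, [0, 1]))
```

The sparse code does half the multiply-adds, but two damaged blocks that fall under the same recovery block need a third recovery block before they can be repaired.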

thezoggy commented 1 year ago

From what I was looking at, it looked like par2 was just a subset of par3. But yes, of course, I'm referring to whether the cmdline tool would happen to support both, just as many other tools do when they get another version/spec update. I'm sure it's all premature, but I didn't know if it was on the roadmap, so to speak.

A question that comes up from time to time is what is the best/fastest par2 utility on linux. It's very common when people move from windows->linux and notice speeds are slower than they are used to. Generally people on linux only really have had par2cmdline-tbb which has intel speedup but no mt, or par2cmdline, which didn't get mt until just a couple of years ago and hasn't had much work done on it since. Hopefully someone who understands the code and has the cycles can pick up being the maintainer.

Which then brings us to par3cmdline: as it is being worked on, one can assume that it is going to benefit from cleaner code, optimizations, and several people looking at it. I was curious to know if it might have par2 support which would benefit indirectly from all of this work and serve a useful way forward even if its not the original intent. And if par3 gets finalized and people adopt it, there is the added benefit of having both handled in the same utility.

animetosho commented 1 year ago

> A question that comes up from time to time is what is the best/fastest par2 utility on linux

I generally suggest MultiPar and ParPar for performance. MultiPar is Windows only, but seems to work fine under Wine. ParPar only supports create, so wouldn't be useful for verify/repair tasks.

There's also gopar, but last time I checked, it wasn't particularly mature. That may have changed.

> Generally people on linux only really have had par2cmdline-tbb which has intel speedup but no mt

It does have multithreading. From memory, par2cmdline-tbb also had some MMX optimisations for the GF16 computation (plus some experimental CUDA code, but I dunno how developed that became), so it gains further performance from that. Intel reduced MMX performance on Skylake, so the perf gain may not be as great on modern Intel processors.
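
For readers unfamiliar with what the "GF16 computation" is, here is a minimal scalar sketch of the multiply-accumulate that dominates PAR2 create/repair time. It assumes the PAR2 field GF(2^16) with generator polynomial 0x1100B; the function names `gf16_mul` and `gf16_muladd` are made up for illustration and are not taken from par2cmdline or par2cmdline-tbb, which use table lookups and SIMD rather than this bit-by-bit loop.

```python
PAR2_POLY = 0x1100B  # x^16 + x^12 + x^3 + x + 1

def gf16_mul(a: int, b: int) -> int:
    """Multiply two 16-bit values in GF(2^16) by shift-and-XOR (slow reference)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^n) is XOR
        b >>= 1
        a <<= 1
        if a & 0x10000:          # overflowed 16 bits: reduce by the polynomial
            a ^= PAR2_POLY
    return result

def gf16_muladd(dst: list, src: list, coeff: int) -> None:
    """dst[i] ^= coeff * src[i] for each 16-bit word; the hot loop in a PAR2 client."""
    for i, word in enumerate(src):
        dst[i] ^= gf16_mul(word, coeff)

# Hypothetical usage: fold one input block into one recovery block.
recovery = [0, 0, 0]
gf16_muladd(recovery, [1, 2, 0x8000], 2)
print([hex(w) for w in recovery])
```

Optimised clients replace the inner multiply with lookup tables or vector instructions, which is where variants such as the MMX code path get their speedup.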

> I was curious to know if it might have par2 support which would benefit indirectly from all of this work and serve a useful way forward even if its not the original intent

I'd imagine if PAR2 gets supported, it'd mostly be a copy of par2cmdline code, so I wouldn't expect significant benefits.
I can't predict the future though, so can't rule out the possibility, but my personal opinion is not to expect much on that front.

thezoggy commented 1 year ago

I had a blip about the question that never made it in, as I ended up removing it because I wasn't trying to stir up a conversation about comparing other par clients to par2cmdline.

So, to re-add that missing context: I was asking from the pov of sabnzbd (we only care about repairing/renaming with it). While we use multipar on windows and par2-sl on mac, when it comes to linux we really only have par2cmdline (and its variants). Over the years, as more people use AMD or newer intel hardware, the -tbb variant has become less useful and it just makes sense to use the main par2cmdline.

People notice a performance hit going to linux with par and question if there is something better. That is what brought up this recent inquiry about par3cmdline and whether there was any hope of it handling par2.

animetosho commented 1 year ago

I don't think anyone should really be afraid to compare clients. They don't exist in a void, and it's logical for comparisons to be made when the choice is presented.

I recall par2SL was just MacOS compatibility fixes for par2cmdline, so you're essentially using par2cmdline for non-Windows platforms. You could give the option to use MultiPar on Linux, if the user has Wine installed. Otherwise, until someone decides to write a more performance-oriented cross-platform client, sticking with par2cmdline seems sensible.

animetosho commented 1 year ago

I decided to experiment with adding ParPar's backend into par2cmdline, which seems to give a decent speedup on supported platforms. Maybe it's sufficient for your needs.

thezoggy commented 1 year ago

Thanks so much for the work. I'm seeing people report par2cmdline-turbo being 2-3x faster than the latest par2cmdline. I'm working with some others to reach a larger audience for more testing/feedback.

thezoggy commented 1 year ago

did a test tonight, same dataset and just standard options

win10: Intel Core i7-4790K CPU @ 4 GHz w/ssd and multipar 1.3.2.5: [4c4ce345b306416287fd8561839679e2] Repaired in 14 seconds

unraid 6.11.5: AMD Ryzen 7 3700X @ 3.6 GHz w/nvme and par2cmdline 0.8.0 [4c4ce345b306416287fd8561839679e2] Repaired in 12 seconds

unraid 6.11.5: AMD Ryzen 7 3700X @ 3.6 GHz w/nvme and par2cmdline-turbo 0.9.0 (built 2023-03-25) [4c4ce345b306416287fd8561839679e2] Repaired in 3 seconds
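
The two unraid runs share the same hardware, so they can be compared directly. A small sketch of the arithmetic, using only the repair times quoted above (the dictionary keys are just labels for this example):

```python
# Repair times in seconds for the same dataset on the same Ryzen 7 3700X box.
repair_seconds = {
    "par2cmdline 0.8.0": 12,
    "par2cmdline-turbo 0.9.0": 3,
}

speedup = repair_seconds["par2cmdline 0.8.0"] / repair_seconds["par2cmdline-turbo 0.9.0"]
print(f"par2cmdline-turbo speedup on this run: {speedup:.1f}x")
```

This is a single run with whole-second resolution, so it is only a rough figure.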

animetosho commented 1 year ago

Thanks for the benchmark.
I just thought I'd point out that the MultiPar value isn't really comparable to anything though - you can use the Windows binaries from here and here if you want that comparison point.

thezoggy commented 1 year ago

Yep, I had run a few different tests, and overall a few others and I have found that par2cmdline-turbo is 2-3x faster than par2cmdline on previously existing platforms. On native arm platforms, it performs even better (where optimized). Overall it's pretty impressive how fast it is :)