animetosho / ParPar

High performance PAR2 create client for NodeJS

--method display accuracy #58

Closed duhagppn closed 3 months ago

duhagppn commented 3 months ago

Wondering about the accuracy of the method displayed in the progress output, and whether this has changed recently. I was using --method clmul-sve2 and recall that "Multiply method" displayed CLMUL, but now it appears to always say Shuffle. Am I remembering wrong?

--method lookup-sse           Multiply method : Lookup (SSE2) with 35.02 KiB loop tiling, 12 threads
--method xor-sse              Multiply method : Xor (SSE2) with 35.25 KiB loop tiling, 12 threads
--method xorjit-sse           Multiply method : Xor-Jit (SSE2) with 58.5 KiB loop tiling, 12 threads
--method xorjit-avx2          Multiply method : Xor-Jit (AVX2) with 175.5 KiB loop tiling, 12 threads
--method xorjit-avx512
--method shuffle-sse          Multiply method : Shuffle (SSSE3) with 15.94 KiB loop tiling, 12 threads
--method shuffle-avx          Multiply method : Shuffle (AVX) with 15.94 KiB loop tiling, 12 threads
--method shuffle-avx2         Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method shuffle-avx512       Multiply method : Shuffle (AVX512) with 8192 B loop tiling, 12 threads
--method shuffle-vbmi         Multiply method : Shuffle (AVX512) with 8192 B loop tiling, 12 threads
--method shuffle2x-avx2       Multiply method : Shuffle2x (AVX2) with 8160 B loop tiling, 12 threads
--method shuffle2x-avx512     Multiply method : Shuffle2x (AVX512) with 8192 B loop tiling, 12 threads
--method affine-sse           Multiply method : Affine (GFNI) with 8160 B loop tiling, 12 threads
--method affine-avx2          Multiply method : Affine (GFNI+AVX2) with 4096 B loop tiling, 12 threads
**--method affine-avx10**     Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method affine-avx512        Multiply method : Affine (GFNI+AVX512) with 4096 B loop tiling, 12 threads
--method affine2x-sse         Multiply method : Affine2x (GFNI) with 8160 B loop tiling, 12 threads
--method affine2x-avx2        Multiply method : Affine2x (GFNI+AVX2) with 4096 B loop tiling, 12 threads
**--method affine2x-avx10**   Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method affine2x-avx512      Multiply method : Affine2x (GFNI+AVX512) with 4096 B loop tiling, 12 threads
--method shuffle-neon         Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method clmul-neon           Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method shuffle128-sve       Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method shuffle128-sve2      Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method shuffle2x128-sve2    Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method shuffle512-sve2      Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method clmul-sha3           Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method clmul-sve2           Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method shuffle128-rvv       Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads
--method clmul-rvv            Multiply method : Shuffle (AVX2) with 8192 B loop tiling, 12 threads

Also curious about affine-avx10 and affine2x-avx10: both display as Shuffle and run on my CPU, despite every other affine method resulting in an illegal instruction.

animetosho commented 3 months ago

NEON/SVE methods only work on ARM platforms, so they won't be compiled in on x86. If you pick a method that hasn't been compiled in, it'll revert to the default, which appears to be "Shuffle (AVX2)" in your case.
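
As a rough illustration of that fallback (a minimal sketch, not ParPar's actual code; `COMPILED_METHODS`, `DEFAULT_METHOD`, and `resolve_method` are hypothetical names, and the set of compiled methods is an assumed example):

```python
# Hypothetical sketch of "fall back to the default if the method isn't compiled in".
# The method names come from this thread; which ones are compiled is an assumption.
COMPILED_METHODS = {"lookup-sse", "xor-sse", "shuffle-sse", "shuffle-avx", "shuffle-avx2"}
DEFAULT_METHOD = "shuffle-avx2"


def resolve_method(requested, compiled=COMPILED_METHODS, default=DEFAULT_METHOD):
    """Return the requested method if it was compiled in, otherwise the default."""
    if requested in compiled:
        return requested
    # Silent fallback: the caller never learns the request was ignored,
    # which is why the progress output can show a different method.
    return default


print(resolve_method("clmul-sve2"))  # an ARM-only method requested on an x86 build
print(resolve_method("xor-sse"))     # a method that is compiled in
```

With this kind of logic, every ARM or RISC-V method name requested on an x86 build resolves to the same default, which would match the repeated "Shuffle (AVX2)" lines in the listing above.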

My guess is that your compiler doesn't support AVX10, as it requires Clang 18 or GCC 14, so it'll do the same thing there and fall back to the default.
Maybe I can change the behaviour to error out if methods aren't compiled.

From your method list, the odd one out is that "Shuffle (VBMI)" is missing - can you confirm?

duhagppn commented 3 months ago

Shuffle VBMI is listed but doesn't run. The avx512, vbmi, affine, and affine2x methods all fail. The only thing I noticed was that xorjit-avx512 doesn't display any multiply method (attempted or fallback), whereas everything else (even if it failed) would show something.

Perhaps a --method-list option that shows the compiled and usable methods available on the CPU?

Erroring out is the more logical conclusion, whereas defaulting to an available method is more user-friendly, so it's up to you.

animetosho commented 3 months ago

I've decided to abort if the chosen method hasn't been compiled.

There's generally little reason to ever use the --method flag, as the best method should be auto-detected. I consider it to be an advanced option (i.e. the user knows what they're doing).

The required CPU feature(s) are listed in the method name. It looks like your CPU supports up to AVX2, so methods relying on newer features (AVX-512/AVX10, AVX512-VBMI, GFNI) will fail.
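
To make that concrete, here is a small sketch that checks a method's listed features against what a CPU offers. The feature labels are copied from the "Multiply method" output earlier in this thread; `REQUIRED_FEATURES`, `can_run`, and the `AVX2_CPU` feature set are hypothetical names and assumptions for illustration, not part of ParPar.

```python
# Feature labels taken from the progress output quoted in this thread.
REQUIRED_FEATURES = {
    "lookup-sse": {"SSE2"},
    "shuffle-sse": {"SSSE3"},
    "shuffle-avx": {"AVX"},
    "shuffle-avx2": {"AVX2"},
    "shuffle-avx512": {"AVX512"},
    "affine-sse": {"GFNI"},
    "affine-avx2": {"GFNI", "AVX2"},
    "affine-avx512": {"GFNI", "AVX512"},
}


def can_run(method, cpu_features):
    """True if every feature the method needs is present in cpu_features."""
    return REQUIRED_FEATURES[method] <= cpu_features


# Assumed feature set for a CPU that supports up to AVX2 (illustrative only).
AVX2_CPU = {"SSE2", "SSSE3", "AVX", "AVX2"}

for name in sorted(REQUIRED_FEATURES):
    status = "ok" if can_run(name, AVX2_CPU) else "would fail (missing feature)"
    print(f"{name}: {status}")
```

Under these assumptions the affine and AVX-512 methods are the ones flagged as failing, which lines up with the illegal-instruction reports above.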

duhagppn commented 3 months ago

Thanks, updated and it's much better now. The only reason I started manually selecting the method is that an old seedbox used to default to a slower one; trying each one has become a habit when trying out new seedboxes :nerd_face:

animetosho commented 3 months ago

Some hypervisors may be configured to mask the underlying CPU's CPUID. A key reason for the --method option was actually for this purpose.
Because the CPUID is masked, a --method-list flag, as you suggest, would actually be pointless because applications can't see all features the CPU supports.

Having said that, even with a masked CPUID, I can't imagine manual overrides to be that beneficial unless the underlying CPU supports GFNI.
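
To illustrate the masking point: an application can only report the features the guest is shown. The sketch below parses the `flags` line of Linux `/proc/cpuinfo` text (x86 format); `parse_cpu_flags` and the sample text are hypothetical, and this is not how ParPar detects features.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Made-up sample of what a guest with a masked CPUID might be shown.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse2 ssse3 avx avx2\n"

visible = parse_cpu_flags(SAMPLE)
# If the hypervisor hides GFNI, the guest has no way to tell whether the
# underlying CPU actually supports it.
print("gfni visible:", "gfni" in visible)
print("avx2 visible:", "avx2" in visible)
```

On a real Linux guest, the same function could be fed the contents of `/proc/cpuinfo`; the result would still reflect only the masked view, which is why a manual override can be useful there.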