BinomialLLC / basis_universal

Basis Universal GPU Texture Codec
Apache License 2.0
2.67k stars 260 forks source link

SSE4.1 support cannot be enabled on machines with AVX #342

Closed stefantalpalaru closed 1 year ago

stefantalpalaru commented 1 year ago

It is due to this check:

https://github.com/BinomialLLC/basis_universal/blob/8903f6d69849fd782b72a551a4dd04a264434e20/encoder/basisu_kernels_sse.cpp#L26-L27

Looks like some unfortunate refactoring artefact from old code that supported more than one SIMD type.

richgel999 commented 1 year ago

If AVX || AVX2 || AVX512F are defined here then the generated SSE code won't execute on very old (ancient!) CPU's that don't support AVX. That's why these macros are here. I'll double check that this still works, though.

richgel999 commented 1 year ago

I've verified that SSE supports works as intended under Windows and Linux. Can you describe what you're doing that's causing these code gen sanity checks to trigger? Your options are probably not the same as the ones used to compile the tool. I think it's fine to remove them if you're ok with the generated code not working on old CPU's. Perhaps I can put these checks in a guard.

stefantalpalaru commented 1 year ago

If __AVX__ || __AVX2__ || __AVX512F__ are defined here

Those are always defined by the compiler, on CPUs that support those features.

I've verified that SSE supports works as intended under Windows and Linux.

Does that CPU support AVX? Try replicating it on a VPS, with a more modern CPU.

Can you describe what you're doing that's causing these code gen sanity checks to trigger?

Trying to enable SSE4.1 by compiling the software on a CPU that supports AVX.

See also: https://bugs.gentoo.org/892727

Your options are probably not the same as the ones used to compile the tool.

cmake -DBUILD_X64=ON -DOPENCL=ON -DSSE=ON -DZSTD=ON

if you're ok with the generated code not working on old CPU's

No such risk. Runtime CPU feature check takes care of that.

The code I'm removing with the related PR, from 2 months ago, is a redundant and buggy compile-time check. No functionality will be lost.

richgel999 commented 1 year ago

Odd. I'm going to merge your PR - I don't see any harm in it. Thanks!