alecazam / kram

Encode/decode/info for KTX/KTX2/DDS files with LDR/HDR and BC/ASTC/ETC2. Mac/Win C++11 too, a Mac viewer, and scripts for batch-processing textures.
MIT License

Update SSE2NEON header #1

Closed: jserv closed this 3 years ago

jserv commented 3 years ago

At present, kram includes an old copy of the SSE2NEON header, which can be replaced with the latest one from https://github.com/DLTcollab/sse2neon. The latest SSE2NEON already makes use of AArch64-specific instructions.
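As a rough illustration of how a drop-in header like this is typically wired up, the sketch below selects `sse2neon.h` on ARM targets and the native SSE header on x86, so the same `_mm_*` code builds on both. The include path `"sse2neon.h"` and the function `add4` are hypothetical; they are not taken from kram.

```cpp
#include <cassert>  // used by the checks in the usage note

// Pick the SIMD header per architecture. On ARM, sse2neon.h (assumed to be
// vendored next to this file) re-implements the SSE intrinsics on NEON.
#if defined(__aarch64__) || defined(_M_ARM64) || defined(__arm__)
#include "sse2neon.h"
#else
#include <emmintrin.h>  // SSE/SSE2 intrinsics on x86/x64
#endif

// Hypothetical helper: adds four float lanes with one SSE call.
// On ARM, sse2neon implements _mm_add_ps with NEON's vaddq_f32.
inline void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);   // unaligned load of 4 floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

Calling `add4` on `{1, 2, 3, 4}` and `{10, 20, 30, 40}` yields `{11, 22, 33, 44}` on either architecture; only the header selection differs.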

alecazam commented 3 years ago

Ah great! I haven't tested the Linux/Win Neon path yet, and I'm already using Apple's SIMD on iOS/Mac. I'll make an update, so thanks for the tip!

alecazam commented 3 years ago

I had to comment out a few GCC push/pop pragmas, and there was a (-c) construct that I changed to (-(int32_t)c) to avoid a precision-loss warning. But the latest version is pushed. I also added fp16 <-> fp32 AVX ops in float4a to/fromFloat16, since I didn't see those in sse2neon. I'm using _Float16 on Mac/iOS, but MSVS doesn't appear to support it.
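On x86, fp16 <-> fp32 conversion of four lanes at a time is available through the F16C intrinsics `_mm_cvtps_ph` and `_mm_cvtph_ps` (F16C is a separate CPUID feature usually grouped with AVX). The sketch below is not kram's float4a code: the names `toFloat16x4`/`fromFloat16x4` are hypothetical, and it assumes an x86-64 CPU that supports F16C. The `target("f16c")` attribute lets GCC/Clang compile it without a global `-mf16c` flag; MSVC exposes the intrinsics without one.

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cassert>  // used by the checks in the usage note

// Enable F16C per function on GCC/Clang; MSVC needs no attribute.
#if defined(_MSC_VER)
#define F16C_TARGET
#else
#define F16C_TARGET __attribute__((target("f16c")))
#endif

// Convert four fp32 values to four fp16 bit patterns (round to nearest even).
F16C_TARGET void toFloat16x4(const float in[4], uint16_t out[4]) {
    __m128 v = _mm_loadu_ps(in);
    __m128i h = _mm_cvtps_ph(v, _MM_FROUND_TO_NEAREST_INT);
    // Only the low 64 bits of h hold the four halves.
    _mm_storel_epi64(reinterpret_cast<__m128i*>(out), h);
}

// Convert four fp16 bit patterns back to fp32 (exact, no rounding needed).
F16C_TARGET void fromFloat16x4(const uint16_t in[4], float out[4]) {
    __m128i h = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(in));
    _mm_storeu_ps(out, _mm_cvtph_ps(h));
}
```

Values exactly representable in fp16, such as 1.0f (0x3C00) or 65504.0f (0x7BFF, the largest finite half), round-trip unchanged; other values are rounded to the nearest half.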