castano / icbc

A High Quality SIMD BC1 Encoder
MIT License

Optimize Neon implementation #5

Closed castano closed 2 years ago

castano commented 4 years ago

Currently the Neon backend loads the summed area table elements using the scalar code path. It would be interesting to optimize it using the Neon table lookup instructions (VTBL/TBL) to perform the lookup. Note that these table lookup intrinsics are not supported under gcc 8.3, which is the version available on Raspberry Pi.
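For illustration, here is a rough sketch of what a table-lookup-based gather could look like. It is not the code in icbc. It assumes a 16-entry float table (64 bytes), four 32-bit indices in the range 0..15, a little-endian AArch64 target, and a hypothetical helper name `lookup4`. The `vqtbl4q_u8` intrinsic used here is AArch64-only, so the sketch falls back to the scalar path everywhere else.

```cpp
#include <cstdint>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_HAS_TBL 1
#else
#define SKETCH_HAS_TBL 0
#endif

// Hypothetical helper (not part of icbc): gathers four floats from a
// 16-entry table. Every idx[k] is assumed to be in [0, 15].
inline void lookup4(const float table[16], const uint32_t idx[4], float out[4]) {
#if SKETCH_HAS_TBL
    // View the 16 floats as 64 bytes held in four 128-bit registers.
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(table);
    uint8x16x4_t t;
    t.val[0] = vld1q_u8(bytes + 0);
    t.val[1] = vld1q_u8(bytes + 16);
    t.val[2] = vld1q_u8(bytes + 32);
    t.val[3] = vld1q_u8(bytes + 48);

    // Turn each element index i into the four byte indices
    // 4*i, 4*i+1, 4*i+2, 4*i+3 (little-endian lane layout):
    // i * 0x04040404 puts 4*i in every byte, then add 0x03020100.
    uint32x4_t i = vld1q_u32(idx);
    uint32x4_t byte_idx = vmlaq_u32(vdupq_n_u32(0x03020100u), i,
                                    vdupq_n_u32(0x04040404u));

    // One TBL over the 64-byte table fetches all four floats at once.
    uint8x16_t r = vqtbl4q_u8(t, vreinterpretq_u8_u32(byte_idx));
    vst1q_f32(out, vreinterpretq_f32_u8(r));
#else
    // Scalar fallback: one indexed load per lane.
    for (int k = 0; k < 4; ++k) {
        out[k] = table[idx[k]];
    }
#endif
}
```

Both paths return the same values, so the scalar branch doubles as a reference when testing the Neon branch. A table larger than 64 bytes would not fit in a single `vqtbl4q_u8` lookup and would need a different scheme.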

castano commented 2 years ago

Done.