Currently, the Neon backend loads the summed area table elements using the scalar code path. It would be interesting to optimize this with the VTBL instruction, which performs a table lookup.
Note: VTBL intrinsics are not supported under GCC 8.3, which is the version available on Raspberry Pi.
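
For context, here is a minimal sketch of what a scalar summed area table lookup typically looks like. It is not the backend's actual code: the function names `sat_build` and `sat_box_sum`, the `uint32_t` element type, and the zero-border layout are all assumptions made for illustration. The point is that each box sum needs four independent table loads, and those loads are what a vectorized lookup would aim to speed up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a summed area table with a one-element zero
   border. sat must hold (w + 1) * (h + 1) entries; sat[y * (w + 1) + x]
   is the sum of all pixels img[j * w + i] with i < x and j < y. */
static void sat_build(const uint8_t *img, size_t w, size_t h, uint32_t *sat)
{
    const size_t stride = w + 1;

    /* First row of the table is the zero border. */
    for (size_t x = 0; x <= w; ++x)
        sat[x] = 0;

    for (size_t y = 1; y <= h; ++y) {
        uint32_t row = 0;       /* running sum of the current image row */
        sat[y * stride] = 0;    /* first column is the zero border */
        for (size_t x = 1; x <= w; ++x) {
            row += img[(y - 1) * w + (x - 1)];
            sat[y * stride + x] = sat[(y - 1) * stride + x] + row;
        }
    }
}

/* Hypothetical helper: scalar box sum over the half-open rectangle
   [x0, x1) x [y0, y1). Each call performs four separate table loads. */
static uint32_t sat_box_sum(const uint32_t *sat, size_t w,
                            size_t x0, size_t y0, size_t x1, size_t y1)
{
    const size_t stride = w + 1;

    assert(x0 <= x1 && y0 <= y1 && x1 <= w);

    return sat[y1 * stride + x1] - sat[y0 * stride + x1]
         - sat[y1 * stride + x0] + sat[y0 * stride + x0];
}
```

For a 3x2 image with pixels 1..6, `sat_box_sum(sat, 3, 0, 0, 3, 2)` covers the whole image, and `sat_box_sum(sat, 3, 1, 0, 3, 1)` covers the last two pixels of the first row.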