The implementation was a bit messy. This should also make it easier to override scalar indexing in a back-end without affecting vectorized indexing. It also moves bounds checking to the CPU, which should improve the case where the array and indices are on the GPU already.
Fixes https://github.com/JuliaGPU/CUDA.jl/issues/1678