djns99 / CUDA-Shuffle

MIT License
8 stars, 1 fork

Scaling to very large vectors? #13

Open dbolser opened 2 days ago

dbolser commented 2 days ago

Hi, if I want to shuffle a vector of length 100 million (say, 1 billion times), is this a suitable approach? The reason I thought Fisher-Yates was a good choice is that I only have a few True values in an otherwise False boolean vector (anywhere from 0.001% to about 1% True). Using this information, I can shuffle just the N True indexes in a single pass with Fisher-Yates...

Occasionally I have a vector with, say, 50% True, in which case a 'full shuffle' makes sense... but they are always big 😅.

Huge thanks again,

RAMitchell commented 2 days ago

I think you could probably create a much faster version for your custom use case. Our shuffle assumes the items need to be moved around in memory, and that is actually the most expensive step.

As an example, if you know the positions of the (few) true values, you could run a bijective function on only those positions and write the results into a zeroed array. You can call the bijective function we provide for power-of-two sizes, and if an output falls outside your array size, feed it back into the function until you get something inside your range. This will not collide with any other value and is still a valid bijection.