Question about CONFLICT_FREE_OFFSET(n)

Haha, that's an interesting one. You are right, of course. First of all, log2(32) = 5. But it does not really matter, because CONFLICT_FREE_OFFSET always returns zero anyway.

I relied on the code from here and here. Both seem to use the bank-conflict avoidance equation for prefix sums from GPU Gems 3.

I simply used it and never cared about what exactly it does, since the code was working as is. But now that I dig into it, it does not really work at all the way I use it with 32 banks... and it is also not required for this project. With 32-bit integers, n >> 32 will always be zero. And without brackets around the first and second half of the addition, the + binds tighter than the two >> operators, so the expression is not even evaluated as the sum of two shifts.

I quickly debugged this on onlinegdb, and the macro seems to be completely useless. Here, I can't even compile it without using 64-bit integer values as input.
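For anyone who wants to reproduce this on the host, here is a minimal C++ sketch of what happens. It assumes the macro has the GPU Gems 3 shape `n >> NUM_BANKS + n >> (2 * LOG_NUM_BANKS)` with 32 banks. The function names and the `shr` helper are hypothetical and only serve the illustration. The helper treats a shift by 32 or more as zero, because such a shift on a 32-bit value is undefined behavior in standard C++.

```cpp
#include <cassert>
#include <cstdint>

// Assumed configuration: 32 shared-memory banks.
constexpr std::uint32_t NUM_BANKS = 32;
constexpr std::uint32_t LOG_NUM_BANKS = 5;  // log2(32)

// Hypothetical helper: a right shift that yields 0 once the count reaches
// the 32-bit width, instead of invoking undefined behavior.
constexpr std::uint32_t shr(std::uint32_t value, std::uint64_t count) {
    return count >= 32 ? 0u : (value >> count);
}

// How the unbracketed expression is parsed: '+' binds tighter than '>>',
// and '>>' associates left to right, giving
//   (n >> (NUM_BANKS + n)) >> (2 * LOG_NUM_BANKS)
constexpr std::uint32_t offset_as_parsed(std::uint32_t n) {
    return shr(shr(n, std::uint64_t{NUM_BANKS} + n), 2 * LOG_NUM_BANKS);
}

// With brackets around both halves of the addition:
//   (n >> NUM_BANKS) + (n >> (2 * LOG_NUM_BANKS))
// The first half is still always 0 for 32-bit n.
constexpr std::uint32_t offset_with_brackets(std::uint32_t n) {
    return shr(n, NUM_BANKS) + shr(n, 2 * LOG_NUM_BANKS);
}

// Runtime checks of the behavior described in the comments above.
inline void check_offsets() {
    assert(offset_as_parsed(0) == 0);
    assert(offset_as_parsed(1023) == 0);
    assert(offset_as_parsed(0xFFFFFFFFu) == 0);   // zero for every input
    assert(offset_with_brackets(1023) == 0);      // n >> 10 only
    assert(offset_with_brackets(1024) == 1);
}
```

Calling `check_offsets()` passes, which matches the observation that the macro as written contributes no padding at all. Even the bracketed variant only starts producing a non-zero offset at index 1024, since its `n >> 32` half stays zero.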

So, no, it does not work. Nsight Compute also tells me that there are some bank conflicts. Since this kernel is already quite fast, however, I think it is not worth investing time to fix them. I replaced the macro with a constant 0 to make sure that no compilation errors pop up.

GPUPeople / spECK