lsl036 closed this issue 2 years ago.
Haha, that's an interesting one. You are right, of course. First of all, log2(32) = 5. But it does not really matter, because `CONFLICT_FREE_OFFSET` always returns zero anyway.
I relied on the code from here and here. Both seem to use the bank-conflict avoidance formula for prefix sums from GPU Gems 3.
I simply used it and never looked closely at what it does, since the code worked as is. Now that I dig into it, though, it does not work at all the way I use it with 32 banks, and it is also not required for this project. With 32-bit integers, `n >> 32` is not a meaningful offset. In practice it came out as zero, and strictly speaking, shifting a 32-bit value by 32 or more bits is undefined behavior in C and C++. And without brackets around the two halves of the addition, `+` is evaluated before `>>`, so the shift count becomes even larger.
I quickly debugged this on onlinegdb, and it seems to be completely useless. Here, I can't even compile it without using 64-bit integer values as input.
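The precedence problem can be reproduced on the host with a small C sketch. The names `SHIFT_A`, `SHIFT_B`, `unbracketed` and `bracketed` are hypothetical. The shift amounts are deliberately small so that no shift count reaches 32, which would be undefined behavior. The real macro shifts by `NUM_BANKS` and `2 * LOG_NUM_BANKS` instead.

```c
#include <stdint.h>

/* Hypothetical small shift amounts, chosen so every shift below stays < 32 bits.
   The macro in the header uses NUM_BANKS and 2 * LOG_NUM_BANKS instead. */
#define SHIFT_A 2u
#define SHIFT_B 4u

/* Same shape as the unbracketed macro. Because '+' binds tighter than '>>',
   this parses as ((n >> (SHIFT_A + n)) >> SHIFT_B).
   Only defined for n < 30; otherwise the shift count reaches 32. */
static uint32_t unbracketed(uint32_t n) {
    return n >> SHIFT_A + n >> SHIFT_B;
}

/* With brackets: the sum of two independent shifts, as the formula intends. */
static uint32_t bracketed(uint32_t n) {
    return (n >> SHIFT_A) + (n >> SHIFT_B);
}
```

For every `n` where it is defined, `unbracketed(n)` returns 0, because `n` is always smaller than `2^(n + 2)`. By contrast, `bracketed(20)` returns `5 + 1 = 6`. GCC's `-Wparentheses` (enabled by `-Wall`) also warns about the unbracketed form.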
So, no, it does not work. Nsight Compute also tells me that there are some bank conflicts. Since this kernel is already quite fast, however, I don't think it is worth investing time to fix them. I replaced the macro with a constant 0 to make sure that no compilation errors pop up.
I noticed that `include/GPU/scan_largearray_kernel.cuh` defines:

```c
#define NUM_BANKS 32U
#define LOG_NUM_BANKS 6U
```
I think `LOG_NUM_BANKS` should be `5U` there, since log2(32) = 5. In addition, `CONFLICT_FREE_OFFSET(n)` seems to always evaluate to zero, because `+` has higher precedence than `>>`. I checked here, but I don't understand how this bank-conflict-free scheme is supposed to work.
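A compile-time check would catch a mismatch like the `6U`. The following C sketch assumes the two constants are plain macros, as in the header, with `LOG_NUM_BANKS` corrected to 5. The helper `ilog2_pow2` is hypothetical, and `_Static_assert` requires C11.

```c
#include <stdint.h>

/* Constants as in scan_largearray_kernel.cuh, with LOG_NUM_BANKS corrected to 5. */
#define NUM_BANKS 32u
#define LOG_NUM_BANKS 5u

/* Fails to compile if the constants disagree. With the original value 6u
   this assertion fails, because 1u << 6 is 64, not 32. */
_Static_assert((1u << LOG_NUM_BANKS) == NUM_BANKS,
               "LOG_NUM_BANKS must equal log2(NUM_BANKS)");

/* Hypothetical helper: integer log2 of a power of two, by counting right shifts. */
static uint32_t ilog2_pow2(uint32_t x) {
    uint32_t log = 0;
    while (x > 1u) {
        x >>= 1;
        ++log;
    }
    return log;
}
```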
I tried changing this definition to:

After that, an illegal memory access seems to occur somewhere. Could you help me understand this part of the code? Does `CONFLICT_FREE_OFFSET` actually work? I look forward to your reply!
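To illustrate what the padding is meant to do, here is a hedged C sketch. It uses a simpler offset, `(n) >> LOG_NUM_BANKS`, which inserts one padding slot after every `NUM_BANKS` elements. This variant appears in some scan implementations instead of the GPU Gems 3 formula. It is an assumption, not the repository's macro, and the names `PADDED_OFFSET`, `bank_of`, `strided_bank` and `padded_size` are hypothetical.

The sketch assumes 32 four-byte banks, so a 32-bit word at index `i` lives in bank `i % 32`. Under that model, accesses a stride of `NUM_BANKS` words apart all hit bank 0 without padding but spread across different banks with it. The sketch also shows that padded indices run past `n - 1`, so the shared-memory array must be enlarged. If the offset is enabled without resizing the buffer, that is one plausible (unverified) source of an illegal memory access.

```c
#include <stdint.h>

#define NUM_BANKS 32u
#define LOG_NUM_BANKS 5u

/* Assumed simpler variant: one padding slot after every NUM_BANKS elements.
   Hypothetical name; not the CONFLICT_FREE_OFFSET from the header. */
#define PADDED_OFFSET(n) ((n) >> LOG_NUM_BANKS)

/* Bank of the 32-bit word at shared-memory index i (32 banks, 4 bytes wide). */
static uint32_t bank_of(uint32_t i) {
    return i % NUM_BANKS;
}

/* Bank hit by the k-th access of a stride-NUM_BANKS pattern, with or without padding. */
static uint32_t strided_bank(uint32_t k, int padded) {
    uint32_t i = k * NUM_BANKS;
    return bank_of(padded ? i + PADDED_OFFSET(i) : i);
}

/* Elements needed so that every padded index i + PADDED_OFFSET(i),
   for i in [0, n), stays in bounds. */
static uint32_t padded_size(uint32_t n) {
    return n == 0u ? 0u : (n - 1u) + PADDED_OFFSET(n - 1u) + 1u;
}
```

Under this assumption, a shared buffer for 1024 elements would need `padded_size(1024)` = 1055 slots. The looser bound `n + n / NUM_BANKS` (1056 here) is also sufficient.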