GPUPeople / spECK

Efficient SpGEMM on GPU using CUDA and CSR
MIT License

Question about CONFLICT_FREE_OFFSET(n) #9

Closed lsl036 closed 2 years ago

lsl036 commented 2 years ago

I noticed that include/GPU/scan_largearray_kernel.cuh contains the definitions #define NUM_BANKS 32U and #define LOG_NUM_BANKS 6U. I think LOG_NUM_BANKS should be 5U.

In addition, CONFLICT_FREE_OFFSET(n) seems to always evaluate to zero, because "+" has higher precedence than ">>". [image: definition of CONFLICT_FREE_OFFSET]
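The precedence point can be checked on the host. The sketch below assumes the macro body has the shape printed in GPU Gems 3, `(n) >> NUM_BANKS + (n) >> (2 * LOG_NUM_BANKS)`; the exact text in spECK is only visible in the missing screenshot, so treat that shape as an assumption. The helper names `first_shift_amount` and `conflict_free_offset_intended` are hypothetical, and the "intended" form is one common reading of the formula, not something confirmed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Values quoted in this issue; LOG_NUM_BANKS is corrected from 6 to 5.
constexpr std::uint32_t NUM_BANKS = 32u;
constexpr std::uint32_t LOG_NUM_BANKS = 5u;

// Generic precedence check with small, well-defined shift amounts:
// a >> b + c >> d groups as (a >> (b + c)) >> d.
static_assert((1024u >> 2 + 1 >> 1) == ((1024u >> (2 + 1)) >> 1), "grouping");
static_assert((1024u >> 2 + 1 >> 1) == 64u, "value as parsed");
static_assert(((1024u >> 2) + (1024u >> 1)) == 768u, "value if '+' were applied last");

// Shift amount that the first '>>' receives in the assumed macro body:
// it is NUM_BANKS + n, so never less than 32 (hypothetical helper).
constexpr std::uint32_t first_shift_amount(std::uint32_t n) {
    return NUM_BANKS + n;
}

// Assumed intent: each shift parenthesized, using LOG_NUM_BANKS in both.
constexpr std::uint32_t conflict_free_offset_intended(std::uint32_t n) {
    return (n >> LOG_NUM_BANKS) + (n >> (2u * LOG_NUM_BANKS));
}

// Runtime spot checks of the intended form.
inline void check_intended_form() {
    assert(conflict_free_offset_intended(31u) == 0u);
    assert(conflict_free_offset_intended(32u) == 1u);
}
```

Because the shift amount is at least 32, a 32-bit operand is shifted by its full width or more. That is undefined behavior in standard C++, which is why the sketch only computes the shift amount instead of performing the shift.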

I checked here but still do not understand this bank-conflict-free operation.

I tried adjusting the definition to: [image: adjusted definition] With that change, an illegal memory access seems to be generated somewhere. Could you help me understand this part of the code? Does CONFLICT_FREE_OFFSET actually work? I am looking forward to your reply!
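One possible explanation for the illegal memory access, not confirmed anywhere in this thread: once the offset is nonzero, element i is stored at position i + offset(i), so the buffer must be larger than the number of logical elements. The host-side sketch below assumes the parenthesized two-shift form of the offset with LOG_NUM_BANKS = 5; `padded_index` and `padded_size` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t LOG_NUM_BANKS = 5;  // log2(32)

// Assumed offset formula: one padding slot per 32 elements,
// plus one more per 1024 elements.
constexpr std::size_t conflict_free_offset(std::size_t i) {
    return (i >> LOG_NUM_BANKS) + (i >> (2 * LOG_NUM_BANKS));
}

// Position of logical element i in the padded buffer.
constexpr std::size_t padded_index(std::size_t i) {
    return i + conflict_free_offset(i);
}

// Number of slots a buffer needs to hold n logical elements.
constexpr std::size_t padded_size(std::size_t n) {
    return n == 0 ? 0 : padded_index(n - 1) + 1;
}

// 1024 logical elements need 1055 slots, so a 1024-slot buffer is overrun.
static_assert(padded_size(1024) == 1055, "padding for 1024 elements");

// Runtime spot check: element 32 moves to slot 33.
inline void check_padding() {
    assert(padded_index(32) == 33);
}
```

If the shared-memory array were still sized for the unpadded element count, any access through the padded index near the end of the array would land out of bounds.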

dabeschte commented 2 years ago

Haha, that's an interesting one. You are right, of course. First of all, log2(32) = 5. But it does not really matter, because CONFLICT_FREE_OFFSET always returns zero anyway.

I relied on the code from here and here. Both seem to use the bank conflict avoidance equation for prefix sums from GPU Gems 3.

I simply used it and never cared about what exactly it does, since the code was working as is. But now that I dig into it, it does not really work at all the way I use it with 32 banks, and it is also not required for this project. With 32-bit integers, n >> 32 will always be zero. And without brackets around the first and second half of the addition, the expression is not evaluated as the sum of two separate shifts.

I quickly debugged the macro on onlinegdb, and it seems to be completely useless. Here, I can't even compile it without using 64-bit integer values as input.

So, no, it does not work. Nsight Compute also reports some bank conflicts. Since this kernel is already quite fast, however, I think it is not worth investing time to fix them. I replaced the macro with a constant 0 to make sure that no compilation errors pop up.
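To illustrate what a zero offset means for bank conflicts, here is a host-side model rather than a measurement of the spECK kernel. It assumes 32 banks of 4-byte words, a warp of 32 threads, and a simple strided pattern where thread tid touches word tid * stride; `max_conflict_degree` is a hypothetical helper that counts how many threads of the warp hit the busiest bank.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t NUM_BANKS = 32;     // assumed bank count
constexpr std::size_t LOG_NUM_BANKS = 5;  // log2(NUM_BANKS)
constexpr std::size_t WARP_SIZE = 32;

// Padding offset: zero (as in the fix above) or the assumed two-shift form.
inline std::size_t offset(std::size_t i, bool padded) {
    return padded ? (i >> LOG_NUM_BANKS) + (i >> (2 * LOG_NUM_BANKS)) : 0;
}

// Largest number of threads in one warp that map to the same bank
// when thread tid accesses word tid * stride (all addresses distinct).
inline std::size_t max_conflict_degree(std::size_t stride, bool padded) {
    std::array<std::size_t, NUM_BANKS> hits{};
    for (std::size_t tid = 0; tid < WARP_SIZE; ++tid) {
        const std::size_t idx = tid * stride;
        const std::size_t addr = idx + offset(idx, padded);
        ++hits[addr % NUM_BANKS];
    }
    return *std::max_element(hits.begin(), hits.end());
}

// Spot checks for a stride of 32 words.
inline void check_model() {
    assert(max_conflict_degree(32, false) == 32);  // every thread hits bank 0
    assert(max_conflict_degree(32, true) == 1);    // padding spreads the banks
}
```

In this model, a stride of 32 words without padding serializes the whole warp on one bank, while the padded index gives each thread its own bank. Whether that matters in practice depends on how much of the kernel's runtime such accesses account for.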