NVlabs / NVBit


Issues with memory divergence example #24

Closed: trinayan closed this issue 3 years ago

trinayan commented 3 years ago

Hi,

I was trying to implement the code for the memory divergence example shown in Listing 8 of the paper, and I encountered two issues.

  1. First, the "__match_any_sync" function used in "int cnt = __popc(__match_any_sync(mask, cache_addr))" does not seem to be a valid function in the NVBit library. I am not sure how to resolve this or what to use in its place.

  2. Second, on line 29 of the example, "atomicAdd(&uniq_lines, 1.0f / cnt);", I think it should be "atomicAdd(&uniq_lines, cnt)" instead, based on my understanding of memory divergence. I am not sure whether I am correct.

Thanks for the help.

ovilla commented 3 years ago
  1. That function is part of the CUDA programming API (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-match-functions); not sure why you can't resolve it.

  2. Line 25 returns, for each thread, how many threads in the warp are accessing the same cache line. Say a warp with 32 active threads accesses one cache line with the first 20 threads and another cache line with the other 12 threads. After the instruction at line 25, 20 threads have cnt = 20 and 12 threads have cnt = 12. If you add cnt to uniq_lines without scaling by 1.0f/cnt, you get 20 × 20 + 12 × 12 = 544, which does not tell you much. If you add 1/cnt instead, you get 20 × (1/20) + 12 × (1/12) = 2, which is the number of cache lines accessed by the warp.

trinayan commented 3 years ago

Thanks for the comment. It was because I was using the default makefiles, which compile for sm_35, where this function is not supported. The divergence calculation makes perfect sense now; thanks for the detailed explanation.

Diksha-Moolchandani commented 2 years ago

__match_any_sync is supported only on devices of compute capability 7.0 and higher.

Here are the compute capabilities for different GPUs.

For devices below sm_70, we need an alternative to __match_any_sync().

Diksha-Moolchandani commented 2 years ago

Hi, did you solve the issue? I am not getting the expected output. I suspect I am not passing the correct arguments to the instrumentation function. Could you let me know which arguments you passed?