DabiaoMa closed 6 years ago
namedBarrierSync is implemented in softmax.cuh (I should probably have picked a better place for it), as follows:
__device__ __forceinline__ void namedBarrierSync(int name, int numThreads) { asm volatile("bar.sync %0, %1;" : : "r"(name), "r"(numThreads) : "memory"); }
It allows us to use independent barriers for a subset of the threads in the block, rather than forcing all threads to synchronize with __syncthreads().
More information can be found at: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-bar
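To make the idea concrete, here is a minimal sketch of how the helper might be used. It is not taken from softmax.cuh: the kernel `reversePerGroup`, the host function `runNamedBarrierDemo`, and the group sizes are hypothetical names and values chosen for illustration. It assumes each group is made of whole warps, since the thread count passed to `bar.sync` must be a multiple of the warp size, and it avoids barrier 0 because that is the one `__syncthreads()` uses.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Same helper as quoted above: wait until `numThreads` threads have arrived
// at hardware barrier `name` (valid barrier IDs are 0..15).
__device__ __forceinline__ void namedBarrierSync(int name, int numThreads) {
    asm volatile("bar.sync %0, %1;" : : "r"(name), "r"(numThreads) : "memory");
}

// Illustrative sizes (assumptions): two groups of 64 threads, i.e. 2 warps each.
constexpr int kGroupSize = 64;
constexpr int kNumGroups = 2;
constexpr int kBlockSize = kGroupSize * kNumGroups;

// Hypothetical demo kernel: each 64-thread group reverses its own slice of
// `data` through shared memory, synchronizing only with its own group.
__global__ void reversePerGroup(int *data) {
    __shared__ int tile[kBlockSize];
    const int group = threadIdx.x / kGroupSize;
    const int lane  = threadIdx.x % kGroupSize;
    const int base  = group * kGroupSize;

    tile[threadIdx.x] = data[threadIdx.x];

    // Group 0 waits on barrier 1, group 1 on barrier 2. Barrier 0 is left
    // alone because __syncthreads() uses it.
    namedBarrierSync(group + 1, kGroupSize);

    data[threadIdx.x] = tile[base + (kGroupSize - 1 - lane)];
}

// Host-side check: returns true if every group's slice came back reversed.
bool runNamedBarrierDemo() {
    int host[kBlockSize];
    for (int i = 0; i < kBlockSize; ++i) host[i] = i;

    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    reversePerGroup<<<1, kBlockSize>>>(dev);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < kBlockSize; ++i) {
        const int group = i / kGroupSize;
        const int lane  = i % kGroupSize;
        const int expected = group * kGroupSize + (kGroupSize - 1 - lane);
        if (host[i] != expected) return false;
    }
    return true;
}
```

Calling `runNamedBarrierDemo()` from a `main` function and checking its return value is enough to try it out. The point of the design is that a slow group never stalls the other one, which a block-wide `__syncthreads()` would do.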
Thanks a lot
As a beginner in GPU programming, I am not yet familiar with the commonly used CUDA methods. I have checked the CUDA programming guide and did not find any documentation about the method 'namedBarrierSync'. Would you please give me some hints about this method? Thanks.