NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

Understanding Barriers #808

Open sandeep06011991 opened 1 year ago

sandeep06011991 commented 1 year ago

I need to use barriers to coordinate shared buffers: one process writes to a shared location and another process reads from it, and I am using a barrier to order the writer and the reader. I have a few questions in this setting.

  1. Why does the barrier operation cause a deviceSynchronize() call?
  2. Why does the barrier show up as a ring-reduce kernel?
AddyLaddy commented 1 year ago

Neither of those are NCCL collective operations or APIs, so I assume you're using a higher level library or framework? NCCL doesn't have a Barrier API, but I imagine some library or framework could implement it as an AllReduce.

I suggest you find a forum for the framework or library you are using for your questions.

sjeaugey commented 1 year ago

Also note, I would not use NCCL or any other library to synchronize for another communication channel. If you want to communicate through shared memory, you should synchronize (i.e. perform a barrier) using that same shared memory, making sure you respect the right memory ordering semantics.

NCCL, like MPI and other communication libraries, only guarantees ordering for its own operations. These libraries should not be used to synchronize between processes for other shared objects, be it memory, files, or other resources. Doing so may happen to work on some platforms and cause random data corruption on others.