Open sandeep06011991 opened 1 year ago
Neither of those is an NCCL collective operation or API, so I assume you're using a higher-level library or framework? NCCL doesn't have a Barrier API, but I imagine some library or framework could implement one as an AllReduce.
I suggest you find a forum for the framework or library you are using for your questions.
Also note, I would not use NCCL or any other communication library to synchronize access to a different communication channel. If you want to communicate through shared memory, you should synchronize (i.e. perform a barrier) using that same shared memory, making sure you respect the right memory ordering semantics.
NCCL, like MPI and others, only guarantees ordering for its own operations. It should not be used to synchronize between processes for other shared objects, be it memory, files, or other resources. Doing so may work on some platforms and cause random data corruption on others.
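To make that concrete, here is a minimal sketch of a writer/reader handoff that is ordered by a host-side primitive instead of a communication-library barrier. It is an illustration under stated assumptions, not what your framework does: the names `writer`, `reader`, `run_handoff` and `PAYLOAD` are hypothetical, and it assumes a Unix-like system where the `fork` start method is available. In C or C++ the ready flag would typically live in the shared segment itself and be accessed with release/acquire atomics. Python has no such atomics, so a `multiprocessing.Event` stands in for that flag here.

```python
import multiprocessing as mp
from multiprocessing import shared_memory

# Hypothetical payload used only for this illustration.
PAYLOAD = b"hello from writer"


def writer(shm_name, ready):
    """Attach to the shared segment, write the payload, then signal the reader."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        shm.buf[:len(PAYLOAD)] = PAYLOAD
        # Signal only after the write has been issued. The Event is an
        # OS-level synchronization primitive, so the reader's wait()
        # returns only after this set() call.
        ready.set()
    finally:
        shm.close()


def reader(shm_name, ready, out):
    """Wait for the writer's signal, then read the payload from the segment."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        if not ready.wait(timeout=10):
            raise TimeoutError("writer never signalled")
        out.put(bytes(shm.buf[:len(PAYLOAD)]))
    finally:
        shm.close()


def run_handoff():
    """Run one writer and one reader process and return what the reader saw."""
    # Assumption: a Unix-like OS where the "fork" start method exists.
    ctx = mp.get_context("fork")
    ready = ctx.Event()
    out = ctx.Queue()
    shm = shared_memory.SharedMemory(create=True, size=len(PAYLOAD))
    try:
        procs = [
            ctx.Process(target=reader, args=(shm.name, ready, out)),
            ctx.Process(target=writer, args=(shm.name, ready)),
        ]
        for p in procs:
            p.start()
        result = out.get(timeout=10)
        for p in procs:
            p.join()
        return result
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    print(run_handoff())
```

The design point is that the ordering between the write and the read comes from a primitive that is specified to order host memory accesses between processes. A barrier from a GPU communication library makes no such promise about your shared buffer.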
I need to use barriers to create shared buffers. One process writes to a shared location and another process reads from it. I am using a barrier to order the writer and the reader. I have a few questions in this setting.