NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

[Question] Why are thread affinities not set for nccl proxy threads? #1426

Closed: joerowell closed this issue 2 months ago

joerowell commented 2 months ago

I see that in NCCL 2.18.1-1 the following line, which sets the thread affinity, was commented out:

https://github.com/NVIDIA/nccl/blob/178b6b759074597777ce13438efb0e0ba625e429/src/proxy.cc#L1400

Other than the fact that the argument to ncclProxyService changed, may I ask why this change was made?

sjeaugey commented 2 months ago

It looks like this was removed because we no longer had access to comm but we forgot to put it back. We'll fix that.

Here is a patch in the meantime:

fix_proxy_affinity.patch.txt

joerowell commented 2 months ago

Thank you!

sjeaugey commented 2 months ago

Actually, after reviewing the code again, it looks like the patch is not necessary. The affinity is set in the main thread during init, and the service and progress threads should both be launched while that affinity is in effect, so they inherit it. There should therefore be no need to set the affinity again inside the thread main function.

Did you see any difference with the patch applied?
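
For readers who want to check the inheritance behavior described above outside of NCCL, here is a minimal Linux-only sketch. It is not NCCL code: the helper names `currentAffinity` and `childInheritsAffinity` are hypothetical, and it assumes glibc's `pthread_setaffinity_np` / `pthread_getaffinity_np` and a machine with at most `CPU_SETSIZE` CPUs. It pins the calling thread to one CPU, starts a child thread, and compares the child's mask with the parent's.

```cpp
// Linux/glibc sketch, not taken from NCCL. Build with: g++ -std=c++11 -pthread
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the *_np affinity calls and CPU_* macros
#endif
#include <pthread.h>
#include <sched.h>
#include <cassert>
#include <thread>

// Hypothetical helper: returns the CPU affinity mask of the calling thread.
static cpu_set_t currentAffinity() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  int rc = pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask);
  assert(rc == 0);
  (void)rc;  // avoid an unused-variable warning when NDEBUG is defined
  return mask;
}

// Hypothetical helper: pins the calling thread to the first CPU it is allowed
// to run on, spawns a child thread, and reports whether the child started
// with the same mask. The caller's original mask is restored before returning.
static bool childInheritsAffinity() {
  cpu_set_t original = currentAffinity();

  // Pick a CPU from the current mask so the request is always permitted.
  int cpu = -1;
  for (int i = 0; i < CPU_SETSIZE; ++i) {
    if (CPU_ISSET(i, &original)) { cpu = i; break; }
  }
  if (cpu < 0) return false;

  cpu_set_t pinned;
  CPU_ZERO(&pinned);
  CPU_SET(cpu, &pinned);
  if (pthread_setaffinity_np(pthread_self(), sizeof(pinned), &pinned) != 0) {
    return false;
  }

  // The child never sets its own affinity; it only reads what it was given.
  cpu_set_t childMask;
  CPU_ZERO(&childMask);
  std::thread child([&childMask] { childMask = currentAffinity(); });
  child.join();

  // Restore the caller's original mask.
  pthread_setaffinity_np(pthread_self(), sizeof(original), &original);
  return CPU_EQUAL(&pinned, &childMask) != 0;
}
```

Calling `childInheritsAffinity()` from `main` should return `true` on Linux, since a thread created with `pthread_create` starts with a copy of its creator's mask. This only holds for threads created while the mask is in effect; a thread started before the affinity is set, or after it is reset, would not pick it up.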