NVIDIA / nccl

Optimized primitives for collective multi-GPU communication
Other
3.28k stars 829 forks source link

Missing header file #1361

Open alpha-baby opened 4 months ago

alpha-baby commented 4 months ago

https://github.com/NVIDIA/nccl/blob/178b6b759074597777ce13438efb0e0ba625e429/src/include/coll_net.h#L10

should add include ?

#include "comm.h" // should add include ?
#include "nccl.h"
#include "nccl_net.h"
kiskra-nvidia commented 4 months ago

This shouldn't be necessary as our Makefile includes the required path:

https://github.com/NVIDIA/nccl/blob/178b6b759074597777ce13438efb0e0ba625e429/src/Makefile#L115

What is the actual issue that you are seeing?

alpha-baby commented 4 months ago

@kiskra-nvidia image

I use vscode+clangd to develop nccl code. However, there is a problem with the vscode prompt in this place, and a header file needs to be added(#include "comm.h").

wenbilliams commented 4 months ago

@alpha-baby is this a problem with compilation or just intellisense? If compilation, can you compile with clang on the cli?

alpha-baby commented 4 months ago

There's no problem with the compilation; the error was flagged by clangd. Currently, there are no compilation errors because the coll_net.cc file includes comm.h when compiling src/transport/coll_net.cc.

We should include comm.h in src/include/coll_net.h instead. This is because src/include/coll_net.h references the struct ncclComm defined in comm.h.

alpha-baby commented 4 months ago

Although from the compiler's perspective, there are no issues with the code, since src/include/coll_net.h references a struct defined in comm.h, comm.h should be included directly in src/include/coll_net.h.

Do you think so? @wenbilliams @kiskra-nvidia

kiskra-nvidia commented 4 months ago

In principle agreed: it would be more robust if every header file included all its dependencies. Is src/include/coll_net.h the only one that does not?

alpha-baby commented 4 months ago

In principle agreed: it would be more robust if every header file included all its dependencies. Is src/include/coll_net.h the only one that does not?原则上同意:如果每个头文件都包含其所有依赖项,那么它会更加健壮。 src/include/coll_net.h 是唯一一个没有的吗?

No, i also found the same issue in the following files:

src/include/p2p.h
src/include/cpuset.h
src/include/register.h
src/include/trees.h