Azure / msccl

Microsoft Collective Communication Library
MIT License

make failed on step 4 ("4. Apply the msccl algo when using msccl external scheduler") #29

Open SeekPoint opened 8 months ago

SeekPoint commented 8 months ago

amd00@MZ32-00:~/yk_repo/NCCL/msccl-az$ git remote -v
origin https://github.com/Azure/msccl.git (fetch)

amd00@MZ32-00:~/yk_repo/NCCL/msccl-az/scheduler/msccl-scheduler$ CXX=/usr/local/cuda-11.7/bin/nvcc BIN_HOME=/home/amd00/yk_repo/NCCL/nccl/build/obj/ SRC_HOME=/home/amd00/yk_repo/NCCL/nccl/src make

Compiling & Linking libmsccl-scheduler.so.1.0.0 > /home/amd00/yk_repo/NCCL/msccl-az/scheduler/msccl-scheduler/build/lib/libmsccl-scheduler.so.1.0.0
Compiling & Linking src/scheduler.cc > src/parser.cc
mkdir -p /home/amd00/yk_repo/NCCL/msccl-az/scheduler/msccl-scheduler/build/lib
/usr/local/cuda-11.7/bin/nvcc -I/home/amd00/yk_repo/NCCL/nccl/build/obj//include -I/home/amd00/yk_repo/NCCL/nccl/src/src/include --compiler-options -fPIC,-shared,-DNCCL -o /home/amd00/yk_repo/NCCL/msccl-az/scheduler/msccl-scheduler/build/lib/libmsccl-scheduler.so.1.0.0 --linker-options -soname,libmsccl-scheduler.so.1 -lcurl src/scheduler.cc src/parser.cc
In file included from src/scheduler.cc:21:
src/parser.h:21:10: fatal error: msccl/msccl_scheduler.h: No such file or directory
   21 | #include "msccl/msccl_scheduler.h"
      |          ^~~~~~~~~
compilation terminated.
make: *** [Makefile:43: /home/amd00/yk_repo/NCCL/msccl-az/scheduler/msccl-scheduler/build/lib/libmsccl-scheduler.so.1.0.0] Error 1
amd00@MZ32-00:~/yk_repo/NCCL/msccl-az/scheduler/msccl-scheduler$
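The nvcc line in this log shows where the compiler looked for the header: BIN_HOME followed by /include, and SRC_HOME followed by /src/include (so SRC_HOME=.../nccl/src became .../nccl/src/src/include). That pattern is inferred from the expanded paths in the log, not read from the Makefile itself. A minimal sketch for checking both directories before running make; the function name check_scheduler_include is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks whether msccl/msccl_scheduler.h is reachable
# from the two -I directories seen in the logged nvcc command line.
# Assumption: the Makefile builds them as $BIN_HOME/include and
# $SRC_HOME/src/include (inferred from the log, not from the Makefile).
check_scheduler_include() {
  local bin_home="$1" src_home="$2" dir found=1
  for dir in "${bin_home%/}/include" "${src_home%/}/src/include"; do
    if [ -f "$dir/msccl/msccl_scheduler.h" ]; then
      echo "found: $dir/msccl/msccl_scheduler.h"
      found=0
    else
      echo "missing: $dir/msccl/msccl_scheduler.h"
    fi
  done
  # Returns 0 if at least one include directory contains the header.
  return "$found"
}

# Example: the BIN_HOME and SRC_HOME values from the failing command above.
check_scheduler_include /home/amd00/yk_repo/NCCL/nccl/build/obj/ /home/amd00/yk_repo/NCCL/nccl/src \
  || echo "=> msccl/msccl_scheduler.h is not on either include path"
```

On the reporter's machine both checks would report missing, which is consistent with the compiler error.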

Andyli1007 commented 2 months ago

I suggest changing the command as follows:
CXX=/usr/local/cuda-11.7/bin/nvcc BIN_HOME=$HOME/amd00/yk_repo/NCCL/nccl/build/obj/ SRC_HOME=$HOME/amd00/yk_repo/NCCL/nccl/src make
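In the original report, the prompt path ~/yk_repo/NCCL/msccl-az and the absolute path /home/amd00/yk_repo/NCCL/msccl-az refer to the same tree, so $HOME there appears to be /home/amd00; the suggested $HOME/amd00/yk_repo/... would then expand to /home/amd00/amd00/yk_repo/.... A minimal sketch that checks whether each NAME=DIR assignment points at an existing directory before make runs; the function name check_dirs is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each NAME=DIR argument names an
# existing directory, so a bad $HOME expansion is caught before make runs.
check_dirs() {
  local arg dir rc=0
  for arg in "$@"; do
    dir="${arg#*=}"   # strip everything up to the first '='
    if [ -d "$dir" ]; then
      echo "ok:      $arg"
    else
      echo "missing: $arg"
      rc=1
    fi
  done
  return "$rc"
}

# The variables from the suggestion above, expanded by the current shell.
check_dirs "BIN_HOME=$HOME/amd00/yk_repo/NCCL/nccl/build/obj/" \
           "SRC_HOME=$HOME/amd00/yk_repo/NCCL/nccl/src" \
  || echo "fix the paths before running make"
```

The same check would also flag the second report below, where the logged include paths show $HOME expanding to /root rather than /home.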

Eevan-zq commented 1 month ago

Hello, I have the same problem. My msccl-scheduler directory is /home/msccl-tool/msccl/scheduler/msccl-scheduler and my NCCL build directory is /home/nccl-tool/nccl/build/obj.

I tried these commands:
root@docker-desktop:/home/msccl-tool/msccl/scheduler/msccl-scheduler# CXX=/usr/local/cuda/bin/nvcc BIN_HOME=/home/nccl-tool/nccl/build/obj/ SRC_HOME=/home/nccl-tool/nccl/src/ make
and
root@docker-desktop:/home/msccl-tool/msccl/scheduler/msccl-scheduler# CXX=/usr/local/cuda/bin/nvcc BIN_HOME=$HOME/nccl-tool/nccl/build/obj/ SRC_HOME=$HOME/nccl-tool/nccl/src/ make

but both of them fail with the following error:
Compiling & Linking libmsccl-scheduler.so.1.0.0 > /home/msccl-tool/msccl/scheduler/msccl-scheduler/build/lib/libmsccl-scheduler.so.1.0.0
Compiling & Linking src/scheduler.cc > src/parser.cc
mkdir -p /home/msccl-tool/msccl/scheduler/msccl-scheduler/build/lib
/usr/local/cuda-11.7/bin/nvcc -I/root/nccl-tool/nccl/build/obj//include -I/root/nccl-tool/nccl/src//src/include --compiler-options -fPIC,-shared,-DNCCL -o /home/msccl-tool/msccl/scheduler/msccl-scheduler/build/lib/libmsccl-scheduler.so.1.0.0 --linker-options -soname,libmsccl-scheduler.so.1 -lcurl src/scheduler.cc src/parser.cc
In file included from src/scheduler.cc:21:
src/parser.h:21:10: fatal error: msccl/msccl_scheduler.h: No such file or directory
   21 | #include "msccl/msccl_scheduler.h"
      |          ^~~~~~~~~
compilation terminated.
make: *** [Makefile:43: /home/msccl-tool/msccl/scheduler/msccl-scheduler/build/lib/libmsccl-scheduler.so.1.0.0] Error 1

So I ran the following command:
root@docker-desktop:/home/msccl-tool# find -name msccl_scheduler.h
which shows:
./msccl/executor/msccl-executor-nccl/src/include/msccl/msccl_scheduler.h

I am confused: the only copy of the header is /home/msccl-tool/msccl/executor/msccl-executor-nccl/src/include/msccl/msccl_scheduler.h, and I don't see how to make the build pick up that file.
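If, as the logged nvcc command lines suggest, the Makefile appends /src/include to SRC_HOME, then the header found above would be on the include path only when SRC_HOME is the directory that contains src/include/msccl/msccl_scheduler.h, i.e. the msccl-executor-nccl root. A minimal sketch that derives that value from the path reported by find; the function name src_home_for_header is hypothetical, and the /src/include suffix is an assumption taken from the log, not from the Makefile:

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the path of msccl_scheduler.h found on disk,
# print the SRC_HOME value for which $SRC_HOME/src/include/msccl/msccl_scheduler.h
# resolves to that file. Assumes the Makefile adds -I$SRC_HOME/src/include,
# as the logged nvcc command line suggests.
src_home_for_header() {
  local header="$1"
  local suffix="/src/include/msccl/msccl_scheduler.h"
  case "$header" in
    *"$suffix") echo "${header%"$suffix"}" ;;   # strip the known suffix
    *) echo "unexpected layout: $header" >&2; return 1 ;;
  esac
}

# Path reported by find above, made absolute.
src_home_for_header /home/msccl-tool/msccl/executor/msccl-executor-nccl/src/include/msccl/msccl_scheduler.h
# prints /home/msccl-tool/msccl/executor/msccl-executor-nccl
```

Under that assumption, the value would be passed as SRC_HOME=/home/msccl-tool/msccl/executor/msccl-executor-nccl on the make command line. Whether BIN_HOME should likewise point at the executor's build output instead of the separate NCCL tree is not established in this thread.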