Closed frival closed 6 years ago
@mpatwary , can you take a look at this one?
Thanks for bringing this to our attention. To clarify: does RHEL install the openmpi libraries and headers in different directories by default, or does this happen only with a specific version?
I just checked the RHEL 6, RHEL 7, and Fedora 27 openmpi and openmpi-devel packages and they do by default install libraries and headers in different directories. If someone wants to download, build, and install openmpi by hand I would imagine they would get libs and headers in the same directory tree, but that's not how it would happen by default in RHEL/Fedora. I'm assuming the same is the case for CentOS given that it's derived from RHEL packages.
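For anyone who wants to confirm the split on their own machine, here is a minimal sketch of the check. `find_mpi_dirs` is a hypothetical helper, not part of DeepBench. It assumes the header is named `mpi.h` and the library file name starts with `libmpi.so`.

```shell
#!/bin/sh
# Hypothetical helper (not part of DeepBench): report where the MPI header
# and the MPI shared library live beneath a given root directory.
find_mpi_dirs() {
    root="$1"
    # First match only; assumes a single MPI installation under the root.
    header=$(find "$root" -name mpi.h 2>/dev/null | head -n 1)
    lib=$(find "$root" -name 'libmpi.so*' 2>/dev/null | head -n 1)
    [ -n "$header" ] && echo "headers: $(dirname "$header")"
    [ -n "$lib" ] && echo "libs: $(dirname "$lib")"
    return 0
}
```

Calling `find_mpi_dirs /usr` on a system with the packaged openmpi should print two different directory trees. A hand-built install under one prefix should print `include` and `lib` under that same prefix.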
Thanks @frival for the clarification. Could you please create a PR we could merge?
I've tweaked the suggestion above in the PR so that MPI_INCLUDE_PATH defaults to MPI_PATH. Folks who aren't burdened with this split shouldn't have to change anything in how they build. If this approach is deemed acceptable, would it make sense to also update the build instructions in README.md to refer to this variable?
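To illustrate the intended fallback, here is a rough shell equivalent of a `?=` default. This is a sketch rather than the code in the PR. `resolve_mpi_include` is a hypothetical name, and the exact fallback value (`$MPI_PATH/include`) is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the fallback: use MPI_INCLUDE_PATH when it is set,
# otherwise derive the include directory from MPI_PATH.
resolve_mpi_include() {
    # Same default that the Makefiles use for MPI_PATH.
    mpi_path="${MPI_PATH-/usr/local/openmpi}"
    # ASSUMPTION: the fallback is "$MPI_PATH/include".
    # "${VAR-default}" (no colon) applies the default only when VAR is unset,
    # which is close to how make's "?=" behaves.
    echo "${MPI_INCLUDE_PATH-$mpi_path/include}"
}
```

With neither variable set this prints `/usr/local/openmpi/include`. Setting only `MPI_PATH` moves the include directory along with it, and setting `MPI_INCLUDE_PATH` overrides it outright.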
I think mentioning that in the README.md is a good idea, please go ahead. Thanks.
The DeepBench makefiles assume that certain libs and header files (e.g. libmpi) are located in the same directory tree. For reasons I have yet to discern, RHEL installs the libraries in one directory structure and the headers in another (in the -devel packages). With the patch below applied and "MPI_INCLUDE_PATH=/usr/include/openmpi-x86_64" added, DeepBench builds cleanly on RHEL. I've coded the change so that it shouldn't require any changes on systems that have libraries and header files in the same directory.
diff -ur DeepBench.orig/code/baidu_allreduce/Makefile DeepBench/code/baidu_allreduce/Makefile
--- DeepBench.orig/code/baidu_allreduce/Makefile 2017-12-14 14:53:03.255428367 -0500
+++ DeepBench/code/baidu_allreduce/Makefile 2017-12-12 12:15:19.396655494 -0500
@@ -6,6 +6,7 @@
 CUDA_PATH?=/usr/local/cuda
 CUDA_LIB64=$(CUDA_PATH)/lib64
 MPI_PATH?=/usr/local/openmpi
+MPI_INCLUDE_PATH?=/usr/local/openmpi
 BAIDU_ALLREDUCE_PATH?=/local/baidu-allreduce
 BIN_DIR?=bin
 MKDIR=mkdir -p
@@ -21,8 +22,8 @@
 ring_all_reduce:
 	$(MKDIR) $(BIN_DIR)
+	$(CUDA_PATH)/bin/$(NVCC) -c -std=c++11 -I $(MPI_INCLUDE_PATH) -I $(BAIDU_ALLREDUCE_PATH) -I $(CUDA_PATH)/include -DOMPI_SKIP_MPICXX= $(BAIDU_ALLREDUCE_PATH)/collectives.cu -o $(BIN_DIR)/collectives.o
 	$(MPI_PATH)/bin/$(CC) -o $(BIN_DIR)/ring_all_reduce $(BIN_DIR)/ring_all_reduce_mpi.o $(BIN_DIR)/collectives.o -L$(CUDA_PATH)/lib64 -L$(MPI_PATH)/lib -lcudart -lmpi -DOMPI_SKIP_MPICXX=
 clean:
diff -ur DeepBench.orig/code/nvidia/Makefile DeepBench/code/nvidia/Makefile
--- DeepBench.orig/code/nvidia/Makefile 2017-12-14 14:53:03.257428373 -0500
+++ DeepBench/code/nvidia/Makefile 2017-12-12 11:49:20.379360097 -0500
@@ -7,6 +7,7 @@
 CUDNN_PATH?=/usr/local/cudnn
 NCCL_PATH?=/usr/local/nccl
 MPI_PATH?=/usr/local/openmpi
+MPI_INCLUDE_PATH?=/usr/include/openmpi
 BIN_DIR?=bin
 MKDIR=mkdir -p
 BLAS
@@ -45,7 +46,7 @@
 nccl_mpi:
 	$(MKDIR) $(BIN_DIR)
+	$(CUDA_PATH)/bin/$(NVCC) nccl_mpi_all_reduce.cu -o $(BIN_DIR)/nccl_mpi_all_reduce -I $(KERNELS_DIR) -I $(NCCL_PATH)/include/ -I $(CUDNN_PATH)/include/ -I $(MPI_INCLUDE_PATH) -L $(NCCL_PATH)/lib/ -L $(CUDNN_PATH)/lib64 -L $(MPI_PATH)/lib -lnccl -lcurand -lcudart -lmpi $(NVCC_ARCH_ARGS) -std=c++11
 sparse:
 	$(MKDIR) $(BIN_DIR)
diff -ur DeepBench.orig/code/osu_allreduce/Makefile DeepBench/code/osu_allreduce/Makefile
--- DeepBench.orig/code/osu_allreduce/Makefile 2017-12-14 14:53:03.258428376 -0500
+++ DeepBench/code/osu_allreduce/Makefile 2017-12-12 11:51:07.655654032 -0500
@@ -3,6 +3,7 @@
 CC_FLAGS= -c -O2 -pthread -Wall -march=native
 MPI_PATH?=/usr/local/openmpi
+MPI_INCLUDE_PATH?=/usr/local/openmpi
 CUDA_PATH?=/usr/local/cuda
 MKDIR=mkdir -p
 BIN_DIR?=bin
@@ -17,10 +18,10 @@
 coll:
 	$(MKDIR) $(BIN_DIR)
+	$(CC) -o $(BIN_DIR)/osu_coll.o $(CC_FLAGS) -I$(CUDA_PATH)/include -I$(MPI_INCLUDE_PATH) osu_coll.c
 allreduce:
+	$(CC) -o $(BIN_DIR)/osu_allreduce.o $(CC_FLAGS) -I $(KERNELS_DIR) -I$(CUDA_PATH)/include -I$(MPI_INCLUDE_PATH) osu_allreduce.c
 clean:
 	rm -rf $(BIN_DIR)
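
As a usage sketch, the override could be passed to each of the three patched Makefiles along these lines. `build_deepbench_mpi` and the `RUN` switch are hypothetical names introduced only for illustration. The script assumes it is run from the top of a DeepBench checkout, and that any other variables a build needs (MPI_PATH, CUDA_PATH, BAIDU_ALLREDUCE_PATH and so on) are supplied as they normally would be.

```shell
#!/bin/sh
# Hypothetical wrapper: pass the same MPI_INCLUDE_PATH override to each
# Makefile touched by the patch. Dry-run by default: it prints the make
# commands; set RUN=1 to execute them instead.
build_deepbench_mpi() {
    inc="$1"
    for dir in code/baidu_allreduce code/nvidia code/osu_allreduce; do
        if [ "${RUN:-0}" = 1 ]; then
            # A command-line assignment overrides the "?=" default in the Makefile.
            make -C "$dir" "MPI_INCLUDE_PATH=$inc" || return 1
        else
            echo "make -C $dir MPI_INCLUDE_PATH=$inc"
        fi
    done
}
```

On RHEL the call would look like `build_deepbench_mpi /usr/include/openmpi-x86_64`. On systems where headers and libraries share one tree, the override can simply be left out of the normal `make` invocation.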