Describe the bug
I am trying to launch the MNIST example on a single machine on AWS. The dependencies fail to install, and the failure likely has something to do with Rust. Am I missing dependencies?
Environment
Your operating system and version: Ubuntu 20.04.4 LTS
Your python version: 3.8.10
Your PyTorch version: not installed
How did you install python (e.g. apt or pyenv)? Did you use a virtualenv?: Preinstalled Ubuntu
Have you tried using latest bagua master (python3 -m pip install --pre bagua)?: No
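The environment fields above can also be collected by a script, which avoids typos when filling in the template. This is a minimal sketch, not part of Bagua: the function names are made up for illustration, and the distribution names queried (torch, bagua, bagua-cuda113) are assumptions about what pip would have registered on this machine.

```python
import platform
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of a pip distribution, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report() -> dict:
    """Gather the fields the issue template asks for."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        # Distribution names below are assumptions, adjust as needed.
        "torch": installed_version("torch"),
        "bagua": installed_version("bagua"),
        "bagua-cuda113": installed_version("bagua-cuda113"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it with the same python3 that pip uses should print one line per field, ready to paste into the Environment section.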
Reproducing
Launch g4dn.xlarge instance with AMI: ami-00a2823259d140f46 (Deep Learning AMI GPU CUDA 11.3.1 (Ubuntu 20.04) 20220303, Built with NVIDIA CUDA, cuDNN, NCCL, GPU Driver, Docker, NVIDIA-Docker and EFA support)
check CUDA version: nvcc --version -> version 11.3
clone bagua repo: git clone https://github.com/BaguaSys/bagua.git && cd bagua/examples/mnist
install requirements: pip install -r requirements.txt -> error, see below.
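The CUDA check in the steps above can be scripted if the version needs to be compared programmatically. A minimal sketch, assuming nvcc prints a line containing "release X.Y" as current CUDA toolkits do; the function name parse_nvcc_release is hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Pull the 'X.Y' release number out of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


if __name__ == "__main__":
    if shutil.which("nvcc"):
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True
        )
        print("CUDA release:", parse_nvcc_release(result.stdout))
    else:
        print("nvcc is not on PATH")
```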
Running `/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-65d17bb237c18142/build-script-build`
The following warnings were emitted during compilation:
warning: nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
warning: nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
error: failed to run custom build command for `bagua-core-internal v0.1.2 (/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal)`
Caused by:
process didn't exit successfully: `/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-65d17bb237c18142/build-script-build` (exit status: 101)
--- stdout
TARGET = Some("x86_64-unknown-linux-gnu")
OPT_LEVEL = Some("3")
HOST = Some("x86_64-unknown-linux-gnu")
CXX_x86_64-unknown-linux-gnu = None
CXX_x86_64_unknown_linux_gnu = None
HOST_CXX = None
CXX = None
NVCC_x86_64-unknown-linux-gnu = None
NVCC_x86_64_unknown_linux_gnu = None
HOST_NVCC = None
NVCC = None
CXXFLAGS_x86_64-unknown-linux-gnu = None
CXXFLAGS_x86_64_unknown_linux_gnu = None
HOST_CXXFLAGS = None
CXXFLAGS = None
CRATE_CC_NO_DEFAULTS = None
DEBUG = Some("false")
CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2")
running: "nvcc" "-ccbin=c++" "-Xcompiler" "-O3" "-Xcompiler" "-ffunction-sections" "-Xcompiler" "-fdata-sections" "-Xcompiler" "-fPIC" "-m64" "-I" "cpp/include" "-I" "third_party/cub-1.8.0" "-I" "/home/ubuntu/.local/share/bagua/nccl/include" "-Xcompiler" "-Wall" "-Xcompiler" "-Wextra" "-std=c++14" "-cudart=shared" "-gencode" "arch=compute_35,code=sm_35" "-gencode" "arch=compute_37,code=sm_37" "-gencode" "arch=compute_50,code=sm_50" "-gencode" "arch=compute_52,code=sm_52" "-gencode" "arch=compute_53,code=sm_53" "-gencode" "arch=compute_60,code=sm_60" "-gencode" "arch=compute_61,code=sm_61" "-gencode" "arch=compute_62,code=sm_62" "-gencode" "arch=compute_70,code=sm_70" "-gencode" "arch=compute_72,code=sm_72" "-gencode" "arch=compute_75,code=sm_75" "-gencode" "arch=compute_80,code=sm_80" "-gencode" "arch=compute_86,code=sm_86" "-o" "/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-a571c95913d0ee58/out/kernels/bagua_kernels.o" "-c" "kernels/bagua_kernels.cu"
cargo:warning=nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
exit status: 0
AR_x86_64-unknown-linux-gnu = None
AR_x86_64_unknown_linux_gnu = None
HOST_AR = None
AR = None
running: "ar" "cq" "/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-a571c95913d0ee58/out/libbagua_kernels.a" "/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-a571c95913d0ee58/out/kernels/bagua_kernels.o"
exit status: 0
running: "nvcc" "-ccbin=c++" "-Xcompiler" "-O3" "-Xcompiler" "-ffunction-sections" "-Xcompiler" "-fdata-sections" "-Xcompiler" "-fPIC" "-m64" "-I" "cpp/include" "-I" "third_party/cub-1.8.0" "-I" "/home/ubuntu/.local/share/bagua/nccl/include" "-Xcompiler" "-Wall" "-Xcompiler" "-Wextra" "-std=c++14" "-cudart=shared" "-gencode" "arch=compute_35,code=sm_35" "-gencode" "arch=compute_37,code=sm_37" "-gencode" "arch=compute_50,code=sm_50" "-gencode" "arch=compute_52,code=sm_52" "-gencode" "arch=compute_53,code=sm_53" "-gencode" "arch=compute_60,code=sm_60" "-gencode" "arch=compute_61,code=sm_61" "-gencode" "arch=compute_62,code=sm_62" "-gencode" "arch=compute_70,code=sm_70" "-gencode" "arch=compute_72,code=sm_72" "-gencode" "arch=compute_75,code=sm_75" "-gencode" "arch=compute_80,code=sm_80" "-gencode" "arch=compute_86,code=sm_86" "--device-link" "-o" "/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-a571c95913d0ee58/out/bagua_kernels_dlink.o" "/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-a571c95913d0ee58/out/libbagua_kernels.a"
cargo:warning=nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
exit status: 0
running: "ar" "cq" "/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-a571c95913d0ee58/out/libbagua_kernels.a" "/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-a571c95913d0ee58/out/bagua_kernels_dlink.o"
exit status: 0
running: "ar" "s" "/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-a571c95913d0ee58/out/libbagua_kernels.a"
exit status: 0
cargo:rustc-link-lib=static=bagua_kernels
cargo:rustc-link-search=native=/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/target/release/build/bagua-core-internal-a571c95913d0ee58/out
CXXSTDLIB_x86_64-unknown-linux-gnu = None
CXXSTDLIB_x86_64_unknown_linux_gnu = None
HOST_CXXSTDLIB = None
CXXSTDLIB = None
cargo:rustc-link-lib=stdc++
cargo:rustc-link-search=native=/usr/local/cuda/bin/../targets/x86_64-linux/lib
cargo:rustc-link-lib=cudart_static
CMAKE_TOOLCHAIN_FILE_x86_64-unknown-linux-gnu = None
CMAKE_TOOLCHAIN_FILE_x86_64_unknown_linux_gnu = None
HOST_CMAKE_TOOLCHAIN_FILE = None
CMAKE_TOOLCHAIN_FILE = None
CMAKE_GENERATOR_x86_64-unknown-linux-gnu = None
CMAKE_GENERATOR_x86_64_unknown_linux_gnu = None
HOST_CMAKE_GENERATOR = None
CMAKE_GENERATOR = None
CMAKE_PREFIX_PATH_x86_64-unknown-linux-gnu = None
CMAKE_PREFIX_PATH_x86_64_unknown_linux_gnu = None
HOST_CMAKE_PREFIX_PATH = None
CMAKE_PREFIX_PATH = None
CMAKE_x86_64-unknown-linux-gnu = None
CMAKE_x86_64_unknown_linux_gnu = None
HOST_CMAKE = None
CMAKE = None
running: "cmake" "/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum" "-DCMAKE_CXX_STANDARD=17" "-DALUMINUM_ENABLE_NCCL=YES" "-DCUB_INCLUDE_PATH=/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/cub-1.8.0" "-DNCCL_LIBRARY=/home/ubuntu/.local/share/bagua/nccl/lib/libnccl.so" "-DNCCL_INCLUDE_PATH=/home/ubuntu/.local/share/bagua/nccl/include" "-DBUILD_SHARED_LIBS=off" "-DCMAKE_INSTALL_PREFIX=/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/../../../bagua_core/.data" "-DCMAKE_C_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_C_COMPILER=/usr/bin/cc" "-DCMAKE_CXX_FLAGS= -std=c++17 -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_COMPILER=/usr/bin/c++" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_ASM_COMPILER=/usr/bin/cc" "-DCMAKE_BUILD_TYPE=Release"
-- The CXX compiler identification is GNU 9.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- NCCL support requested but no GPU runtime enabled. Assuming CUDA support.
-- Performing Test CXX_COMPILER_HAS_FALIGNED_NEW
-- Performing Test CXX_COMPILER_HAS_FALIGNED_NEW - Success
-- Performing Test CXX_COMPILER_HAS_G3
-- Performing Test CXX_COMPILER_HAS_G3 - Success
-- Performing Test CXX_COMPILER_HAS_OG
-- Performing Test CXX_COMPILER_HAS_OG - Success
-- Found MPI_CXX: /opt/amazon/openmpi/lib/libmpi.so (found suitable version "3.1", minimum required is "3.0")
-- Found MPI: TRUE (found suitable version "3.1", minimum required is "3.0") found components: CXX
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found HWLOC: /usr/lib/x86_64-linux-gnu/libhwloc.so
-- Found CUDA: /usr/local/cuda (found suitable version "11.3", minimum required is "9.0")
-- The CUDA compiler identification is NVIDIA 11.3.109
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found NCCL: /home/ubuntu/.local/share/bagua/nccl/lib/libnccl.so (found suitable version "2.9.9", minimum required is "2.7.0")
-- Found CUB: /tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/cub-1.8.0
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/bagua_core/.data/build
running: "cmake" "--build" "." "--target" "install" "--config" "Release" "--parallel" "4"
[ 7%] Building CXX object src/CMakeFiles/Al.dir/Al.cpp.o
[ 15%] Building CXX object src/CMakeFiles/Al.dir/mempool.cpp.o
[ 23%] Building CXX object src/CMakeFiles/Al.dir/mpi_impl.cpp.o
[ 30%] Building CXX object src/CMakeFiles/Al.dir/profiling.cpp.o
[ 38%] Building CXX object src/CMakeFiles/Al.dir/progress.cpp.o
--- stderr
CMake Warning (dev) in src/CMakeLists.txt:
Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC,
empty CUDA_ARCHITECTURES not allowed. Run "cmake --help-policy CMP0104"
for policy details. Use the cmake_policy command to set the policy and
suppress this warning.
CUDA_ARCHITECTURES is empty for target "Al".
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning:
Manually-specified variables were not used by the project:
CMAKE_ASM_COMPILER
CMAKE_ASM_FLAGS
make: warning: -j4 forced in submake: resetting jobserver mode.
In file included from /tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/include/Al.hpp:1221,
from /tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/src/mpi_impl.cpp:28:
/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/include/aluminum/nccl_impl.hpp: In function ‘ncclRedOp_t Al::internal::nccl::ReductionOperator2ncclRedOp(Al::ReductionOperator)’:
/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/include/aluminum/nccl_impl.hpp:143:12: error: ‘ncclAvg’ was not declared in this scope; did you mean ‘nccl’?
143 | return ncclAvg;
| ^~~~~~~
| nccl
In file included from /tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/include/Al.hpp:1221,
from /tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/src/Al.cpp:35:
/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/include/aluminum/nccl_impl.hpp: In function ‘ncclRedOp_t Al::internal::nccl::ReductionOperator2ncclRedOp(Al::ReductionOperator)’:
/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/include/aluminum/nccl_impl.hpp:143:12: error: ‘ncclAvg’ was not declared in this scope; did you mean ‘nccl’?
143 | return ncclAvg;
| ^~~~~~~
| nccl
make[2]: *** [src/CMakeFiles/Al.dir/build.make:104: src/CMakeFiles/Al.dir/mpi_impl.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [src/CMakeFiles/Al.dir/build.make:76: src/CMakeFiles/Al.dir/Al.cpp.o] Error 1
In file included from /tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/include/Al.hpp:1221,
from /tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/src/progress.cpp:31:
/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/include/aluminum/nccl_impl.hpp: In function ‘ncclRedOp_t Al::internal::nccl::ReductionOperator2ncclRedOp(Al::ReductionOperator)’:
/tmp/pip-install-z14bsv9q/bagua_b1ea10d6927a48eab006199d1dfaa765/rust/bagua-core/bagua-core-internal/third_party/Aluminum/include/aluminum/nccl_impl.hpp:143:12: error: ‘ncclAvg’ was not declared in this scope; did you mean ‘nccl’?
143 | return ncclAvg;
| ^~~~~~~
| nccl
make[2]: *** [src/CMakeFiles/Al.dir/build.make:132: src/CMakeFiles/Al.dir/progress.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:958: src/CMakeFiles/Al.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
thread 'main' panicked at '
command did not execute successfully, got: exit status: 2
build script failed, must exit now', /home/ubuntu/.cargo/registry/src/github.com-1ecc6299db9ec823/cmake-0.1.48/src/lib.rs:975:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: cargo failed with code: 101
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for bagua
Failed to build bagua
ERROR: Could not build wheels for bagua, which is required to install pyproject.toml-based projects
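The only hard errors in the output above are the three "ncclAvg was not declared in this scope" messages from Aluminum's nccl_impl.hpp, and CMake reports NCCL 2.9.9 under /home/ubuntu/.local/share/bagua/nccl. One plausible reading, not confirmed here, is that this NCCL header is older than the release that introduced the ncclAvg reduction op. The following Python sketch checks the header directly instead of relying on version numbers. The file name nccl.h and the NCCL_MAJOR, NCCL_MINOR and NCCL_PATCH macros are assumptions about the standard NCCL layout, and the function names are made up for illustration.

```python
import re
from pathlib import Path
from typing import Optional, Tuple


def parse_nccl_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Read (major, minor, patch) from the version macros in nccl.h."""
    parts = []
    for name in ("NCCL_MAJOR", "NCCL_MINOR", "NCCL_PATCH"):
        match = re.search(
            rf"^\s*#\s*define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def declares_symbol(header_text: str, symbol: str) -> bool:
    """Report whether the header mentions the symbol as a whole word."""
    return re.search(rf"\b{re.escape(symbol)}\b", header_text) is not None


if __name__ == "__main__":
    # Include directory taken from the build log; nccl.h is an assumed file name.
    header = Path.home() / ".local/share/bagua/nccl/include/nccl.h"
    if header.exists():
        text = header.read_text(errors="replace")
        print("NCCL version:", parse_nccl_version(text))
        print("declares ncclAvg:", declares_symbol(text, "ncclAvg"))
    else:
        print(f"{header} not found")
```

If the header turns out not to declare ncclAvg, the build is probably picking up an NCCL that is too old for the bundled Aluminum sources, which would point at a version mismatch rather than a missing Rust dependency.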
Additional context
The provided Bagua AMI is outdated, so I'm not using it.
Two more steps ran between the CUDA version check and cloning the repo:
install rust: curl https://sh.rustup.rs -sSf | sh -s -- --default-toolchain stable -y
install bagua package: pip install bagua-cuda113 -> exits successfully.