NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

NVCC --threads option is hardcoded #1415

Open wvidana opened 2 years ago

wvidana commented 2 years ago

Describe the Bug The --threads option for nvcc is hardcoded to 4, which makes building NVIDIA apex impossible in certain environments, especially from a Docker build in a CI/CD environment.

https://github.com/NVIDIA/apex/blob/3ff1a10f72ec07067c4e44759442329804ac5162/setup.py#L54-L58

The pip install --user --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" command fails after several minutes with:

    ERROR: Command errored out with exit status 1:
     command: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-pt_auz3e/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-pt_auz3e/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-fzizerkp/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /root/.local/include/python3.7m/apex
         cwd: /tmp/pip-req-build-pt_auz3e/
    Complete output (1982 lines):
[...]
    /usr/local/cuda-11.3/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.3/include -I/opt/conda/include/python3.7m -c csrc/mlp_cuda.cu -o build/temp.linux-x86_64-3.7/csrc/mlp_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=mlp_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -std=c++14
    Killed
    Killed
    Killed
    error: command '/usr/local/cuda-11.3/bin/nvcc' failed with exit status 255
    ----------------------------------------
ERROR: Command errored out with exit status 1: /opt/conda/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-pt_auz3e/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-pt_auz3e/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-fzizerkp/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /root/.local/include/python3.7m/apex Check the logs for full command output.

There should be an environment variable or argument to override this value when needed. The only workaround I found was to set ENV NVCC_APPEND_FLAGS='--threads 2'.
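
As a rough sketch of the requested knob, setup.py could read the thread count from an environment variable instead of hardcoding it; the NVCC_THREADS name and the flag list below are hypothetical illustrations, not apex's actual code:

    import os

    # Hypothetical: let an environment variable override the nvcc thread
    # count, keeping the current value of 4 as the default. NVCC_THREADS is
    # an illustrative name, not an existing apex or nvcc option.
    nvcc_threads = os.environ.get("NVCC_THREADS", "4")

    # Illustrative flag list; apex assembles its per-extension nvcc flags
    # inside setup.py.
    nvcc_flags = ["-O3", "--threads", nvcc_threads]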

Minimal Steps/Code to Reproduce the Bug Set up a Dockerfile with a very simple apex installation, like:

FROM pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel

# HERE: install the new CUDA keyring and git

## Install Apex
RUN git clone https://github.com/NVIDIA/apex.git \
    && cd apex \
    && pip install --user --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Then try to run a docker build within Bitbucket Pipelines (even with size: 2x), or try it locally (on a Mac) limiting memory to 6 GB with no swap: docker build --tag my_image --memory=6g --memory-swap=6g -f Dockerfile .

After about 20 minutes the build fails with roughly 2,000 lines of errors.

Expected Behavior The build should complete without issues. The same scenario with ENV NVCC_APPEND_FLAGS='--threads 2' succeeds, though it takes 45 minutes.

Environment Using the Docker image pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel, which comes with:

sol173angs commented 2 years ago

You can't change that particular one for [REDACTED] reasons

Luckyboys commented 6 months ago

Thank you for the inspiration. I finally managed to compile and install with the following command:

MAX_JOBS=1 NVCC_APPEND_FLAGS='--threads 1' pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./

The MAX_JOBS=1 part seems to be the most important.
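
For context (not stated above): MAX_JOBS is read by PyTorch's C++/CUDA extension builder to cap how many compile jobs run in parallel, while nvcc's --threads limits parallelism within a single nvcc invocation, so lowering both trades build time for a much smaller peak memory footprint.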