Status: Open. Opened by wvidana 2 years ago.
You can't change that particular one for [REDACTED] reasons
Thank you for providing the inspiration. I used the following command to finally compile and install:
MAX_JOBS=1 NVCC_APPEND_FLAGS='--threads 1' pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
The MAX_JOBS=1 part seems to be the most important.
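A rough sketch of why both settings in that command matter, under stated assumptions: PyTorch's extension builder reads MAX_JOBS to cap how many compile jobs run at once, and nvcc's --threads caps the parallel compilation steps inside each nvcc invocation, so their product is a loose upper bound on concurrent compiler work. The function names below are hypothetical helpers, not part of apex or PyTorch.

```python
import os


def low_memory_build_env(max_jobs=1, nvcc_threads=1, base_env=None):
    """Return a copy of the environment with the two limits from the command above.

    base_env defaults to the current process environment; pass a dict to
    build on something else (useful for testing).
    """
    env = dict(os.environ if base_env is None else base_env)
    # Read by PyTorch's C++/CUDA extension builder to cap parallel compile jobs.
    env["MAX_JOBS"] = str(max_jobs)
    # nvcc appends these flags to every invocation it runs.
    env["NVCC_APPEND_FLAGS"] = f"--threads {nvcc_threads}"
    return env


def worst_case_compiler_processes(max_jobs, nvcc_threads):
    """Loose upper bound: each job may run up to nvcc_threads steps in parallel."""
    return max_jobs * nvcc_threads
```

The returned dictionary can be passed as the env argument of subprocess.run when launching the pip install command from a script. Raising either value trades memory for build time, which matches the 45-minute build reported with --threads 2.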
Describe the Bug
The --threads option for nvcc is hardcoded to 4, which makes building NVIDIA apex impossible in certain environments, especially from a docker build in a CI/CD environment.
https://github.com/NVIDIA/apex/blob/3ff1a10f72ec07067c4e44759442329804ac5162/setup.py#L54-L58
The command
pip install --user --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext"
fails after several minutes with:

There should be an ENV VAR or argument to set this value if needed. The only workaround that worked was to set the following in the Dockerfile:
ENV NVCC_APPEND_FLAGS='--threads 2'
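One way the requested setting could look, as a minimal sketch rather than apex's actual code: the helper below reads the thread count from an environment variable and falls back to 4, the value reported as hardcoded. The variable name APEX_NVCC_THREADS and both function names are hypothetical, and the CUDA version is passed in as a (major, minor) tuple instead of being detected. It assumes --threads is only passed for CUDA 11.2 or newer, where nvcc supports the flag.

```python
import os

DEFAULT_NVCC_THREADS = 4  # the value this issue reports as hardcoded


def nvcc_threads_from_env(var_name="APEX_NVCC_THREADS", default=DEFAULT_NVCC_THREADS):
    """Read the nvcc thread count from a (hypothetical) environment variable."""
    raw = os.environ.get(var_name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{var_name} must be an integer, got {raw!r}")
    if value < 1:
        # This sketch only accepts explicit positive thread counts.
        raise ValueError(f"{var_name} must be at least 1, got {value}")
    return value


def append_nvcc_threads(nvcc_extra_args, cuda_version):
    """Append '--threads N' when the toolkit is new enough to accept it.

    cuda_version is a (major, minor) tuple supplied by the caller.
    """
    if cuda_version >= (11, 2):
        return nvcc_extra_args + ["--threads", str(nvcc_threads_from_env())]
    return nvcc_extra_args
```

With a setting like this, a memory-constrained CI build could export the variable before running pip instead of relying on NVCC_APPEND_FLAGS to override the flag after the fact.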
Minimal Steps/Code to Reproduce the Bug
Set up a Dockerfile with a very simple apex installation, like

Then try to run a docker build within Bitbucket Pipelines, even using size: 2x, or try locally (on a Mac), limiting the memory to 6g with no swap:
docker build --tag my_image --memory=6g --memory-swap=6g -f Dockerfile .
After about 20 minutes the build will fail with 2000 error lines.
Expected Behavior
The build should have no issues. This same scenario with
ENV NVCC_APPEND_FLAGS='--threads 2'
succeeds, though it takes 45 minutes.

Environment
Using the docker image pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel, which comes with:
Python 3.7.11
Cuda compilation tools, release 11.3, V11.3.109
Pytorch 1.10.0