NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

bf16 support for FusedDense preventing apex build on CUDA 10.2 #1670

Open minostauros opened 1 year ago

minostauros commented 1 year ago

Describe the Bug

Minimal Steps/Code to Reproduce the Bug

bf16 support for FusedDense was added in https://github.com/NVIDIA/apex/pull/1627.

Commit 04207351f5e5491546f09bae169cf76d1cd820d9 builds successfully on CUDA 10.2: https://github.com/minostauros/pytorch-video-docker/actions/runs/4732576445/jobs/8398928476

Commit 8b7a1ff183741dd8f9b87e7bafd04cfde99cea28 fails: https://github.com/minostauros/pytorch-video-docker/actions/runs/5070398012/jobs/9105381516#step:7:1824

log: [9_docker (pytorch-1.9.0torchvision-0.10.0cuda-10.2ffmpeg-4.2.txt](https://github.com/NVIDIA/apex/files/11556737/9_docker.pytorch-1.9.0torchvision-0.10.0cuda-10.2ffmpeg-4.2.txt)

```text
#6 987.9 /usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.7/dist-packages/torch/include -I/usr/local/lib/python3.7/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.7/dist-packages/torch/include/TH -I/usr/local/lib/python3.7/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.7m -c -c /tmp/apex/csrc/fused_dense_cuda.cu -o /tmp/apex/build/temp.linux-x86_64-cpython-37/csrc/fused_dense_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fused_dense_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_37,code=sm_37 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_62,code=sm_62 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -std=c++14
#6 987.9 /tmp/apex/csrc/fused_dense_cuda.cu(156): error: identifier "CUDA_R_16BF" is undefined
```

**Environment**

```Dockerfile
ARG PYTHON_VERSION=3.7
ARG PYTORCH_VERSION=1.9.0
ARG TORCHVISION_VERSION=0.10.0
ARG TORCHAUDIO_VERSION=0.9.0
ARG TARGET_CUDA_VERSION=cu102
ARG CUDA_VERSION=10.2

# Kepler (partial) to Turing (3.5-7.5) with CUDA 10.2
ENV TORCH_CUDA_ARCH_LIST=3.5;3.7;5.0;5.2;6.0;6.1;6.2;7.0;7.5

RUN DIR=/tmp/apex && \
    mkdir -p ${DIR} && \
    cd ${DIR} && \
    git clone https://github.com/minostauros/apex . && \
    pip${PYTHON_VERSION} install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ && \
    rm -rf ${DIR} && \
    cd /tmp/
```
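The failing line references `CUDA_R_16BF`, the bf16 value of `cudaDataType`, which toolkits older than CUDA 11.0 do not define, so any build with nvcc 10.x should hit this error. As a minimal sketch (not part of apex), the toolkit release can be read from `nvcc -V` output to check whether the bf16 path can compile at all. The helper names are hypothetical, and the 11.0 threshold is the assumption being encoded.

```python
import re
from typing import Optional, Tuple

# Assumption: CUDA_R_16BF first appears in the CUDA 11.0 toolkit headers.
BF16_MIN_CUDA: Tuple[int, int] = (11, 0)


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the 'release X.Y' field of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None  # not recognizable nvcc output
    return int(match.group(1)), int(match.group(2))


def toolkit_supports_bf16(nvcc_output: str) -> bool:
    """Hypothetical helper: True if the toolkit is new enough for CUDA_R_16BF."""
    version = parse_nvcc_release(nvcc_output)
    # Tuple comparison checks the major version first, then the minor version.
    return version is not None and version >= BF16_MIN_CUDA


# Sample lines in the format printed by `nvcc -V`.
cuda_10_2 = "Cuda compilation tools, release 10.2, V10.2.89"
cuda_11_0 = "Cuda compilation tools, release 11.0, V11.0.221"
print(parse_nvcc_release(cuda_10_2), toolkit_supports_bf16(cuda_10_2))  # (10, 2) False
print(parse_nvcc_release(cuda_11_0), toolkit_supports_bf16(cuda_11_0))  # (11, 0) True
```

On a real system the input could come from `subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout`. The `nvcc` queried should be the one the build actually uses, which is `/usr/local/cuda/bin/nvcc` in the log.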
minostauros commented 1 year ago

We might need something like this: https://github.com/NVIDIA/apex/pull/1312 (Build fused_weight_gradient_mlp_cuda only when CUDA > 10)
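A build-time gate in the spirit of PR #1312 could skip extensions whose CUDA toolkit requirement is not met instead of failing the whole install. The sketch below models only the selection logic. The requirement table, including the hypothetical `example_cuda10_ext` entry, is an assumption and not apex's actual `setup.py`. In a real `setup.py` the surviving names would map to `torch.utils.cpp_extension.CUDAExtension` objects.

```python
from typing import Dict, List, Tuple

# Illustrative minimum CUDA toolkit per extension (assumed values).
# fused_dense_cuda needs >= 11.0 once it uses CUDA_R_16BF; PR #1312 gated
# fused_weight_gradient_mlp_cuda on CUDA > 10.
MIN_CUDA: Dict[str, Tuple[int, int]] = {
    "fused_dense_cuda": (11, 0),
    "fused_weight_gradient_mlp_cuda": (11, 0),
    "example_cuda10_ext": (10, 0),  # hypothetical extension with no bf16 code
}


def select_extensions(cuda_version: Tuple[int, int]) -> Tuple[List[str], List[str]]:
    """Split extension names into (build, skip) for a toolkit (major, minor)."""
    build: List[str] = []
    skip: List[str] = []
    for name, minimum in MIN_CUDA.items():
        if cuda_version >= minimum:
            build.append(name)
        else:
            skip.append(name)
    return build, skip


build, skip = select_extensions((10, 2))
for name in skip:
    major, minor = MIN_CUDA[name]
    print(f"Skipping {name}: requires CUDA >= {major}.{minor}")
print("Building:", build)
```

Skipping rather than erroring would keep the rest of apex installable on CUDA 10.2. The trade-off is that the skipped extensions would presumably be missing at import time on such builds.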

crcrpar commented 1 year ago

> log: 9_docker (pytorch-1.9.0torchvision-0.10.0cuda-10.2ffmpeg-4.2.txt

Would this mean PyTorch 1.9 is installed in your environment? If so, PyTorch itself is a bit too old to compile apex.

minostauros commented 1 year ago

However, apex commits before https://github.com/NVIDIA/apex/pull/1627 still compile fine against PyTorch 1.9 (even today). I will phase out old PyTorch versions, but I believe apex can still handle this case.

ycsos commented 1 year ago

I used PR #1672 to fix the build problem.

minostauros commented 1 year ago

> I used PR https://github.com/NVIDIA/apex/pull/1672 to fix the build problem.

I'm not sure that PR addresses the same error. The messages differ (`"CUDA_R_16BF" is undefined` here vs. `'at::Tensor::type() const' is deprecated` in that PR).

PH8411 commented 1 year ago

Same issue here.