NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Installation Error multi_tensor_sgd_kernel.cpp1.ii #557

Open yaojunr opened 4 years ago

yaojunr commented 4 years ago

The installation command is:

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

The full log is as follows:

/home/zxm/anaconda3/lib/python3.7/site-packages/pip/_internal/commands/install.py:283: UserWarning: Disabling all use of wheels due to the use of --build-options / --global-options / --install-options.
  cmdoptions.check_install_build_global(options)
Created temporary directory: /tmp/pip-ephem-wheel-cache-p4vtlsv6
Created temporary directory: /tmp/pip-req-tracker-zuuadv7g
Created requirements tracker '/tmp/pip-req-tracker-zuuadv7g'
Created temporary directory: /tmp/pip-install-yw4by8oy
Processing /home/zxm/apex
Created temporary directory: /tmp/pip-req-build-qok5db0u
Added file:///home/zxm/apex to build tracker '/tmp/pip-req-tracker-zuuadv7g'
Running setup.py (path:/tmp/pip-req-build-qok5db0u/setup.py) egg_info for package from file:///home/zxm/apex
Running command python setup.py egg_info
torch.__version__ = 1.3.0+cu92
running egg_info
creating /tmp/pip-req-build-qok5db0u/pip-egg-info/apex.egg-info
writing /tmp/pip-req-build-qok5db0u/pip-egg-info/apex.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-req-build-qok5db0u/pip-egg-info/apex.egg-info/dependency_links.txt
writing top-level names to /tmp/pip-req-build-qok5db0u/pip-egg-info/apex.egg-info/top_level.txt
writing manifest file '/tmp/pip-req-build-qok5db0u/pip-egg-info/apex.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-req-build-qok5db0u/pip-egg-info/apex.egg-info/SOURCES.txt'
writing manifest file '/tmp/pip-req-build-qok5db0u/pip-egg-info/apex.egg-info/SOURCES.txt'
/tmp/pip-req-build-qok5db0u/setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
Source in /tmp/pip-req-build-qok5db0u has version 0.1, which satisfies requirement apex==0.1 from file:///home/zxm/apex
Removed apex==0.1 from file:///home/zxm/apex from build tracker '/tmp/pip-req-tracker-zuuadv7g'
Skipping wheel build for apex, due to binaries being disabled for it.
Installing collected packages: apex
Created temporary directory: /tmp/pip-record-moss9qo
Running command /home/zxm/anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-qok5db0u/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-qok5db0u/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-moss9qo/install-record.txt --single-version-externally-managed --compile
torch.__version__ = 1.3.0+cu92
/tmp/pip-req-build-qok5db0u/setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
from /usr/local/cuda-9.2/bin

running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/apex
copying apex/__init__.py -> build/lib.linux-x86_64-3.7/apex
creating build/lib.linux-x86_64-3.7/apex/reparameterization
copying apex/reparameterization/weight_norm.py -> build/lib.linux-x86_64-3.7/apex/reparameterization
copying apex/reparameterization/reparameterization.py -> build/lib.linux-x86_64-3.7/apex/reparameterization
copying apex/reparameterization/__init__.py -> build/lib.linux-x86_64-3.7/apex/reparameterization
creating build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/optimized_sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/__init__.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/optimized_sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/LARC.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/multiproc.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/distributed.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
creating build/lib.linux-x86_64-3.7/apex/RNN
copying apex/RNN/models.py -> build/lib.linux-x86_64-3.7/apex/RNN
copying apex/RNN/cells.py -> build/lib.linux-x86_64-3.7/apex/RNN
copying apex/RNN/RNNBackend.py -> build/lib.linux-x86_64-3.7/apex/RNN
copying apex/RNN/__init__.py -> build/lib.linux-x86_64-3.7/apex/RNN
creating build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
copying apex/multi_tensor_apply/__init__.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
copying apex/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
creating build/lib.linux-x86_64-3.7/apex/normalization
copying apex/normalization/fused_layer_norm.py -> build/lib.linux-x86_64-3.7/apex/normalization
copying apex/normalization/__init__.py -> build/lib.linux-x86_64-3.7/apex/normalization
creating build/lib.linux-x86_64-3.7/apex/pyprof
copying apex/pyprof/__init__.py -> build/lib.linux-x86_64-3.7/apex/pyprof
creating build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/_initialize.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/compat.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/handle.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/_amp_state.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/utils.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/amp.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/wrap.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/opt.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/rnn_compat.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/__init__.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/frontend.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/_process_optimizer.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/__version__.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/scaler.py -> build/lib.linux-x86_64-3.7/apex/amp
creating build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/fused_sgd.py -> build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/fused_lamb.py -> build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/__init__.py -> build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/fused_novograd.py -> build/lib.linux-x86_64-3.7/apex/optimizers
creating build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/__init__.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/loss_scaler.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/fp16util.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
creating build/lib.linux-x86_64-3.7/apex/contrib
copying apex/contrib/__init__.py -> build/lib.linux-x86_64-3.7/apex/contrib
creating build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/nvvp.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/kernel.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/__main__.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/db.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/__init__.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/parse.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
creating build/lib.linux-x86_64-3.7/apex/pyprof/nvtx
copying apex/pyprof/nvtx/nvmarker.py -> build/lib.linux-x86_64-3.7/apex/pyprof/nvtx
copying apex/pyprof/nvtx/__init__.py -> build/lib.linux-x86_64-3.7/apex/pyprof/nvtx
creating build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/reduction.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/base.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/optim.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/__main__.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/output.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/softmax.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/normalization.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/usage.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/index_slice_join_mutate.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/misc.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/prof.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/blas.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/linear.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/pooling.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/activation.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/embedding.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/loss.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/__init__.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/data.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/utility.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/conv.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/recurrentCell.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/convert.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/randomSample.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/pointwise.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/dropout.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
creating build/lib.linux-x86_64-3.7/apex/amp/lists
copying apex/amp/lists/tensor_overrides.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
copying apex/amp/lists/torch_overrides.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
copying apex/amp/lists/__init__.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
copying apex/amp/lists/functional_overrides.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
creating build/lib.linux-x86_64-3.7/apex/contrib/groupbn
copying apex/contrib/groupbn/batch_norm.py -> build/lib.linux-x86_64-3.7/apex/contrib/groupbn
copying apex/contrib/groupbn/__init__.py -> build/lib.linux-x86_64-3.7/apex/contrib/groupbn
creating build/lib.linux-x86_64-3.7/apex/contrib/xentropy
copying apex/contrib/xentropy/softmax_xentropy.py -> build/lib.linux-x86_64-3.7/apex/contrib/xentropy
copying apex/contrib/xentropy/__init__.py -> build/lib.linux-x86_64-3.7/apex/contrib/xentropy
creating build/lib.linux-x86_64-3.7/apex/contrib/optimizers
copying apex/contrib/optimizers/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/contrib/optimizers
copying apex/contrib/optimizers/__init__.py -> build/lib.linux-x86_64-3.7/apex/contrib/optimizers
copying apex/contrib/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.7/apex/contrib/optimizers
running build_ext
building 'apex_C' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/csrc
gcc -pthread -B /home/zxm/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include/TH -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include/THC -I/home/zxm/anaconda3/include/python3.7m -c csrc/flatten_unflatten.cpp -o build/temp.linux-x86_64-3.7/csrc/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=apex_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
g++ -pthread -shared -B /home/zxm/anaconda3/compiler_compat -L/home/zxm/anaconda3/lib -Wl,-rpath=/home/zxm/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/csrc/flatten_unflatten.o -o build/lib.linux-x86_64-3.7/apex_C.cpython-37m-x86_64-linux-gnu.so
building 'amp_C' extension
gcc -pthread -B /home/zxm/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include/TH -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-9.2/include -I/home/zxm/anaconda3/include/python3.7m -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda-9.2/bin/nvcc -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include/TH -I/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-9.2/include -I/home/zxm/anaconda3/include/python3.7m -c csrc/multi_tensor_sgd_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++11
/usr/local/cuda-9.2/include/crt/common_functions.h(73): error: explicit type is missing ("int" assumed)

/usr/local/cuda-9.2/include/crt/common_functions.h(73): error: attribute "__host__" does not apply here

/usr/local/cuda-9.2/include/crt/common_functions.h(73): error: attribute "cudart_builtin" does not apply here

/usr/local/cuda-9.2/include/crt/common_functions.h(73): error: expected a ";"

/usr/local/cuda-9.2/include/crt/common_functions.h(77): warning: parsing restarts here after previous syntax error

/usr/local/cuda-9.2/include/crt/common_functions.h(129): error: explicit type is missing ("int" assumed)

/usr/local/cuda-9.2/include/crt/common_functions.h(135): error: attribute "__host__" does not apply here

/usr/local/cuda-9.2/include/crt/common_functions.h(135): error: attribute "cudart_builtin" does not apply here

/usr/local/cuda-9.2/include/crt/common_functions.h(135): error: expected a ";"

/usr/local/cuda-9.2/include/crt/common_functions.h(139): error: explicit type is missing ("int" assumed)

/usr/local/cuda-9.2/include/crt/common_functions.h(139): error: attribute "__host__" does not apply here

/usr/local/cuda-9.2/include/crt/common_functions.h(139): error: attribute "cudart_builtin" does not apply here

/usr/local/cuda-9.2/include/crt/common_functions.h(139): error: expected a ";"

/usr/local/cuda-9.2/include/crt/common_functions.h(139): warning: parsing restarts here after previous syntax error

/usr/local/cuda-9.2/include/crt/common_functions.h(140): error: explicit type is missing ("int" assumed)

/usr/local/cuda-9.2/include/crt/common_functions.h(140): error: attribute "__host__" does not apply here

/usr/local/cuda-9.2/include/crt/common_functions.h(140): error: attribute "cudart_builtin" does not apply here

/usr/local/cuda-9.2/include/crt/common_functions.h(140): error: expected a ";"

/usr/local/cuda-9.2/include/crt/common_functions.h(140): warning: parsing restarts here after previous syntax error

/home/zxm/anaconda3/lib/python3.7/site-packages/torch/include/c10/core/TensorTypeSet.h(44): warning: integer conversion resulted in a change of sign

16 errors detected in the compilation of "/tmp/tmpxft_00003a4f_00000000-6_multi_tensor_sgd_kernel.cpp1.ii".
error: command '/usr/local/cuda-9.2/bin/nvcc' failed with exit status 1
Running setup.py install for apex ... error

Cleaning up...
Removing source in /tmp/pip-req-build-qok5db0u
Removed build tracker '/tmp/pip-req-tracker-zuuadv7g'
ERROR: Command errored out with exit status 1: /home/zxm/anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-qok5db0u/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-qok5db0u/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-moss9qo/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
Exception information:
Traceback (most recent call last):
  File "/home/zxm/anaconda3/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 153, in _main
    status = self.run(options, args)
  File "/home/zxm/anaconda3/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 455, in run
    use_user_site=options.use_user_site,
  File "/home/zxm/anaconda3/lib/python3.7/site-packages/pip/_internal/req/__init__.py", line 62, in install_given_reqs
    **kwargs
  File "/home/zxm/anaconda3/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 888, in install
    cwd=self.unpacked_source_directory,
  File "/home/zxm/anaconda3/lib/python3.7/site-packages/pip/_internal/utils/subprocess.py", line 275, in runner
    spinner=spinner,
  File "/home/zxm/anaconda3/lib/python3.7/site-packages/pip/_internal/utils/subprocess.py", line 242, in call_subprocess
    raise InstallationError(exc_msg)
pip._internal.exceptions.InstallationError: Command errored out with exit status 1: /home/zxm/anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-qok5db0u/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-qok5db0u/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-moss9qo/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.

mcarilli commented 4 years ago

This doesn't look like an error in the Apex source code; it looks like an error in the CUDA runtime source somehow. It could be a mismatch between the CUDA and GCC versions. Can you try compiling PyTorch's built-in extensions?

$ cd pytorch_repo_dir/test/cpp_extensions
$ python setup.py install
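
The CUDA/GCC mismatch idea can also be checked directly before rebuilding. Below is a minimal Python sketch, assuming `nvcc` and `gcc` are on the PATH. The helper names are hypothetical (not part of Apex or PyTorch), and the `MAX_GCC_MAJOR` table is an illustrative assumption covering only a few releases; confirm the limits against NVIDIA's installation guide for your toolkit.

```python
import re
import shutil
import subprocess

# ASSUMPTION: newest GCC major version accepted by each CUDA release.
# Illustrative values only; confirm against NVIDIA's installation guide.
MAX_GCC_MAJOR = {(9, 2): 7, (10, 0): 7, (10, 1): 8}


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_gcc_major(text):
    """Extract the major version from `gcc -dumpversion` output, or None."""
    match = re.match(r"\s*(\d+)", text)
    return int(match.group(1)) if match else None


def check_pair(cuda_release, gcc_major):
    """Return True/False if the pair is in the table, None if it is unknown."""
    limit = MAX_GCC_MAJOR.get(cuda_release)
    if limit is None or gcc_major is None:
        return None
    return gcc_major <= limit


def run(cmd):
    """Run a command and return its output, or '' if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          universal_newlines=True).stdout


if __name__ == "__main__":
    cuda = parse_nvcc_release(run(["nvcc", "--version"]))
    gcc = parse_gcc_major(run(["gcc", "-dumpversion"]))
    print("CUDA release:", cuda, "| GCC major:", gcc,
          "| compatible:", check_pair(cuda, gcc))
```

A result of `None` only means the table has no entry for that release, and `True` does not rule out other causes of the header errors.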

2Moonsbird commented 4 years ago

Check /usr/local/cuda/include/host_config.h and any other headers you may have changed. I had accidentally edited that file incorrectly while following a tutorial.
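
One way to find edited headers is to compare the installed include directory against an untouched copy, for example one unpacked from the same CUDA installer. Here is a minimal Python sketch, assuming such a reference copy is available; the function name `changed_headers` and the reference path in the usage section are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_headers(installed_dir, reference_dir):
    """List reference headers (relative paths) that differ or are missing in installed_dir."""
    installed_dir, reference_dir = Path(installed_dir), Path(reference_dir)
    changed = []
    for ref in sorted(reference_dir.rglob("*.h")):
        rel = ref.relative_to(reference_dir)
        candidate = installed_dir / rel
        if not candidate.is_file() or file_digest(candidate) != file_digest(ref):
            changed.append(str(rel))
    return changed


if __name__ == "__main__":
    # Hypothetical reference location: adjust both paths to your system.
    installed = Path("/usr/local/cuda/include")
    reference = Path("/tmp/cuda-reference/include")
    if reference.is_dir():
        for name in changed_headers(installed, reference):
            print("differs from reference:", name)
    else:
        print("reference copy not found:", reference)
```

Any header this reports is a candidate for restoring from the reference copy or for reinstalling the toolkit.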