facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

build stopped: subcommand failed #651

Open ataa opened 1 year ago

ataa commented 1 year ago

I have built xformers successfully three times in the past few weeks (for the Torch 2.0 nightly). Today I tried to build it again in the same environment (no changes other than some unrelated Windows updates), and after about 2 minutes I received this error. How can I debug it?

ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "C:\buildtemp\xformers\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "C:\Users\Receiving\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '16']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\buildtemp\xformers\setup.py", line 363, in <module>
    setuptools.setup(
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\__init__.py", line 108, in setup
    return distutils.core.setup(**attrs)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
    return run_commands(dist)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
    dist.run_commands()
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\dist.py", line 1213, in run_command
    super().run_command(command)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\command\build.py", line 132, in run
    self.run_command(cmd_name)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\dist.py", line 1213, in run_command
    super().run_command(command)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
    _build_ext.run(self)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 346, in run
    self.build_extensions()
  File "C:\buildtemp\xformers\setup.py", line 308, in build_extensions
    super().build_extensions()
  File "C:\buildtemp\xformers\venv\lib\site-packages\torch\utils\cpp_extension.py", line 843, in build_extensions
    build_ext.build_extensions(self)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 468, in build_extensions
    self._build_extensions_serial()
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 494, in _build_extensions_serial
    self.build_extension(ext)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "C:\buildtemp\xformers\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 549, in build_extension
    objects = self.compiler.compile(
  File "C:\buildtemp\xformers\venv\lib\site-packages\torch\utils\cpp_extension.py", line 815, in win_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "C:\buildtemp\xformers\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "C:\buildtemp\xformers\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

Package            Version
------------------ ------------------------
certifi            2022.12.7
charset-normalizer 3.0.1
idna               3.4
mpmath             1.2.1
mypy-extensions    0.4.3
networkx           3.0
ninja              1.11.1
numpy              1.24.1
Pillow             9.4.0
pip                22.3.1
pyre-extensions    0.0.23
requests           2.28.2
setuptools         66.1.1
sympy              1.11.1
torch              2.0.0.dev20230121+cu118
torchaudio         2.0.0.dev20230123+cu118
torchvision        0.15.0.dev20230123+cu118
typing_extensions  4.4.0
typing-inspect     0.8.0
urllib3            1.26.14
wheel              0.38.4

Windows 10 Home, CUDA 11.8, latest VS

$env:TORCH_CUDA_ARCH_LIST="8.6"
$env:NVCC_FLAGS="--use_fast_math -DXFORMERS_MEM_EFF_ATTENTION_DISABLE_BACKWARD"
$env:CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"
$env:MAX_JOBS=16
$env:FORCE_CUDA=1

nicolas-dufour commented 1 year ago

Hi, I'm having a similar issue on Linux, also running PyTorch nightlies.

bottler commented 1 year ago

The build is failing with no particular message, which suggests that it is running out of memory. In your case, ninja is trying to do 16 compilations at a time, and we want to reduce this. Try setting the environment variable MAX_JOBS to a small number like 1 or 2.
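As a rough illustration of the suggestion above, here is a minimal Python sketch that prepares an environment with a lower MAX_JOBS before launching the build. The helper name `build_env` is hypothetical, and the build command shown in the comment is an assumption based on the `setup.py` frames in the traceback, not a confirmed invocation.

```python
import os


def build_env(max_jobs=1, base=None):
    """Return a copy of the environment with MAX_JOBS set.

    `base` defaults to the current process environment; it is a parameter
    only so the helper can be exercised without touching os.environ.
    """
    env = dict(os.environ if base is None else base)
    # torch.utils.cpp_extension reads MAX_JOBS and passes it to ninja as -j.
    env["MAX_JOBS"] = str(max_jobs)
    return env


# Assumed usage (the exact build command is not given in this thread):
#
#   import subprocess, sys
#   subprocess.run([sys.executable, "setup.py", "build"],
#                  env=build_env(max_jobs=2), check=True)
```

Passing a modified copy of the environment to the child process keeps the change local to that one build, so the shell's own MAX_JOBS setting is left untouched.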

ataa commented 1 year ago

> The build is failing with no particular message, which suggests that it is running out of memory. In your case, ninja is trying to do 16 compilations at a time, and we want to reduce this. Try setting the environment variable MAX_JOBS to a small number like 1 or 2.

I have 64 GB of available memory, and usage barely reaches 4 GB during the build with 16 jobs. I set MAX_JOBS to 1 and tried again anyway, with the same result. I then removed -DXFORMERS_MEM_EFF_ATTENTION_DISABLE_BACKWARD and was finally able to build it. I'm not sure why adding that nvcc flag caused the build to fail.
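For anyone wanting to repeat that workaround from a script, a small sketch of dropping one flag from an NVCC_FLAGS-style string might look like the following. The helper name `drop_flag` is hypothetical, and the plain whitespace split is an assumption that holds for the flags in this report but would not handle flags containing quoted spaces.

```python
def drop_flag(flags, unwanted):
    """Remove every occurrence of one whitespace-separated flag."""
    return " ".join(tok for tok in flags.split() if tok != unwanted)


# The NVCC_FLAGS value from the original report.
flags = "--use_fast_math -DXFORMERS_MEM_EFF_ATTENTION_DISABLE_BACKWARD"
print(drop_flag(flags, "-DXFORMERS_MEM_EFF_ATTENTION_DISABLE_BACKWARD"))
# prints: --use_fast_math
```

The result can then be assigned back to NVCC_FLAGS in the environment used for the build.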

ataa commented 1 year ago

Additional information:

15 errors detected in the compilation of "C:/buildtemp/xformers/xformers/csrc/attention/cuda/fmha/kernels/backward_bf16_aligned_dropout_k128.cu".
[6/67] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\buildtemp\xfor
mers\build\temp.win-amd64-cpython-310\Release\xformers\csrc\attention\cuda\fmha\kernels\backward_bf16_aligned.obj.d --use-local-env -Xcompiler /MD
-Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EH
sc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dl
l_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Dxformers_EXPORTS -IC:\buildtemp\xformers\xfor
mers\csrc -IC:\buildtemp\xformers\third_party\sputnik -IC:\buildtemp\xformers\third_party\cutlass\include -IC:\buildtemp\xformers\third_party\cutla
ss\examples -IC:\buildtemp\xformers\venv\lib\site-packages\torch\include -IC:\buildtemp\xformers\venv\lib\site-packages\torch\include\torch\csrc\ap
i\include -IC:\buildtemp\xformers\venv\lib\site-packages\torch\include\TH -IC:\buildtemp\xformers\venv\lib\site-packages\torch\include\THC "-IC:\Pr
ogram Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IC:\buildtemp\xformers\venv\include -IC:\Users\Receiving\AppData\Local\Programs\Pytho
n\Python310\include -IC:\Users\Receiving\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\Bui
ldTools\VC\Tools\MSVC\14.34.31933\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program
 Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.20348.0\\um" "-IC:\Program Files (x
86)\Windows Kits\10\\include\10.0.20348.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.20348.0\\winrt" "-IC:\Program Files (x86
)\Windows Kits\10\\include\10.0.20348.0\\cppwinrt" -c C:\buildtemp\xformers\xformers\csrc\attention\cuda\fmha\kernels\backward_bf16_aligned.cu -o C
:\buildtemp\xformers\build\temp.win-amd64-cpython-310\Release\xformers\csrc\attention\cuda\fmha\kernels\backward_bf16_aligned.obj -D__CUDA_NO_HALF_
OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DHAS_PYTORCH
--use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE --generate-line-in
fo -DNDEBUG --use_fast_math -arch=sm_86 -DXFORMERS_MEM_EFF_ATTENTION_DISABLE_BACKWARD --threads 4 --ptxas-options=-v -std=c++17 -Xcompiler /Zc:lamb
da -Xcompiler /Zc:preprocessor -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: C:/buildtemp/xformers/build/temp.win-amd64-cpython-310/Release/xformers/csrc/attention/cuda/fmha/kernels/backward_bf16_aligned.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin\nvcc --generate-dependencies-with-compile --dependency-output C:\buildtemp\xformers\bu
ild\temp.win-amd64-cpython-310\Release\xformers\csrc\attention\cuda\fmha\kernels\backward_bf16_aligned.obj.d --use-local-env -Xcompiler /MD -Xcompi
ler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcu
dafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_inter
face_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Dxformers_EXPORTS -IC:\buildtemp\xformers\xformers\cs
rc -IC:\buildtemp\xformers\third_party\sputnik -IC:\buildtemp\xformers\third_party\cutlass\include -IC:\buildtemp\xformers\third_party\cutlass\exam
ples -IC:\buildtemp\xformers\venv\lib\site-packages\torch\include -IC:\buildtemp\xformers\venv\lib\site-packages\torch\include\torch\csrc\api\inclu
de -IC:\buildtemp\xformers\venv\lib\site-packages\torch\include\TH -IC:\buildtemp\xformers\venv\lib\site-packages\torch\include\THC "-IC:\Program F
iles\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include" -IC:\buildtemp\xformers\venv\include -IC:\Users\Receiving\AppData\Local\Programs\Python\Pytho
n310\include -IC:\Users\Receiving\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools
\VC\Tools\MSVC\14.34.31933\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files
(x86)\Windows Kits\10\include\10.0.20348.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.20348.0\\um" "-IC:\Program Files (x86)\Win
dows Kits\10\\include\10.0.20348.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.20348.0\\winrt" "-IC:\Program Files (x86)\Windo
ws Kits\10\\include\10.0.20348.0\\cppwinrt" -c C:\buildtemp\xformers\xformers\csrc\attention\cuda\fmha\kernels\backward_bf16_aligned.cu -o C:\build
temp\xformers\build\temp.win-amd64-cpython-310\Release\xformers\csrc\attention\cuda\fmha\kernels\backward_bf16_aligned.obj -D__CUDA_NO_HALF_OPERATO
RS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DHAS_PYTORCH --use_f
ast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --extended-lambda -D_ENABLE_EXTENDED_ALIGNED_STORAGE --generate-line-info -DND
EBUG --use_fast_math -arch=sm_86 -DXFORMERS_MEM_EFF_ATTENTION_DISABLE_BACKWARD --threads 4 --ptxas-options=-v -std=c++17 -Xcompiler /Zc:lambda -Xco
mpiler /Zc:preprocessor -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
backward_bf16_aligned.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
backward_bf16_aligned.cu
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : Command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
backward_bf16_aligned.cu
C:\buildtemp\xformers\xformers\csrc\attention\cuda\fmha\kernels\backward_bf16_aligned.cu(3): error: this declaration has no storage class or type specifier

C:\buildtemp\xformers\xformers\csrc\attention\cuda\fmha\kernels\backward_bf16_aligned.cu(3): error: name followed by "::" must be a class or namespace name

C:\buildtemp\xformers\xformers\csrc\attention\cuda\fmha\kernels\backward_bf16_aligned.cu(3): error: too many initializer values

C:\buildtemp\xformers\xformers\csrc\attention\cuda\fmha\kernels\backward_bf16_aligned.cu(4): error: this declaration has no storage class or type specifier

danthe3rd commented 1 year ago

@ataa thanks for the bug report. That's a weird issue. I plan to change how we instantiate kernels a bit, so it should hopefully go away soon.
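Since the Python traceback in this thread only reports "Error compiling objects for extension" and the real diagnostics sit further up in the ninja output, a small log scanner can help surface them. The sketch below is one possible approach, not part of xformers or PyTorch: the function name `summarize_build_log` and the regular expression are assumptions modeled on the `file(line): error: message` lines quoted earlier, and it expects a log that has not been hard-wrapped by the terminal.

```python
import re

# Matches nvcc/MSVC-style diagnostics such as "path\file.cu(3): error: message".
DIAGNOSTIC = re.compile(r"^(?P<file>.+)\((?P<line>\d+)\): error: (?P<msg>.*)$")


def summarize_build_log(text):
    """Return (failed_targets, errors) found in a ninja build log.

    failed_targets: targets listed on ninja's "FAILED: " lines.
    errors: (file, line_number, message) tuples for compiler diagnostics.
    """
    failed, errors = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("FAILED: "):
            failed.append(line[len("FAILED: "):])
            continue
        match = DIAGNOSTIC.match(line)
        if match:
            errors.append(
                (match.group("file"), int(match.group("line")), match.group("msg"))
            )
    return failed, errors
```

Redirecting the build output to a file and passing its contents to this function gives a short list of failing targets and the first source lines to inspect.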