NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

Win10 platform installation encountered an error caused by macro definition #835

Open yeyuxmf opened 4 years ago

yeyuxmf commented 4 years ago

My configuration is as follows:

win10
pytorch1.2.0
cuda10.0
vs2017

The installation command is as follows:

python setup.py install --cuda_ext --cpp_ext

Please tell me how to solve this problem, thank you!

...............................................
...........................................
D:/python-3.6.7/lib/site-packages/torch/include\c10/cuda/CUDAStream.h(171): warning: field of class type without a DLL interface used in a class with a DLL interface
D:/vs2017/VC/Tools/MSVC/14.16.27023/include\type_traits(1271): error: static assertion failed with "You've instantiated std::aligned_storage<Len, Align> with an extended alignment (in other words, Align > alignof(max_align_t)). Before VS 2017 15.8, the member type would non-conformingly have an alignment of only alignof(max_align_t). VS 2017 15.8 was fixed to handle this correctly, but the fix inherently changes layout and breaks binary compatibility (only for uses of aligned_storage with extended alignments). Please define either (1) _ENABLE_EXTENDED_ALIGNED_STORAGE to acknowledge that you understand this message and that you actually want a type with an extended alignment, or (2) _DISABLE_EXTENDED_ALIGNED_STORAGE to silence this message and get the old non-conformant behavior."
            detected during:
              instantiation of class "std::_Aligned<_Len, _Align, double, false> [with _Len=16ULL, _Align=16ULL]"
  (1291): here
              instantiation of class "std::_Aligned<_Len, _Align, int, false> [with _Len=16ULL, _Align=16ULL]"
  (1298): here
              instantiation of class "std::_Aligned<_Len, _Align, short, false> [with _Len=16ULL, _Align=16ULL]"
  (1305): here
              instantiation of class "std::_Aligned<_Len, _Align, char, false> [with _Len=16ULL, _Align=16ULL]"
  (1312): here
              instantiation of class "std::aligned_storage<_Len, _Align> [with _Len=16ULL, _Align=16ULL]"
  csrc/multi_tensor_scale_kernel.cu(25): here
              instantiation of "void load_store(T *, T *, int, int) [with T=float]"
  csrc/multi_tensor_scale_kernel.cu(64): here

yeyuxmf commented 4 years ago

@ajtulloch I hope you can help me, thank you!

neonbjb commented 4 years ago

This is a dumb fix, but if you revert back to a commit on the master branch from a couple of months ago, you will be able to build it. Hash 2ec84ebdca59278eaf15e8ddf32476d9d6d8b904 worked for me, for example.

I doubt we're going to get much support from the apex team due to their stance on Windows support. If I find some time I'll dig in and see if I can figure out what is causing the issue. It seems like you simply need to find the offending code and add in the define.
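For anyone who wants to try "adding the define" without touching the CUDA sources, here is a minimal sketch of passing the macro as a compiler flag instead. It assumes the extensions are built with a dict of per-tool flag lists (`"cxx"` and `"nvcc"`), which is the shape `torch.utils.cpp_extension.CUDAExtension` accepts for `extra_compile_args`. The helper name `add_aligned_storage_define` is made up for illustration, and nothing in this thread confirms that this alone is enough to get apex building on Windows.

```python
import sys

# Macro named in the MSVC static assertion quoted in this issue.
ALIGNED_STORAGE_MACRO = "_ENABLE_EXTENDED_ALIGNED_STORAGE"


def add_aligned_storage_define(extra_compile_args, platform=sys.platform):
    """Return a copy of extra_compile_args with the macro defined on Windows.

    extra_compile_args is assumed to be a dict such as
    {"cxx": [...], "nvcc": [...]}. This helper is hypothetical and is
    not part of apex or PyTorch.
    """
    # Copy the lists so the caller's dict is never mutated.
    patched = {tool: list(flags) for tool, flags in extra_compile_args.items()}
    if not platform.startswith("win"):
        return patched  # the assertion comes from MSVC headers only

    define_flags = {
        "cxx": "/D" + ALIGNED_STORAGE_MACRO,   # MSVC cl.exe syntax
        "nvcc": "-D" + ALIGNED_STORAGE_MACRO,  # nvcc syntax
    }
    for tool, flag in define_flags.items():
        flags = patched.setdefault(tool, [])
        if flag not in flags:
            flags.append(flag)
    return patched


if __name__ == "__main__":
    example = {"cxx": ["-O3"], "nvcc": ["-O3"]}
    print(add_aligned_storage_define(example, platform="win32"))
```

In a setup.py you would wrap each extension's flag dict with this helper before handing it to `CUDAExtension`. The error in the log is raised while nvcc compiles the `.cu` files, so the `"nvcc"` entry is probably the one that matters.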

kezewang commented 4 years ago

Please try my fork: https://github.com/kezewang/apex

Also see the following issue for fixing Windows MSVC compatibility: https://github.com/facebookresearch/pytorch3d/issues/10

It cost me a whole day to resolve all the issues.

crispin-nosidam commented 4 years ago

> This is a dumb fix, but if you revert back to a commit on the master branch from a couple of months ago, you will be able to build it. Hash 2ec84eb worked for me, for example.
>
> I doubt we're going to get much support from the apex team due to their stance on Windows support. If I find some time I'll dig in and see if I can figure out what is causing the issue. It seems like you simply need to find the offending code and add in the define.

It works - thanks!

gordicaleksa commented 2 years ago

Hitting this error myself:

C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.24.28314/include\type_traits(1061): error: static assertion failed with "You've instantiated std::aligned_storage<Len, Align> with an extended alignment (in other words, Align > alignof(max_align_t)). Before VS 2017 15.8, the member "type" would non-conformingly have an alignment of only alignof(max_align_t). VS 2017 15.8 was fixed to handle this correctly, but the fix inherently changes layout and breaks binary compatibility (*only* for uses of aligned_storage with extended alignments). Please define either (1) _ENABLE_EXTENDED_ALIGNED_STORAGE to acknowledge that you understand this message and that you actually want a type with an extended alignment, or (2) _DISABLE_EXTENDED_ALIGNED_STORAGE to silence this message and get the old non-conforming behavior."
            detected during:
              instantiation of class "std::_Aligned<_Len, _Align, double, false> [with _Len=16ULL, _Align=16ULL]"
  (1079): here
              instantiation of class "std::_Aligned<_Len, _Align, int, false> [with _Len=16ULL, _Align=16ULL]"
  (1084): here
              instantiation of class "std::_Aligned<_Len, _Align, short, false> [with _Len=16ULL, _Align=16ULL]"
  (1089): here
              instantiation of class "std::_Aligned<_Len, _Align, char, false> [with _Len=16ULL, _Align=16ULL]"
  (1094): here
              instantiation of class "std::aligned_storage<_Len, _Align> [with _Len=16ULL, _Align=16ULL]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_axpby_kernel.cu(23): here
              instantiation of "void load_store(T *, T *, int, int) [with T=float]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_axpby_kernel.cu(68): here
              instantiation of "void AxpbyFunctor<x_t, y_t, out_t>::operator()(int, volatile int *, TensorListMetadata<3> &, float, float, int) [with x_t=float, y_t=float, out_t=float]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_apply.cuh(38): here
              instantiation of "void multi_tensor_apply_kernel(int, volatile int *, T, U, ArgTypes...) [with T=TensorListMetadata<3>, U=AxpbyFunctor<float, float, float>, ArgTypes=<float, float, int>]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_apply.cuh(109): here
              instantiation of "void multi_tensor_apply<depth,T,ArgTypes...>(int, int, const at::Tensor &, const std::vector<std::vector<at::Tensor, std::allocator<at::Tensor>>, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor>>>> &, T, ArgTypes...) [with depth=3, T=AxpbyFunctor<float, float, float>, ArgTypes=<float, float, int>]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_axpby_kernel.cu(141): here

  1 error detected in the compilation of "csrc/multi_tensor_axpby_kernel.cu".
  error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.7\\bin\\nvcc.exe' failed with exit code 4294967295
  error: subprocess-exited-with-error

  Running setup.py install for apex did not run successfully.
  exit code: 1

  See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: 'C:\Users\aleks\Miniconda3\envs\bloom\python.exe' -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'T:\\YouTube_Code\\7_BLOOM\\apex\\setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' --cpp_ext --cuda_ext install --record 'C:\Users\aleks\AppData\Local\Temp\pip-record-12qb6t0s\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\Users\aleks\Miniconda3\envs\bloom\Include\apex'
  cwd: T:\YouTube_Code\7_BLOOM\apex\
  Running setup.py install for apex: finished with status 'error'
error: legacy-install-failure