bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2
Other
1.31k stars 213 forks

Installing Apex on Windows #342

Open gordicaleksa opened 2 years ago

gordicaleksa commented 2 years ago

Hi folks!

Has anyone encountered this error when installing Apex on Windows?

I'm trying to create a YouTube video covering this codebase, but this error is blocking me.

I know this is probably better suited for the Apex folks, but if anyone running this repo has encountered it or knows anything, I'd super appreciate any help!

Pasting it here as well:

C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.24.28314/include\type_traits(1061): error: static assertion failed with "You've instantiated std::aligned_storage<Len, Align> with an extended alignment (in other words, Align > alignof(max_align_t)). Before VS 2017 15.8, the member "type" would non-conformingly have an alignment of only alignof(max_align_t). VS 2017 15.8 was fixed to handle this correctly, but the fix inherently changes layout and breaks binary compatibility (*only* for uses of aligned_storage with extended alignments). Please define either (1) _ENABLE_EXTENDED_ALIGNED_STORAGE to acknowledge that you understand this message and that you actually want a type with an extended alignment, or (2) _DISABLE_EXTENDED_ALIGNED_STORAGE to silence this message and get the old non-conforming behavior."
            detected during:
              instantiation of class "std::_Aligned<_Len, _Align, double, false> [with _Len=16ULL, _Align=16ULL]"
  (1079): here
              instantiation of class "std::_Aligned<_Len, _Align, int, false> [with _Len=16ULL, _Align=16ULL]"
  (1084): here
              instantiation of class "std::_Aligned<_Len, _Align, short, false> [with _Len=16ULL, _Align=16ULL]"
  (1089): here
              instantiation of class "std::_Aligned<_Len, _Align, char, false> [with _Len=16ULL, _Align=16ULL]"
  (1094): here
              instantiation of class "std::aligned_storage<_Len, _Align> [with _Len=16ULL, _Align=16ULL]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_axpby_kernel.cu(23): here
              instantiation of "void load_store(T *, T *, int, int) [with T=float]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_axpby_kernel.cu(68): here
              instantiation of "void AxpbyFunctor<x_t, y_t, out_t>::operator()(int, volatile int *, TensorListMetadata<3> &, float, float, int) [with x_t=float, y_t=float, out_t=float]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_apply.cuh(38): here
              instantiation of "void multi_tensor_apply_kernel(int, volatile int *, T, U, ArgTypes...) [with T=TensorListMetadata<3>, U=AxpbyFunctor<float, float, float>, ArgTypes=<float, float, int>]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_apply.cuh(109): here
              instantiation of "void multi_tensor_apply<depth,T,ArgTypes...>(int, int, const at::Tensor &, const std::vector<std::vector<at::Tensor, std::allocator<at::Tensor>>, std::allocator<std::vector<at::Tensor, std::allocator<at::Tensor>>>> &, T, ArgTypes...) [with depth=3, T=AxpbyFunctor<float, float, float>, ArgTypes=<float, float, int>]"
  T:\YouTube_Code\7_BLOOM\apex\csrc\multi_tensor_axpby_kernel.cu(141): here

  1 error detected in the compilation of "csrc/multi_tensor_axpby_kernel.cu".
  error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.7\\bin\\nvcc.exe' failed with exit code 4294967295
  error: subprocess-exited-with-error

  Running setup.py install for apex did not run successfully.
  exit code: 1

  See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: 'C:\Users\aleks\Miniconda3\envs\bloom\python.exe' -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'T:\\YouTube_Code\\7_BLOOM\\apex\\setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' --cpp_ext --cuda_ext install --record 'C:\Users\aleks\AppData\Local\Temp\pip-record-12qb6t0s\install-record.txt' --single-version-externally-managed --compile --install-headers 'C:\Users\aleks\Miniconda3\envs\bloom\Include\apex'
  cwd: T:\YouTube_Code\7_BLOOM\apex\
  Running setup.py install for apex: finished with status 'error'
error: legacy-install-failure

Encountered error while trying to install package.

apex

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
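
The static assertion in the log above names two macros and asks for one of them to be defined, so one thing to try is passing `_ENABLE_EXTENDED_ALIGNED_STORAGE` to both the host compiler and nvcc. Below is a minimal sketch of that idea, not a verified fix for this build. The helper name `add_aligned_storage_define` is hypothetical, and the `{"cxx": [...], "nvcc": [...]}` dict shape assumes the `extra_compile_args` convention of PyTorch's `CUDAExtension`; where exactly to apply it in Apex's `setup.py` is left open.

```python
import sys

# Macro named in the MSVC static assertion: opts in to the conforming
# (extended-alignment) layout of std::aligned_storage.
DEFINE = "-D_ENABLE_EXTENDED_ALIGNED_STORAGE"


def add_aligned_storage_define(extra_compile_args, platform=sys.platform):
    """Return a copy of extra_compile_args with DEFINE appended on Windows.

    extra_compile_args is assumed to map a tool name ("cxx", "nvcc") to a
    list of flags. The input is not modified. Hypothetical helper.
    """
    patched = {tool: list(flags) for tool, flags in extra_compile_args.items()}
    if platform.startswith("win"):
        for flags in patched.values():
            if DEFINE not in flags:  # avoid adding the define twice
                flags.append(DEFINE)
    return patched


if __name__ == "__main__":
    args = {"cxx": ["-O3"], "nvcc": ["-O3", "-lineinfo"]}
    print(add_aligned_storage_define(args, platform="win32"))
```

The patched dict would then be handed to each extension in place of the original flags. On non-Windows platforms the helper returns the flags unchanged, since the assertion comes from the MSVC standard library headers.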

mayank31398 commented 2 years ago

@gordicaleksa, I think everyone here is using Linux. Also, this is not the correct place to ask this. Please create an issue here