ROCm / TransformerEngine

[Issue]: ROCm TE Installation Error: no member named 'getCurrentHIPStreamMasqueradingAsCUDA' in namespace 'c10::hip' #71

Open OrenLeung opened 1 day ago

OrenLeung commented 1 day ago

Problem Description

Installing ROCm/TransformerEngine by following the instructions in the README fails with the compile error "no member named 'getCurrentHIPStreamMasqueradingAsCUDA' in namespace 'c10::hip'". Do you have any tips on how to resolve it?

Reproduction

FROM rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_release_2.3.0

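# Refresh the package index and pass -y so the install does not stall on a prompt during docker build
RUN apt-get update && apt-get install -y nano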

RUN pip3 uninstall -y torch

RUN pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm6.2

RUN pip3 install pybind11

WORKDIR /workspace/

# Unlike the NVIDIA NGC PyTorch image, the ROCm PyTorch image does not ship with Transformer Engine installed,
# so we need to install it from source
RUN git clone --recursive https://github.com/ROCm/TransformerEngine.git
ENV NVTE_FRAMEWORK=pytorch
ENV PYTORCH_ROCM_ARCH=gfx942

RUN cd TransformerEngine && pip install .

Error Trace

      /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/activation.hip:255:58: error: no member named 'getCurrentHIPStreamMasqueradingAsCUDA' in namespace 'c10::hip'
        255 |   nvte_srelu(input_cu.data(), output_cu.data(), at::hip::getCurrentHIPStreamMasqueradingAsCUDA());
            |                                                 ~~~~~~~~~^
      /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/activation.hip:274:75: error: no member named 'getCurrentHIPStreamMasqueradingAsCUDA' in namespace 'c10::hip'
        274 |   nvte_dsrelu(grad_cu.data(), input_cu.data(), output_cu.data(), at::hip::getCurrentHIPStreamMasqueradingAsCUDA());
            |                                                                  ~~~~~~~~~^
      14 errors generated when compiling for gfx942.
      failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx942  -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/transformer_engine/common -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c -x hip /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/activation.hip -o "/workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/extensions/activation.o" -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_BFLOAT16_OPERATORS__ -U__HIP_NO_BFLOAT16_CONVERSIONS__ -U__HIP_NO_BFLOAT162_OPERATORS__ -U__HIP_NO_BFLOAT162_CONVERSIONS__ -parallel-jobs=4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc -std=c++17
      [14/17] c++ -MMD -MF /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/ts_fp8_op_hip.o.d -pthread -B /opt/conda/envs/py_3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/transformer_engine/common -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op_hip.cpp -o /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/ts_fp8_op_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -O3 -fvisibility=hidden -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
      [15/17] c++ -MMD -MF /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/extensions/pybind_hip.o.d -pthread -B /opt/conda/envs/py_3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -O2 -isystem /opt/conda/envs/py_3.10/include -fPIC -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/transformer_engine/common -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind_hip.cpp -o /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/extensions/pybind_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -O3 -fvisibility=hidden -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
      [16/17] /opt/rocm/bin/hipcc  -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/transformer_engine/common -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/misc.hip -o /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/extensions/misc.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_BFLOAT16_OPERATORS__ -U__HIP_NO_BFLOAT16_CONVERSIONS__ -U__HIP_NO_BFLOAT162_OPERATORS__ -U__HIP_NO_BFLOAT162_CONVERSIONS__ -parallel-jobs=4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=0 --offload-arch=gfx942 -fno-gpu-rdc -std=c++17
      [17/17] /opt/rocm/bin/hipcc  -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/transformer_engine/common -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include/THH -I/opt/rocm/include -I/opt/conda/envs/py_3.10/include/python3.10 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/common.hip -o /workspace/TransformerEngine/build/temp.linux-x86_64-cpython-310/transformer_engine/pytorch/csrc/common.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_BFLOAT16_OPERATORS__ -U__HIP_NO_BFLOAT16_CONVERSIONS__ -U__HIP_NO_BFLOAT162_OPERATORS__ -U__HIP_NO_BFLOAT162_CONVERSIONS__ -parallel-jobs=4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=0 --offload-arch=gfx942 -fno-gpu-rdc -std=c++17
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2147, in _run_ninja_build
          subprocess.run(
        File "/opt/conda/envs/py_3.10/lib/python3.10/subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '32']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/workspace/TransformerEngine/setup.py", line 135, in <module>
          setuptools.setup(
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
          return distutils.core.setup(**attrs)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
          return run_commands(dist)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
          dist.run_commands()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 368, in run
          self.run_command("build")
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
          self.run_command(cmd_name)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/workspace/TransformerEngine/build_tools/build_ext.py", line 117, in run
          super().run()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 91, in run
          _build_ext.run(self)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
          self.build_extensions()
        File "/workspace/TransformerEngine/build_tools/build_ext.py", line 246, in build_extensions
          super().build_extensions()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 899, in build_extensions
          build_ext.build_extensions(self)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
          self._build_extensions_serial()
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
          self.build_extension(ext)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 252, in build_extension
          _build_ext.build_extension(self, ext)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
          super(build_ext, self).build_extension(ext)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 560, in build_extension
          objects = self.compiler.compile(
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 712, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1827, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2163, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.

Operating System

Ubuntu

CPU

AMD CPU

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.2.0

OrenLeung commented 1 day ago

This seems to be fixed by updating to the latest PyTorch nightly, which contains https://github.com/pytorch/pytorch/commit/7e8dace0de6bb589e4fd8f37e8642819b80c0baa, a revert of https://github.com/pytorch/pytorch/pull/137157.

https://github.com/pytorch/pytorch/pull/137157 breaks ROCm/TransformerEngine and my whole FP8 training codebase on MI300X, because it removes MasqueradingAsCUDA, which ROCm/TransformerEngine currently seems to depend on.
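For anyone hitting the same error, one quick way to tell whether an installed nightly is affected is to check whether its bundled headers still mention the symbol before starting the long TransformerEngine build. The sketch below is only a rough illustration: the helper name headers_declaring is made up for this comment, it only scans *.h files, and a textual match does not prove the declaration is usable from the c10::hip namespace.

```python
from pathlib import Path

# Symbol named in the compile error above.
SYMBOL = "getCurrentHIPStreamMasqueradingAsCUDA"


def headers_declaring(include_dir, symbol=SYMBOL):
    """Return the sorted list of *.h files under include_dir whose text mentions symbol.

    Hypothetical helper for illustration; it does a plain substring search,
    so it cannot tell a declaration from a comment.
    """
    hits = []
    for path in Path(include_dir).rglob("*.h"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            # Skip unreadable files rather than aborting the scan.
            continue
        if symbol in text:
            hits.append(path)
    return sorted(hits)
```

Calling headers_declaring("/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/include") uses the include directory that appears in the build log above. An empty result would suggest the installed torch build no longer ships the symbol, in which case moving to a nightly that includes the revert is the thing to try.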