NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

importlib.metadata.PackageNotFoundError: transformer-engine #1216

Open zmtttt opened 4 days ago

zmtttt commented 4 days ago

I'm hitting an import error: importlib.metadata.PackageNotFoundError: transformer-engine. Has anyone run into the same problem? Thanks!
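For context, `importlib.metadata` raises `PackageNotFoundError` when no installed distribution with the given name has metadata in the current environment, which can happen even if a source checkout of the package is importable. A minimal sketch for checking this directly is below; the helper name `installed_version` is hypothetical and not part of Transformer Engine.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # No distribution metadata with this name in the active environment.
        return None


if __name__ == "__main__":
    # None here means pip never completed an install of the package.
    print(installed_version("transformer-engine"))
```

If this prints `None`, the package was never successfully installed into the interpreter being used, which points back at the install step rather than at the import itself.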

ksivaman commented 4 days ago

Could you share how you're installing transformer-engine?

zmtttt commented 4 days ago

Could you share how you're installing transformer-engine?

I used the official method, but it fails to build the wheel for transformer-engine:

    pip install --upgrade git+https://github.com/NVIDIA/TransformerEngine.git@stable

ksivaman commented 4 days ago

Could you post the full build log?

zmtttt commented 4 days ago

Could you post the full build log?

  Building CMake extension transformer_engine
  Running command /data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/.eggs/cmake-3.30.4-py3.8-linux-x86_64.egg/cmake/data/bin/cmake -S /data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/transformer_engine/common -B /data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build/cmake -DPython_EXECUTABLE=/opt/conda/bin/python -DPython_INCLUDE_DIR=/opt/conda/include/python3.8 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build/lib.linux-x86_64-cpython-38 -DCMAKE_CUDA_ARCHITECTURES=70;80;89;90 -Dpybind11_DIR=/opt/conda/lib/python3.8/site-packages/pybind11/share/cmake/pybind11 -GNinja
  CMake Error at /data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/.eggs/cmake-3.30.4-py3.8-linux-x86_64.egg/cmake/data/share/cmake-3.30/Modules/Internal/CMakeCUDAFindToolkit.cmake:104 (message):
    Failed to find nvcc.

    Compiler requires the CUDA toolkit.  Please set the CUDAToolkit_ROOT
    variable.
  Call Stack (most recent call first):
    /data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/.eggs/cmake-3.30.4-py3.8-linux-x86_64.egg/cmake/data/share/cmake-3.30/Modules/CMakeDetermineCUDACompiler.cmake:85 (cmake_cuda_find_toolkit)
    CMakeLists.txt:23 (project)

  CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
  -- Configuring incomplete, errors occurred!
  Traceback (most recent call last):
    File "/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build_tools/build_ext.py", line 89, in _build_cmake
      subprocess.run(command, cwd=build_dir, check=True)
    File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/.eggs/cmake-3.30.4-py3.8-linux-x86_64.egg/cmake/data/bin/cmake', '-S', '/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/transformer_engine/common', '-B', '/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/opt/conda/bin/python', '-DPython_INCLUDE_DIR=/opt/conda/include/python3.8', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build/lib.linux-x86_64-cpython-38', '-DCMAKE_CUDA_ARCHITECTURES=70;80;89;90', '-Dpybind11_DIR=/opt/conda/lib/python3.8/site-packages/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/setup.py", line 174, in <module>
      setuptools.setup(
    File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 87, in setup
      return distutils.core.setup(**attrs)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1208, in run_command
      super().run_command(command)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 68, in run
      return orig.install.run(self)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/install.py", line 698, in run
      self.run_command('build')
    File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1208, in run_command
      super().run_command(command)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 132, in run
      self.run_command(cmd_name)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1208, in run_command
      super().run_command(command)
    File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build_tools/build_ext.py", line 119, in run
      ext._build_cmake(
    File "/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build_tools/build_ext.py", line 91, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/.eggs/cmake-3.30.4-py3.8-linux-x86_64.egg/cmake/data/bin/cmake', '-S', '/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/transformer_engine/common', '-B', '/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/opt/conda/bin/python', '-DPython_INCLUDE_DIR=/opt/conda/include/python3.8', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/data/train_nfs/offload_megatron/megatron_0.8/zmt/Megatron-LM/TransformerEngine/build/lib.linux-x86_64-cpython-38', '-DCMAKE_CUDA_ARCHITECTURES=70;80;89;90', '-Dpybind11_DIR=/opt/conda/lib/python3.8/site-packages/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> transformer-engine

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

thanks!!!
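The first error in the log above is CMake's "Failed to find nvcc", so a quick check is whether the CUDA compiler is visible from the environment that runs the build. The sketch below is one way to look for it, not the project's own detection logic. The name `find_nvcc` is hypothetical, and the `CUDA_HOME` variable and `/usr/local/cuda` default are common conventions assumed here rather than anything the log confirms.

```python
import os
import shutil
from pathlib import Path


def find_nvcc(env=None, default_root="/usr/local/cuda"):
    """Return the path of the first nvcc executable found, or None.

    Hypothetical helper; the lookup order is an assumption for illustration.
    """
    env = os.environ if env is None else env

    # 1. nvcc reachable through PATH.
    on_path = shutil.which("nvcc", path=env.get("PATH", ""))
    if on_path:
        return on_path

    # 2. <root>/bin/nvcc under an assumed toolkit root.
    for root in (env.get("CUDA_HOME"), default_root):
        if root:
            candidate = Path(root) / "bin" / "nvcc"
            if candidate.is_file() and os.access(candidate, os.X_OK):
                return str(candidate)

    return None
```

Calling `find_nvcc()` in the same shell and Python environment used for `pip install` returns `None` when nvcc is not discoverable. In that case, the options are to install the CUDA toolkit, add its `bin` directory to `PATH`, or set `CUDAToolkit_ROOT` as the CMake message suggests.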

timmoon10 commented 1 day ago