NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

stuck at building wheel #1077

Open · neurosynapse opened this issue 1 month ago

neurosynapse commented 1 month ago

pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/NVIDIA/TransformerEngine.git@stable
  Cloning https://github.com/NVIDIA/TransformerEngine.git (to revision stable) to /tmp/pip-req-build-fa900tpa
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/TransformerEngine.git /tmp/pip-req-build-fa900tpa
  Running command git checkout -b stable --track origin/stable
  Switched to a new branch 'stable'
  Branch 'stable' set up to track remote branch 'stable' from 'origin'.
  Resolved https://github.com/NVIDIA/TransformerEngine.git to commit 3ec998e96c82bc30247560ced6170c4221ca2b5a
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done
Requirement already satisfied: packaging in /usr/lib/python3/dist-packages (from transformer-engine==1.8.0+3ec998e) (21.3)
Collecting pydantic
  Using cached pydantic-2.8.2-py3-none-any.whl (423 kB)
Requirement already satisfied: typing-extensions>=4.6.1 in /home/rob/.local/lib/python3.10/site-packages (from pydantic->transformer-engine==1.8.0+3ec998e) (4.12.2)
Collecting pydantic-core==2.20.1
  Using cached pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Collecting annotated-types>=0.4.0
  Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)
Building wheels for collected packages: transformer-engine
  Building wheel for transformer-engine (setup.py) ... |

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

Ubuntu 22.04

Python 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

torch 2.4.0+cu121
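
For reference, environment details like the above are typically collected with commands along these lines (an assumption; the reporter may have gathered them differently):

```bash
# CUDA toolkit compiler version (matches the nvcc output above)
nvcc --version

# OS release (Ubuntu 22.04 here)
lsb_release -d

# Python interpreter version (the banner above comes from launching the interactive interpreter)
python --version

# Installed PyTorch build (2.4.0+cu121 here)
python -c "import torch; print(torch.__version__)"
```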

neurosynapse commented 1 month ago

rtx 3090 ti

timmoon10 commented 1 month ago

We use Ninja to parallelize the build process and I suspect it's overwhelming your system resources. Can you try running with `MAX_JOBS=1` in your environment?
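
As a minimal sketch of that workaround, assuming the same `stable` branch install command from the original report:

```bash
# MAX_JOBS caps the number of parallel compile jobs during the extension build,
# which keeps memory and CPU pressure down on smaller machines.
MAX_JOBS=1 pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
```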

1195343015 commented 1 month ago
> Hm, I'd expect most systems could handle building with `MAX_JOBS=1`. I wonder if we could get more clues if you build with verbose output (`pip install -v -v .`).

_Originally posted by @timmoon10 in https://github.com/NVIDIA/TransformerEngine/issues/976#issuecomment-2274493866_

This was useful for me! You just need to wait longer; the wheel build simply takes a while.
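
For anyone else who hits the same hang, the quoted advice can be combined into a single command line. This is only a sketch; the local clone location and the `te_build.log` file name are arbitrary choices, not from the thread:

```bash
# Clone with submodules, then build from the local checkout with a single
# compile job and double-verbose pip, logging to a file so the stall point
# is easy to find afterwards.
git clone --recursive --branch stable https://github.com/NVIDIA/TransformerEngine.git
cd TransformerEngine
MAX_JOBS=1 pip install -v -v . 2>&1 | tee te_build.log
```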