NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Apache License 2.0
1.81k stars, 301 forks

stuck at building wheel #1077

Open neurosynapse opened 1 month ago

neurosynapse commented 1 month ago

pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/NVIDIA/TransformerEngine.git@stable
  Cloning https://github.com/NVIDIA/TransformerEngine.git (to revision stable) to /tmp/pip-req-build-fa900tpa
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/TransformerEngine.git /tmp/pip-req-build-fa900tpa
  Running command git checkout -b stable --track origin/stable
  Switched to a new branch 'stable'
  Branch 'stable' set up to track remote branch 'stable' from 'origin'.
  Resolved https://github.com/NVIDIA/TransformerEngine.git to commit 3ec998e96c82bc30247560ced6170c4221ca2b5a
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done
Requirement already satisfied: packaging in /usr/lib/python3/dist-packages (from transformer-engine==1.8.0+3ec998e) (21.3)
Collecting pydantic
  Using cached pydantic-2.8.2-py3-none-any.whl (423 kB)
Requirement already satisfied: typing-extensions>=4.6.1 in /home/rob/.local/lib/python3.10/site-packages (from pydantic->transformer-engine==1.8.0+3ec998e) (4.12.2)
Collecting pydantic-core==2.20.1
  Using cached pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Collecting annotated-types>=0.4.0
  Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)
Building wheels for collected packages: transformer-engine
  Building wheel for transformer-engine (setup.py) ... |

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

Ubuntu 22.04

Python 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

torch 2.4.0+cu121

neurosynapse commented 1 month ago

GPU: RTX 3090 Ti

timmoon10 commented 1 month ago

We use Ninja to parallelize the build process and I suspect it's overwhelming your system resources. Can you try running with MAX_JOBS=1 in your environment?
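For anyone landing here, a minimal sketch of that suggestion as a shell function. `MAX_JOBS` is the variable named in the comment above, and the `@stable` URL is the one from the original report. The function name `install_te_single_job` and the `TE_SPEC` variable are made up for illustration, not part of Transformer Engine.

```shell
# Install target copied from the original report.
TE_SPEC="git+https://github.com/NVIDIA/TransformerEngine.git@stable"

# Hypothetical helper: run the pip build with a capped number of parallel jobs.
# The first argument is the job count and defaults to 1.
install_te_single_job() {
  n_jobs="${1:-1}"
  # MAX_JOBS is set only for this one pip invocation, not for the whole shell.
  MAX_JOBS="$n_jobs" pip install -v "$TE_SPEC"
}
```

Calling `install_te_single_job` runs the install with one job, and `install_te_single_job 2` would try two. A single-job build uses less memory at the cost of a longer compile, so a long wait at the "Building wheel" step is expected.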

1195343015 commented 1 month ago
> Hm, I'd expect most systems could handle building with `MAX_JOBS=1`. I wonder if we could get more clues if you build with verbose output (`pip install -v -v .`).

Originally posted by @timmoon10 in https://github.com/NVIDIA/TransformerEngine/issues/976#issuecomment-2274493866

That worked for me! You just need to wait longer for the build to finish.
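The verbose build from the quoted comment can be sketched in the same way. It installs from the `@stable` URL in the original report rather than from a local checkout (`.`), and it also saves the output so it can be pasted into the issue. The function name `build_te_verbose` and the default log file name `te_build.log` are assumptions for illustration only.

```shell
# Install target copied from the original report.
TE_SPEC="git+https://github.com/NVIDIA/TransformerEngine.git@stable"

# Hypothetical helper: single-job, doubly verbose build with the output saved to a log.
# The first argument is the log path and defaults to te_build.log.
build_te_verbose() {
  log_file="${1:-te_build.log}"
  # 2>&1 merges stderr (compiler errors) into stdout; tee prints it and writes the log.
  # Without `set -o pipefail`, the pipeline's exit status is tee's, not pip's.
  MAX_JOBS=1 pip install -v -v "$TE_SPEC" 2>&1 | tee "$log_file"
}
```

With verbose output the individual compile steps should scroll by, which makes it easier to tell a slow build apart from one that has genuinely hung.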