Parallel build with limited resources

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Apache License 2.0


Description

Enforce building csrc with Ninja for all frameworks.

pyproject.toml is now used to check for and install the required packages, so all the found_xxx() functions were removed.

By default, the Ninja build uses all available threads (equivalent to make -j). To cap the number of parallel build jobs, set the NVTE_MAX_BUILD_JOBS or MAX_JOBS environment variable.
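As a rough illustration of how such a cap could be read from the environment, here is a minimal sketch. The helper name max_build_jobs and the precedence (NVTE_MAX_BUILD_JOBS checked before MAX_JOBS) are assumptions made for this example, not something taken from this PR's code.

```python
import os


def max_build_jobs(environ=None):
    """Return the job cap from the environment, or None when no cap is set.

    Hypothetical helper: the lookup order below is an assumption.
    """
    environ = os.environ if environ is None else environ
    for name in ("NVTE_MAX_BUILD_JOBS", "MAX_JOBS"):
        value = environ.get(name, "").strip()
        if value:
            jobs = int(value)  # raises ValueError on non-numeric input
            if jobs < 1:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")
            return jobs
    # No variable set: leave the build tool's default parallelism in place.
    return None
```

With this sketch, running the build with MAX_JOBS=4 in the environment would yield a cap of 4, and leaving both variables unset would yield None.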

Type of change

[ ] Documentation change (change only to the documentation, either a fix or a new content)

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

[x] Infra/Build change

[ ] Code refactor

Checklist:

[x] I have read and followed the contributing guidelines

[x] The functionality is complete

[x] I have commented my code, particularly in hard-to-understand areas

[ ] I have made corresponding changes to the documentation

[x] My changes generate no new warnings

[ ] I have added tests that prove my fix is effective or that my feature works

[ ] New and existing unit tests pass locally with my changes

We abandoned this PR based on the following chain of logic:

We wanted to make Ninja a build-time dependency so we can have consistent build parallelization.
Setuptools has deprecated the setup_requires kwarg to setuptools.setup (see deprecated keywords in the list of setuptools keywords). The recommended approach is to create a pyproject.toml with a [build-system] table (see this PyPA guide).
When Pip detects a pyproject.toml, it uses build isolation (see Pip docs). That is, it builds within a temporary virtual environment with only build-time dependencies. As far as I can tell, this can only be disabled by the user running pip install --no-build-isolation. This is a deliberate design by the Python developers to enforce their vision of package hygiene.
However, building PyTorch and Paddle extensions requires access to Setuptools wrappers (see torch.utils.cpp_extension). It's also important for Transformer Engine to be framework-agnostic, so our set of build-time dependencies is dynamic.
We must either ask users to change their build workflows, try to circumvent Pip's build isolation, or find some way to specify dynamic build-time dependencies.
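To make the conflict concrete, here is a hedged sketch of the kind of framework probing a framework-agnostic build script might do. The function name installed_frameworks and the candidate module names are assumptions for illustration only; this is not the project's actual detection logic.

```python
import importlib.util


def installed_frameworks(candidates=("torch", "paddle")):
    """Return the candidate top-level modules importable in this environment.

    Hypothetical helper: the candidate names are assumptions for this sketch.
    """
    return [
        name
        for name in candidates
        if importlib.util.find_spec(name) is not None
    ]
```

In a user's normal environment this kind of probe would report whichever frameworks are installed. Inside an isolated build environment, only the statically declared build-time dependencies are present, so the same probe would be expected to come back empty unless the user passes --no-build-isolation.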

We're not the first to note that build isolation is poorly suited to the ML ecosystem (see https://github.com/astral-sh/uv/issues/1715). We should keep this in mind in case we later need to modernize the build process and add a pyproject.toml. Users may want to preemptively build with pip install --no-build-isolation so that such a change doesn't break their build workflows.

For now, the much simpler approach is to modify our build process to handle either Ninja or make. See https://github.com/NVIDIA/TransformerEngine/pull/987.
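The general idea of supporting either tool can be sketched as follows. This is an illustrative assumption about how such a fallback might look, not the implementation in the linked PR; the function name build_command is hypothetical.

```python
import shutil


def build_command(max_jobs=None, which=shutil.which):
    """Pick ninja when it is on PATH, otherwise fall back to make.

    Hypothetical helper. `which` is injectable so the choice can be tested
    without depending on what is installed on the machine.
    """
    tool = "ninja" if which("ninja") else "make"
    cmd = [tool]
    if max_jobs is not None:
        # Both ninja and make accept -jN to cap the number of parallel jobs.
        cmd.append(f"-j{max_jobs}")
    # With no cap given, the chosen tool's own default parallelism applies.
    return cmd
```

Because the tool is chosen at build time from whatever is already installed, Ninja stays optional and no pyproject.toml (and hence no build isolation) is needed.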
