astral-sh / uv

An extremely fast Python package and project manager, written in Rust.
https://docs.astral.sh/uv
Apache License 2.0

[perf] differences in pip vs uv when installing package from src #3802

Open: ryxli opened this issue 3 months ago

ryxli commented 3 months ago

I would like some help, or pointers, in identifying the reason for the performance difference between pip install and uv pip install for a package (https://github.com/NVIDIA/TransformerEngine).

Steps to reproduce:

>uv --version
uv 0.1.44

>git clone https://github.com/NVIDIA/TransformerEngine.git
>cd TransformerEngine

With regular pip:

> time NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi   pip install --no-build-isolation --no-deps -e .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Obtaining file:///workspace/TransformerEngine
  Preparing metadata (setup.py) ... done
Installing collected packages: transformer-engine
  Attempting uninstall: transformer-engine
    Found existing installation: transformer-engine 1.8.0.dev0+d705f7f
    Uninstalling transformer-engine-1.8.0.dev0+d705f7f:
      Successfully uninstalled transformer-engine-1.8.0.dev0+d705f7f
  Running setup.py develop for transformer-engine
Successfully installed transformer-engine-1.8.0.dev0+d705f7f
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

real    0m12.524s
user    0m10.516s
sys     0m12.773s

With uv pip:

> time NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi uv pip install --system --no-build-isolation --no-deps -e .
   Built file:///workspace/TransformerEngine
Built 1 editable in 1m 09s
Resolved 1 package in 10ms
Installed 1 package in 1ms
 - transformer-engine==1.8.0.dev0+d705f7f
 + transformer-engine==1.8.0.dev0+d705f7f (from file:///workspace/TransformerEngine)

real    1m9.980s
user    16m6.687s
sys     2m12.868s
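The time figures say more than the wall-clock gap alone: (user + sys) / real approximates how many CPU cores were kept busy on average during a run. The short Python sketch below parses bash's time format and compares the two runs; the helper names parse_time and cpu_to_wall_ratio are illustrative, not part of any tool.

```python
import re


def parse_time(value: str) -> float:
    """Convert a bash `time` field such as "16m6.687s" to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def cpu_to_wall_ratio(real: str, user: str, sys_: str) -> float:
    """Approximate average number of busy cores: (user + sys) / real."""
    return (parse_time(user) + parse_time(sys_)) / parse_time(real)


# Figures copied from the pip and uv runs in this issue.
pip_ratio = cpu_to_wall_ratio("0m12.524s", "0m10.516s", "0m12.773s")
uv_ratio = cpu_to_wall_ratio("1m9.980s", "16m6.687s", "2m12.868s")
print(f"pip: {pip_ratio:.2f}  uv: {uv_ratio:.2f}")
```

With the reported numbers this prints pip: 1.86  uv: 15.71. In other words, the uv run consumed roughly 1,100 CPU-seconds against about 23 for pip. That is consistent with much more parallel work happening inside the build hook during the uv run, although these numbers alone do not show what that work was.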

With --verbose enabled, the command appears to stall for a long time at the "Calling setuptools.build_meta:__legacy__.build_editable" step:

DEBUG Starting interpreter discovery for default Python
DEBUG Cached interpreter info for Python 3.10.12, skipping probing: /usr/bin/python3
DEBUG Using Python 3.10.12 environment at /usr/bin/python3
DEBUG Trying to lock if free: /tmp/uv-08d95a7330542a29.lock
DEBUG At least one requirement is not satisfied: file:///workspace/TransformerEngine
DEBUG Using registry request timeout of 30s
DEBUG Building (editable) file:///workspace/TransformerEngine
DEBUG Calling `setuptools.build_meta:__legacy__.build_editable("/root/.cache/uv/.tmpKghDdN/.tmphfHmSb", {}, None)`
.....
charliermarsh commented 3 months ago

Hard for me to test this because it requires CUDA it seems?

charliermarsh commented 3 months ago

But setuptools.build_meta:__legacy__.build_editable is just the build hook that builds the editable. It isn't uv code; it's Python code (from setuptools) that uv calls, following the standards.
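To make that point concrete: PEP 517 lets a project name its backend as "module" or "module:object". When no backend is declared, tools fall back to legacy behaviour: either running setup.py directly (pip's log above shows "Running setup.py develop") or invoking setuptools.build_meta:__legacy__, which is the call in uv's verbose log. A frontend imports that object and calls hooks on it, such as the PEP 660 build_editable hook. The following is a minimal in-process sketch of that convention. The helpers load_backend and time_hook are hypothetical and are not uv's implementation.

```python
import importlib
import time
from typing import Any, Callable, Tuple


def load_backend(spec: str) -> Any:
    """Resolve a PEP 517 backend string such as
    "setuptools.build_meta:__legacy__" to the backend object."""
    module_name, _, object_path = spec.partition(":")
    backend = importlib.import_module(module_name)
    # The part after ":" is a dotted attribute path inside the module.
    if object_path:
        for attr in object_path.split("."):
            backend = getattr(backend, attr)
    return backend


def time_hook(spec: str, hook: str, *args: Any) -> Tuple[Any, float]:
    """Call one backend hook in-process and return (result, elapsed seconds)."""
    func: Callable[..., Any] = getattr(load_backend(spec), hook)
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

From the project root, with the same NVTE_* and MPI_HOME variables exported, a call such as time_hook("setuptools.build_meta:__legacy__", "build_editable", "/tmp/wheels", {}, None) would time the same hook outside uv. Here /tmp/wheels stands for any existing, writable output directory. This has not been tried against TransformerEngine, and setuptools' legacy backend expects the current directory to be the source tree.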

zanieb commented 3 months ago

It seems like pip isn't performing a build? Do their verbose logs have more information?

charliermarsh commented 3 months ago

I think pip actually doesn't use PEP 517 when doing editables, or something like that.

ryxli commented 3 months ago

> Hard for me to test this because it requires CUDA it seems?

Unfortunately, this package does require CUDA, although from what I can tell the install-time issue does not seem related to it.

> It seems like pip isn't performing a build? Do their verbose logs have more information?

I did not include the full logs because, on a first install, most of the time is spent building the C++ extensions via CMake. The CMake build is incremental, so it doesn't affect the runs above, which is where I noticed this significant time difference simply by comparing uv and pip install.

uv pip install just hangs on this line for a while; after that, the install seems just as fast as with regular pip.

DEBUG Calling `setuptools.build_meta:__legacy__.build_editable("/root/.cache/uv/.tmpKghDdN/.tmphfHmSb", {}, None)`

Something else may be going on under the hood, since CMake logs are not exposed by uv pip install (https://github.com/astral-sh/uv/issues/1567). With regular pip, however, I can see the logs and the build seems relatively fast, so I'm unsure what the issue is (10-15 seconds vs. 1-2 minutes).

samypr100 commented 3 months ago

Is there anything particular about your setup? Can you try in a fully clean environment, e.g. a Docker image?

ryxli commented 3 months ago

This is happening while building a Docker image.

samypr100 commented 3 months ago

Which base Docker image were you using?

ryxli commented 3 months ago

The NVIDIA PyTorch image:

https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch

Using the 24.04-py3 tag.

samypr100 commented 3 months ago

Thanks. I tried it earlier in an nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 image and got roughly the same build times with both. I will try nvcr.io/nvidia/pytorch:24.04-py3, but at a quick glance transformer-engine appears to be pre-built in it, which could explain some of the speed difference.

ryxli commented 3 months ago

This is after uninstalling transformer_engine in the base image and then installing it from source.