Pytorch downgrades to a lower version

thepowerfuldeez commented 2 months ago

Hi! Is there a way to ignore dependencies that already exist in the system Python? I am using the nvcr.io/nvidia/pytorch:24.08-py3 Docker image, which has pytorch==2.5.0a0+872d972e41.nv24.08 installed.

I followed the guide from here to enable the system Python via UV_SYSTEM_PYTHON=1. My pyproject.toml lists the packages I copied over from requirements.txt, and I ran uv lock.

But now I see that the lockfile was resolved with torch==2.4.0, and when I run uv sync my PyTorch gets downgraded to 2.4.0, which is unexpected. I tried manually removing all PyTorch dependencies from the lockfile and running uv sync from that state, but for some reason it now succeeds after installing only 4 of 126 packages (and those are just the file packages defined as file://... in pyproject.toml). I've tried uv sync --frozen, uv sync --locked, and other variations of uv sync, but haven't succeeded yet. To be frank, pip install -r requirements.txt would also downgrade my torch unless I run pip freeze > requirements.txt and install with -r --no-deps, but that is not ideal.

I generally lack an understanding of such intricacies, and I've noticed that other people stumble over flash-attn installation or CUDA extensions. What's the best way to handle such conflicts? Thank you very much for a modern replacement for pip!

thepowerfuldeez commented 2 months ago

Example of torch being downgraded (and CUDA packages being installed that are not needed):


```
> uv sync
Resolved 127 packages in 567ms
⠧ Preparing packages... (5/13)
nvidia-cufft-cu12 ------------------------------ 58.53 MB/121.64 MB
nvidia-cusolver-cu12 ------------------------------ 58.41 MB/124.16 MB
nvidia-nccl-cu12 ------------------------------ 58.49 MB/176.25 MB
nvidia-cusparse-cu12 ------------------------------ 58.55 MB/195.96 MB
nvidia-cublas-cu12 ------------------------------ 58.29 MB/410.59 MB
nvidia-cudnn-cu12 ------------------------------ 58.35 MB/664.75 MB
torch      ------------------------------ 33.03 MB/797.23 MB
^C
```

charliermarsh commented 2 months ago

This doesn't work today with uv sync or uv lock -- it won't respect existing environments. The lower-level uv pip APIs do, though. You can run uv pip install with an active virtual environment, and it will retain existing versions of PyTorch, if they're already installed.

thepowerfuldeez commented 2 months ago

@charliermarsh How do I respect pyproject.toml and still use the project setup features? I was thinking of switching to uv as a project management tool (similar to poetry or rye, if that matters). Will uv pip install -r pyproject.toml work? What's the recommended approach to project management for ML workflows? This is highly requested imo, and I would be glad to have a clear guide :)

charliermarsh commented 2 months ago

Yeah uv pip install -r pyproject.toml should work just fine. In general, our goal is for folks to use the "higher-level" project APIs (uv lock, uv sync, etc.). But in this case, if you're using an nvidia image that has packages pre-installed, it won't play correctly with uv lock and uv sync. So if you want to build atop that base environment, you'll need to use the "lower-level" uv pip APIs.

E.g., activate that virtual environment, then run uv pip install -r pyproject.toml.

thepowerfuldeez commented 2 months ago

@charliermarsh This doesn't work. I have tried:

> uv venv --python-preference only-system --system-site-packages
source .venv/bin/activate
uv pip install -r pyproject.toml

It still installs torch==2.4.0, even though torch is not among the dependencies in pyproject.toml.

I verified that torch is installed inside this venv.
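
One way to double-check which torch a given interpreter actually sees is to ask `importlib.metadata`. The following is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `describe` are made up for illustration and are not part of uv.

```python
import sys
from importlib import metadata


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


def describe(name):
    """Return (version, location) for an installed distribution, else None."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    # locate_file("") resolves to the directory holding the distribution,
    # which tells the venv's site-packages apart from the system one.
    return dist.version, str(dist.locate_file(""))


if __name__ == "__main__":
    print("virtualenv:", in_virtualenv())
    print("torch:", describe("torch"))
```

Run with the venv's interpreter, the reported location shows whether torch is being picked up from the system site-packages or from the venv itself.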

thepowerfuldeez commented 2 months ago

@charliermarsh I have also tried setting:

[tool.uv]
# Always install torch 2.5, regardless of whether transitive dependencies request
# a different version.
override-dependencies = ["torch==2.5.0a0+872d972e41.nv24.08"]

but it can't be resolved:

× No solution found when resolving dependencies:
  ╰─▶ Because there is no version of torch==2.5.0a0+872d972e41.nv24.8 and accelerate==0.33.0 depends on torch==2.5.0a0+872d972e41.nv24.8, we can conclude that accelerate==0.33.0 cannot be used.
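
Two things in that error may be worth untangling. The `nv24.8` spelling looks like ordinary PEP 440 normalization of the local version label (all-digit segments such as `08` are treated as integers), so it is probably not the cause of the failure. A more likely cause, stated here as an assumption, is that the NVIDIA build exists only inside the container image and not on any index the resolver queries, so pinning it leaves nothing to resolve against. The sketch below illustrates only the normalization part; `normalize_local` is a hypothetical hand-written helper, not uv's code.

```python
import re


def normalize_local(version):
    """Normalize the local label (the part after '+') along PEP 440 lines."""
    public, sep, local = version.partition("+")
    if not sep:
        return version  # no local label, nothing to normalize
    # PEP 440 accepts '.', '-' and '_' as separators in local labels and
    # normalizes them to '.'; all-digit segments are treated as integers.
    parts = re.split(r"[._-]", local.lower())
    normalized = [str(int(p)) if p.isdigit() else p for p in parts]
    return public + "+" + ".".join(normalized)


print(normalize_local("2.5.0a0+872d972e41.nv24.08"))
# 2.5.0a0+872d972e41.nv24.8
```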

charliermarsh commented 2 months ago

@thepowerfuldeez -- If you're trying to use packages from the system Python, you'll need to use uv pip install --system -r pyproject.toml, and avoid creating a virtual environment at all. Can you try that?
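
A sketch of how that suggestion could be wrapped in a script that also confirms torch was left alone. The `--system` and `-r` flags come from the comment above; the helper names (`build_command`, `snapshot`, `changed`, `install_and_check`) are illustrative assumptions, and the script assumes `uv` is on `PATH` and that it runs under the image's system interpreter rather than a virtual environment.

```python
import subprocess
from importlib import metadata


def build_command(requirements="pyproject.toml"):
    """Command line that installs into the system Python, with no venv."""
    return ["uv", "pip", "install", "--system", "-r", requirements]


def snapshot(names):
    """Map each distribution name to its installed version (None if absent)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def changed(before, after):
    """Return {name: (old, new)} for every entry whose version differs."""
    return {
        name: (old, after.get(name))
        for name, old in before.items()
        if old != after.get(name)
    }


def install_and_check(watch=("torch",)):
    """Install, then report watched packages whose version changed."""
    before = snapshot(watch)
    subprocess.run(build_command(), check=True)
    return changed(before, snapshot(watch))
```

An empty result from `install_and_check()` would mean the preinstalled torch kept its version; a non-empty one lists the old and new versions side by side.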

thepowerfuldeez commented 2 months ago

@charliermarsh Thank you so much! It works now! So the solution is to not use a lockfile for now, right?

charliermarsh commented 2 months ago

Unfortunately yes. We'll need to think on how to solve this properly.

charliermarsh commented 2 months ago

I'm gonna leave the issue open but might tweak the title and add some more details on the underlying problem, if that's ok.

awoimbee commented 2 months ago

I had the same issue with poetry some time ago: https://github.com/python-poetry/poetry/issues/6035 (resolved by https://github.com/python-poetry/poetry/pull/8359).

Note that uv venv --system-site-packages has existed since #2101, but "we won't take the system site packages into account in subsequent commands".

zanieb commented 3 weeks ago

@charliermarsh it seems like there might be a real issue to track here?

charliermarsh commented 3 weeks ago

Yeah. The system packages aren't taken into account.

astral-sh/uv issue #6880