astral-sh / uv

An extremely fast Python package and project manager, written in Rust.
https://docs.astral.sh/uv
Apache License 2.0

Running out of memory with uv pip install #7004

Open atti92 opened 2 months ago

atti92 commented 2 months ago

We are using uv inside CI/CD, and we are constantly hitting OOM kills from the Kubernetes executor. We don't want to raise the memory requests too high because the extra memory is only needed during the uv install step and it's costly. We tried setting concurrency limits, but that didn't really help.

Memory usage seems to climb to high levels when we have a very large number of dependencies, and it may also grow with the number of extra indexes.

I'm primarily asking whether there is a way to limit uv's memory usage.

charliermarsh commented 2 months ago

I think the best way to limit memory would be to limit the number of concurrent builds. Did that not work?

atti92 commented 2 months ago

Even setting all three concurrency options to 1 results in almost the same memory spike. Also, I think we primarily use pre-built packages.

zanieb commented 4 weeks ago

Do you have more details on the memory consumption and a reproduction we could use?

atti92 commented 4 weeks ago

Hi, thanks for looking into this.

I will try to put together a publicly available repro, but we use multiple internal indexes, and that might make the problem worse. For now I've run a local install with a mix of public and internal dependencies while measuring peak memory with /bin/time -v. These measurements don't explain the Kubernetes memory kills (they are much lower than the limits), but they give a baseline for comparing installs.
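For reference, the same "Maximum resident set size" figure that /bin/time -v reports can be reproduced programmatically; a minimal sketch, where `["sleep", "1"]` stands in for the actual uv invocation:

```python
# Sketch: reproduce the "Maximum resident set size" figure that /bin/time -v
# reports, via getrusage(RUSAGE_CHILDREN) after the child process exits.
# ["sleep", "1"] stands in for the real `uv pip install ...` command line.
import resource
import subprocess

subprocess.run(["sleep", "1"], check=True)
peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"Maximum resident set size: {peak}")  # kilobytes on Linux, bytes on macOS
```

Note that `ru_maxrss` covers all children reaped so far, so the command under test should be the only child spawned.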

Excluding internal dependencies:

Default configuration, htop snippet while running: [htop screenshot]

Command being timed: "uv pip install --index-strategy first-match --no-cache --reinstall -r requirements.txt"
    User time (seconds): 1.67
    System time (seconds): 2.33
    Percent of CPU this job got: 51%
    Maximum resident set size (kbytes): **115544**
    Minor (reclaiming a frame) page faults: 29839
    Voluntary context switches: 123439
    Involuntary context switches: 479
    File system outputs: 646184

After adding a secondary public PyPI mirror index (anything should work): `--extra-index-url https://mirrors.sustech.edu.cn/pypi/web/simple`

Command being timed: "uv pip install --no-cache --reinstall --extra-index-url https://mirrors.sustech.edu.cn/pypi/web/simple -r requirements.txt"
    User time (seconds): 3.33
    System time (seconds): 3.55
    Percent of CPU this job got: 39%
    Maximum resident set size (kbytes): **121448**
    Minor (reclaiming a frame) page faults: 52101
    Voluntary context switches: 150218
    Involuntary context switches: 937
    File system outputs: 670448

Adding --index-strategy unsafe-first-match:

Command being timed: "uv pip install --no-cache --reinstall --index-strategy unsafe-first-match --extra-index-url https://mirrors.sustech.edu.cn/pypi/web/simple -r requirements.txt"
    User time (seconds): 3.73
    System time (seconds): 3.86
    Percent of CPU this job got: 36%
    Maximum resident set size (kbytes): **169200**
    Minor (reclaiming a frame) page faults: 64064
    Voluntary context switches: 152854
    Involuntary context switches: 1171
    File system outputs: 738080

Setting all three concurrency limits to 1 (makes it really slow; this reduces memory use a bit): [htop screenshot]

export UV_CONCURRENT_DOWNLOADS=1 
export UV_CONCURRENT_BUILDS=1 
export UV_CONCURRENT_INSTALLS=1
Command being timed: "uv pip install --no-cache --reinstall --extra-index-url https://mirrors.sustech.edu.cn/pypi/web/simple -r requirements.txt"
    User time (seconds): 3.57
    System time (seconds): 3.49
    Percent of CPU this job got: 5%
    Maximum resident set size (kbytes): **108892**
    Minor (reclaiming a frame) page faults: 48392
    Voluntary context switches: 158458
    Involuntary context switches: 652
    File system outputs: 670448

Most of the memory usage seems to accumulate during the resolution phase. During download, memory usage creeps higher with no sign of being released. Adding our local indexes and even more dependencies pushes memory usage up by another ~100 MB. Using an --index-strategy other than the default seems to increase memory usage with each extra index URL added. Running the install locally peaks at about 280 MB total resident memory, yet the same script usually gets killed by Kubernetes with a 1 GB limit. (Does each thread's usage add up individually?)
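To pin down in which phase the peak occurs, one can also poll the process's resident set size over time instead of only recording the maximum; a Linux-only sketch, with `["sleep", "1"]` again standing in for the uv command:

```python
# Sketch (Linux-only): sample a child's resident set size (VmRSS) from
# /proc/<pid>/status while it runs, to see in which phase memory peaks.
# ["sleep", "1"] stands in for the real `uv pip install ...` command line.
import subprocess
import time

proc = subprocess.Popen(["sleep", "1"])
peak_kb = 0
while proc.poll() is None:
    try:
        with open(f"/proc/{proc.pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    peak_kb = max(peak_kb, int(line.split()[1]))  # value is in kB
    except FileNotFoundError:
        break  # process exited between poll() and open()
    time.sleep(0.2)
proc.wait()
print(f"peak VmRSS: {peak_kb} kB")
```

Logging each sample with a timestamp (instead of only the max) would show whether the creep happens during resolution or download.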

The requirements.txt file used for the public tests:

click>=7.0
tabulate>=0.8.9
pyyaml>=6.0
cryptography>=42.0.1
packaging>=21.3
python-keycloak~=4.3.0
azure-identity~=1.17.1
msgraph-sdk~=1.5.4
msal~=1.30.0
dataclasses_json>=0.5.7
semver>=2.13.0
urllib3>=2.0.7
rich-click>=1.7.0
durationpy>=0.7
inquirerpy==0.3.4
pytest>=6.2.5
pytest-cov
pytest-mock>=3.6.1
responses>=0.15.0
respx>=0.21.1
requests-mock>=1.9.3
twill
mkdocs>1.2.4
mkdocs-material
mkdocs-material-extensions
pymdown-extensions
mkdocs-click

This is our uv config file in CI/CD:

$ cat ${UV_CONFIG_FILE}
concurrent-builds = 1
concurrent-downloads = 2
concurrent-installs = 2
[pip]
python = "python3"
prerelease = "allow"
index-strategy = "unsafe-first-match"
link-mode = "copy"
extra-index-url = [
  "REDACTED",
  "REDACTED",
]

The same settings are also set as environment variables (including UV_EXTRA_INDEX_URL).

zanieb commented 4 weeks ago

So from your screenshots it looks like we're consuming ~3 GB of virtual memory, and when you limit concurrency it goes down to ~500 MB? I think we care more about the physical amount (RES), though, which looks like 96 MB and 93 MB respectively. That doesn't seem particularly low, but I'm not sure it's unreasonable either. In your failure cases, it looks like we're consuming around 100-150 MB?

using --index-strategy other than the default seems to increase memory usage with each extra index url added.

This makes sense: we need to store details about the available packages in more indexes.

I wonder if we should add a special limit to the prefetch job; maybe we'd check fewer package versions then. Or maybe we can free some versions from our mapping early?
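Both ideas above (capping in-flight prefetches, and evicting version metadata once it's no longer needed) can be illustrated with a small sketch. This is purely hypothetical pseudocode for the concept: uv is written in Rust, and none of these names correspond to uv's actual internals.

```python
# Hypothetical sketch of the two ideas above, NOT uv's actual implementation:
# (1) bound concurrent metadata prefetches with a semaphore, and
# (2) drop a version's metadata from the mapping once it is ruled out.
import asyncio

metadata: dict[str, dict] = {}  # version -> fetched metadata

async def prefetch(version: str, limit: asyncio.Semaphore) -> None:
    async with limit:                    # the "special limit to the prefetch job"
        await asyncio.sleep(0)           # stands in for the network request
        metadata[version] = {"deps": []}

def discard(version: str) -> None:
    # "free some versions from our mapping early"
    metadata.pop(version, None)

async def main() -> None:
    limit = asyncio.Semaphore(4)         # at most 4 prefetches in flight
    await asyncio.gather(*(prefetch(f"1.0.{i}", limit) for i in range(10)))
    discard("1.0.0")                     # e.g. the resolver rejected this version

asyncio.run(main())
print(len(metadata))  # -> 9
```

The trade-off is that a prefetch limit slows resolution when many versions must be checked, while early eviction risks re-fetching metadata if the resolver backtracks.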

(does it add up each thread individually?).

All the threads should be sharing memory.

atti92 commented 4 weeks ago

During local testing, RES peaked at 280 MB, while the same script gets killed in Kubernetes roughly 50% of the time with a 1 GB limit. No idea why.

It would be nice if you could reduce the memory footprint, or add an optional hard limit on memory usage / free memory dynamically at runtime.