astral-sh / uv

An extremely fast Python package and project manager, written in Rust.
https://docs.astral.sh/uv
Apache License 2.0
20.05k stars 597 forks source link

Race condition when two libraries update the same __init__.py #6568

Open ramarnat opened 3 weeks ago

ramarnat commented 3 weeks ago

This has had me pulling my hair out.

If pydantic is enabled in the requirements.txt, pyjwt fails with

#12 7.705 Traceback (most recent call last):
#12 7.705   File "<string>", line 1, in <module>
#12 7.706 ImportError: cannot import name 'PyJWKClient' from 'jwt' (/opt/venv/lib/python3.12/site-packages/jwt/__init__.py)

Looks like the init.py doesnt get updated with PyJWKClient in the failed runs, but no indication as to why.

Cant make it fail deterministically but doing docker system prune, it will fail every other (?) time.

Here is the Dockerfile:

FROM python:3.12.3 AS build

ENV PATH="/opt/venv/bin:$PATH"
ENV VIRTUAL_ENV="/opt/venv"

# Python runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    make \
    g++ \
    gfortran \
    cmake && \
    python -m venv /opt/venv 

COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

WORKDIR /app
COPY requirements.txt  ./

RUN echo 1 && \
    uv pip -v --no-cache sync requirements.txt &&\
    python -c "from jwt import PyJWKClient" 

Here is the requirements.txt:

# This file was autogenerated by uv via the following command:
#    uv pip compile requirements.in pyproject.toml
annotated-types==0.7.0
    # via pydantic
anyio==3.7.1
    # via
    #   fastapi
    #   starlette
certifi==2022.12.7
    # via requests
cffi==1.17.0
    # via cryptography
charset-normalizer==2.1.1
    # via requests
cryptography==43.0.0
    # via jwt
# fastapi==0.105.0
    # via -r requirements.in
google-search-results==2.4.2
    # via locus-ferb (pyproject.toml)
idna==3.4
    # via
    #   anyio
    #   requests
jwt==1.3.1
    # via locus-ferb (pyproject.toml)
pycparser==2.22
    # via cffi
pydantic==2.8.2
    # via fastapi
pydantic-core==2.20.1
    # via pydantic
pyjwt==2.9.0
    # via locus-ferb (pyproject.toml)
requests==2.28.1
    # via
    #   google-search-results
    #   serpapi
serpapi==0.1.5
    # via locus-ferb (pyproject.toml)
sniffio==1.3.1
    # via anyio
# starlette==0.27.0
    # via fastapi
typing-extensions==4.9.0
    # via
    #   fastapi
    #   pydantic
    #   pydantic-core
urllib3==1.26.13
    # via requests

Attached are the two logs, running the same command one after the other: docker system prune && docker build --ssh default --progress plain -f Dockerfile-fail . build-working.log build-failing.log

zanieb commented 3 weeks ago

Well, in the both examples there are things like:

#12 7.601 DEBUG File already exists (subsequent attempt), overwriting: /opt/venv/lib/python3.12/site-packages/jwt/__init__.py

It looks like there's a conflict between two packages and you're seeing a race condition where it succeeds or fails depending on which one overwrites the file last.

ramarnat commented 3 weeks ago

That makes sense, pyjwt adds functionality to jwt, so its update to the init.py should probably come after. Is there a way to force a dependency? If it were possible to do, would that fix the race condition?

I'll check to see if we can remove jwt, and only keep pyjwt. But I see the same issue for another library that seems to have the same issue, in that case I know I need both libraries.

Or maybe I have to do a sync, and then separately install the 2nd library with uv pip install?

The race condition doesn't happen in pip, because of its slowness.

zanieb commented 3 weeks ago

I see. I don't know if it's feasible for us to unpack packages in a defined order like that, @charliermarsh would know better.

It seems like pretty bad behavior for a package to rely on patching an existing one by overwriting its files. I think in this case, you'd be best off running two separate install operations — one to prepare the environment, and the next to patch the package.

ramarnat commented 3 weeks ago

Could the warning be surfaced, so it's clearer that one is clobbering the other?

Agreed on the package behavior, but such is life with dependence on third party libs...

And I take back that it may not happen for us in pip, but I see articles about the import not working. https://github.com/jpadilla/pyjwt/issues/685

https://stackoverflow.com/questions/68912059/module-not-found-error-when-using-google-search-api-serpapi

ramarnat commented 3 weeks ago

Also, thank you @zanieb for identifying the issue so quickly.

ramarnat commented 3 weeks ago

After looking at what my options around working around this while sticking with "sync" type workflow, and was hoping to use uv lock and then uv sync --no-install-package pyjwt instead of uv pip sync but.... The project is marked as unmanaged

charliermarsh commented 3 weeks ago

Do you have unmanaged = true set anywhere?

ramarnat commented 3 weeks ago

I do (I set managed=false in the pyproject.toml) - because our existing docker images use /opt/venv, and was trying to avoid changing that. But I think I will move stuff into /app/.venv so can use uv sync instead of uv pip.

ramarnat commented 3 weeks ago

Ok I have it working with this Dockerfile and uv.lock

ROM python:3.12.3 AS build

ENV PATH="/app/.venv/bin:$PATH"
ENV VIRTUAL_ENV="/app/.venv"

# Python runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    make \
    g++ \
    gfortran \
    cmake 

COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

WORKDIR /app
COPY uv.lock pyproject.toml ./

RUN RUST_LOG=trace uv -v --no-cache sync --no-install-package pyjwt --no-install-package google-search-results &&\
    RUST_LOG=trace uv -v --no-cache sync &&\
    python -c "from jwt import PyJWKClient" 
ramarnat commented 3 weeks ago

I would like to request a feature for future uv noobs, that warns when a library clobbers another one, I spent a fair bit of time going thru a number of permutations thinking there was something going on in my environment/cache/docker setup.

zanieb commented 2 weeks ago

@konstin is it feasible for us to use warn_user_once when installing a package clobbers files?

charliermarsh commented 2 weeks ago

I'm honestly not sure if we should, because I think there are packages that do this somewhat intentionally.

konstin commented 2 weeks ago

The cases i looked where __init__.py happened intentionally it was the __init__.py in both packages. We should show a warning when they are not the same, in which case we know there is a race condition. (We have to read the existing file for that, so we'll need to use Locks after io::ErrorKind::AlreadyExists to make sure the read is non-raced)

zanieb commented 2 weeks ago

I think this one does do it intentionally but if it's installed in the same invocation as its target package that's a big problem since we don't do it in a deterministic order?

ramarnat commented 2 weeks ago

could there be an option - uv add <package> --depends_on <main_package> which would update the uv.lock to install that package after the main_package?

zanieb commented 2 weeks ago

There could be but it's a pretty niche use-case and controlling the install order manually seems weird. I feel like this is a problem with the design of these packages, not a common use-case we should add configuration options for.