astral-sh / uv

An extremely fast Python package and project manager, written in Rust.
https://docs.astral.sh/uv
Apache License 2.0
24.32k stars 705 forks

[feat] uv publish --skip-existing #7917

Open chassing opened 3 weeks ago

chassing commented 3 weeks ago

It would be great if uv publish had a --skip-existing option that ignores errors from files already on PyPI, similar to poetry's and twine's --skip-existing options.

konstin commented 3 weeks ago

Do you have a specific case that fails? uv publish already ignores it if you try to upload the same file again.

chassing commented 3 weeks ago

Mmmmhhh strange, I got this error:

10:57:30 uv build
10:57:30 Building source distribution...
10:57:30 Building wheel from source distribution...
10:57:30 Successfully built dist/rh_aws_saml_login-0.3.4.tar.gz and dist/rh_aws_saml_login-0.3.4-py3-****-any.whl
10:57:30 uv publish
10:57:30 warning: `uv publish` is experimental and may change without warning
10:57:30 Publishing 2 files https://upload.pypi.org/legacy/
10:57:30 Uploading rh_aws_saml_login-0.3.4-py3-****-any.whl (3.7KiB)
10:57:30 error: Failed to publish `dist/rh_aws_saml_login-0.3.4-py3-****-any.whl` to https://upload.pypi.org/legacy/
10:57:30   Caused by: Upload failed with status code 400 Bad Request: 400 File already exists ('rh_aws_saml_login-0.3.4-py3-****-any.whl', with blake2_256 hash '6c36e28fc15545d4e3d10a0c68e9d476afdb6ae2facdbb846a10098516575d0d'). See https://pypi.org/help/#file-name-reuse for more information.
10:57:30 make: *** [Makefile:39: pypi] Error 2

chassing commented 3 weeks ago

TBH, it's not exactly the same file; it's the latest master, but it's the same version.

konstin commented 3 weeks ago

Could you give some more context on why you're doing the second upload? In the final uploaded version, would the source distribution and the wheel be from different commits?

chassing commented 3 weeks ago

I'm sorry. Our development and release flow is the following: Users create PRs, and each PR is merged into the main branch. The main branch is automatically built and published to PyPI. As long as nobody bumps the version number in pyproject.toml, we don't want to publish (release) a new version. We used poetry in the past, and the --skip-existing option did the job.

TiansuYu commented 3 weeks ago

Do you have a specific case that fails? uv publish already ignores it if you try to upload the same file again.

No it doesn't. UV will try to upload all pre-existing ones.

e.g. if I have published version 1.0.0 and now want to build and publish 1.0.1 locally, it will try to upload 1.0.0 as well. I have to manually remove the older versions to make it work.

konstin commented 3 weeks ago

e.g. if I have published version 1.0.0 and now want to build and publish 1.0.1 locally, it will try to upload 1.0.0 as well. I have to manually remove the older versions to make it work.

Are these 1.0.0 files the same as the ones on the index, and when uploading, do you get an "already exists but ok" message or do you get an error?

davidszotten commented 3 weeks ago

for the ci workflow @chassing mentioned (which is similar to what we use) the files aren't necessarily identical. the point is to use "this file already exists in the index" as a shorthand for "version hasn't changed, no need to upload" (by treating that specific error as being ok). i ported the twine approach to poetry, it's based on some heuristics https://github.com/pypa/twine/blob/ae71822a3cb0478d0f6a0cccb65d6f8e6275ece5/twine/commands/upload.py#L58-L72 so a bit hacky, but it works very well in practice
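
A minimal sketch of that kind of heuristic, in Python. This is not a copy of the linked twine code. The function name `is_already_exists_error` is hypothetical. Only the PyPI case (status 400 with "File already exists" in the message) is taken from the log earlier in this thread; the 409 branch is an assumption about how some alternative indexes might answer.

```python
def is_already_exists_error(status: int, body: str) -> bool:
    """Return True if an upload response looks like 'file already exists'."""
    text = body.lower()
    # PyPI answers a re-upload with 400 and a message such as
    # "400 File already exists (...)", as seen in the log above.
    if status == 400 and "already exist" in text:
        return True
    # Assumption: some alternative indexes report the same condition
    # as 409 Conflict instead.
    if status == 409:
        return True
    return False


# Usage: treat the "already exists" case as success, everything else as failure.
print(is_already_exists_error(400, "400 File already exists ('foo-1.0-py3-none-any.whl')"))
print(is_already_exists_error(400, "400 Invalid distribution metadata"))
```

Because the check only looks at the status code and message text, it cannot tell whether the existing file has the same content as the local one.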

TiansuYu commented 3 weeks ago

So I can verify that if the file is identical, the uv publish will pass, but will break if the code changes. And there is no info on "already exists but ok"

[Screenshot attached: 2024-10-04 at 17:21:48]

uv 0.4.18 (7b55e9790 2024-10-01)

charliermarsh commented 2 weeks ago

@konstin -- Where did we land on this?

zanieb commented 2 weeks ago

It seems problematic to try to upload different content and expect it to be treated the same way as uploading an identical file? It also seems reasonable to opt-in to ignoring failures in that case? Unless we want to add a command to check if a matching version exists or something? But that exposes you to TOCTOU bugs.

charliermarsh commented 2 weeks ago

Yeah agree, which would lead us to adding a skip existing flag IIUC.

davidszotten commented 2 weeks ago

(if you decide to accept the feature request i'd be interested in having a go at implementing it (with some pointers))

konstin commented 2 weeks ago

With pypi today, a new release is created once you upload the first file for that release. Uploading additional files with the same package name and version number but a different filename adds them to the existing release. That is suboptimal: A release is published once a single file exists even if the wheels the users want are still missing, and it means that if something in your release process fails, you have a release with half the files. The pypi admins are also aware that this is undesirable, and there is work going on to fix this (https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879).


Separately, it unfortunately seems that different implementations differ in how they react to reuploading files.

The current skip-existing behavior was based on PyPI's behavior, while also copying twine's response handling, as the default without parameters, for compatibility with alternative indexes (the ones @davidszotten mentioned). That rested on a misunderstanding of how those behave for non-PyPI indexes, resulting in a strange mix of behaviours.


Skipping existing packages in its basic form is necessary: If a CI publish task fails midway through uploading files, the user should be able to restart the whole workflow, with only new files being uploaded.

Preferably, we'd error when we try to upload files that have the same name but different content, otherwise the following scenario could occur: You build foo 1.2.3 into foo-1.2.3.tar.gz and foo-1.2.3-cp39-abi3-manylinux2014.whl, but upload only the source distribution foo-1.2.3.tar.gz (let's say, because the wheel build failed or because the CI machine crashed after the source dist upload). Now you make some code changes, rebuild foo 1.2.3 and try uploading foo-1.2.3.tar.gz and foo-1.2.3-cp39-abi3-manylinux2014.whl. foo-1.2.3.tar.gz already exists, so only foo-1.2.3-cp39-abi3-manylinux2014.whl gets uploaded. Due to the code changes, the published source distribution and wheel mismatch: A mac or windows user building from the source distribution will get different code than the linux user, breaking the contract (and core assumption in packaging tools) that all files in a release match.

One option is adding index url as mandatory parameter to skip-existing: By using the hashes in the index URL, we check before uploading whether a file with the same filename does not yet exist (new upload), a file with the same name and same content/hash exists (we can skip it, such as a 1.0 that is still present in your dist/ when you're now publishing 1.1) or a file with the same name but different content exists (we error). The disadvantage is that this is more complex both on the user side and the uv side than twine's --skip-existing.
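
The three cases can be sketched as a small Python function. This is only an illustration under stated assumptions: `Action`, `classify` and the `index_hashes` mapping (filename to SHA-256 hex digest, as an index page might advertise) are hypothetical names, not uv's implementation.

```python
import hashlib
from enum import Enum


class Action(Enum):
    UPLOAD = "upload"  # filename not on the index yet
    SKIP = "skip"      # same filename, same content
    ERROR = "error"    # same filename, different content


def sha256_hex(data: bytes) -> str:
    """Hex digest of the file content, comparable to the index's hash."""
    return hashlib.sha256(data).hexdigest()


def classify(filename: str, local_sha256: str, index_hashes: dict) -> Action:
    """Decide what to do with one local file, given the hashes on the index."""
    remote = index_hashes.get(filename)
    if remote is None:
        return Action.UPLOAD
    if remote == local_sha256:
        return Action.SKIP
    return Action.ERROR


# Usage: a 1.0 sdist still lying in dist/ is skipped, a rebuilt one is an error.
on_index = {"foo-1.0.tar.gz": sha256_hex(b"original build")}
print(classify("foo-1.0.tar.gz", sha256_hex(b"original build"), on_index))
print(classify("foo-1.0.tar.gz", sha256_hex(b"rebuilt with changes"), on_index))
print(classify("foo-1.1.tar.gz", sha256_hex(b"new release"), on_index))
```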

konstin commented 2 weeks ago

e.g. if I have published version 1.0.0 and now want to build and publish 1.0.1 locally, it will try to upload 1.0.0 as well. I have to manually remove the older versions to make it work.

We could take a similar approach as poetry publish here and require the project's pyproject.toml to be present (either in the current directory or as part of the workspace), then only upload the version defined there. This would change CI workflows in some cases because it requires a checkout step for the publish step (e.g. if you build multiple wheels for different platforms, then collect and upload in a separate job, that job needs a checkout too).
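
A rough Python sketch of the filtering step, assuming the project name and version have already been read from pyproject.toml. The helper names (`normalize`, `dist_name_and_version`, `files_for_release`) are hypothetical, the filename parsing is simplified, and versions are compared as literal strings without PEP 440 normalization.

```python
import re
from typing import List, Optional, Tuple


def normalize(name: str) -> str:
    """Fold '-', '_' and '.' runs so 'rh-aws-saml-login' matches 'rh_aws_saml_login'."""
    return re.sub(r"[-_.]+", "_", name).lower()


def dist_name_and_version(filename: str) -> Optional[Tuple[str, str]]:
    """Extract (name, version) from a wheel or .tar.gz sdist filename."""
    if filename.endswith(".whl"):
        # name-version(-build)?-python-abi-platform.whl
        parts = filename[: -len(".whl")].split("-")
        if len(parts) < 5:
            return None
        return normalize(parts[0]), parts[1]
    if filename.endswith(".tar.gz"):
        name, sep, version = filename[: -len(".tar.gz")].rpartition("-")
        if not sep:
            return None
        return normalize(name), version
    return None


def files_for_release(filenames: List[str], name: str, version: str) -> List[str]:
    """Keep only the files that belong to the given project version."""
    wanted = (normalize(name), version)
    return [f for f in filenames if dist_name_and_version(f) == wanted]


dist = [
    "foo-1.0.0.tar.gz",
    "foo-1.0.0-py3-none-any.whl",
    "foo-1.0.1.tar.gz",
    "foo-1.0.1-py3-none-any.whl",
]
print(files_for_release(dist, "foo", "1.0.1"))
```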

davidszotten commented 2 weeks ago

i agree that the current implementations are a bit of a hack (and from your detailed description, perhaps even dangerous). just a note that the proposal of checking version against pyproject.toml doesn't satisfy my (or, i think, op's) use-case of using skip-existing as a proxy for "was the version just bumped". (i'm open to ideas of other ways of achieving this though)

TiansuYu commented 2 weeks ago

In my workflow, I would also need to generate a dev version on the fly and publish a random dev version on every merge to main. Please also keep this in mind when you design uv publish. (In the end, I would probably just edit the version in pyproject.toml in CD, and not interfere with the workflow @konstin proposed here.)

konstin commented 2 weeks ago

@TiansuYu Could you expand on your overall workflow, i.e. how does the wheel get its random dev tag and where/how does this wheel from main get used?

TiansuYu commented 2 weeks ago

I would somehow edit the version string e.g. 1.0.0 -> 1.0.0.dev0 (using a script or something, or if uv can offer something like uv version dev kinda like poetry version patch but append a random version string.) Then run uv build and uv publish. This is what I mean.
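
A small Python sketch of such a script step. The function name `dev_version` is hypothetical, and this is not an existing uv command. One caveat: PEP 440 requires the dev segment to be a number, so instead of a random string the sketch assumes some increasing integer is available (for example a CI run counter).

```python
import re


def dev_version(base: str, number: int) -> str:
    """Turn a release version like '1.0.0' into a dev release like '1.0.0.dev0'."""
    # Only plain dotted-number versions are handled in this sketch.
    if not re.fullmatch(r"\d+(\.\d+)*", base):
        raise ValueError(f"unsupported base version: {base!r}")
    if number < 0:
        raise ValueError("dev number must be non-negative")
    return f"{base}.dev{number}"


print(dev_version("1.0.0", 0))
```

The resulting string would then be written into pyproject.toml before running uv build and uv publish.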

zanieb commented 2 weeks ago

I also use a "publish development release on main" pattern

https://github.com/zanieb/poetry-relax/blob/5706001e383a2f3327f4ab7c2ced2b57d39e287a/.github/workflows/build.yaml#L51-L60

https://github.com/zanieb/poetry-relax/blob/5706001e383a2f3327f4ab7c2ced2b57d39e287a/scripts/version#L21-L25

konstin commented 2 weeks ago

For skip existing we can use the following design:

The user can pass --keep-existing <index url>. For each file given to upload, uv checks whether the file is already on the index. If the filename does not exist, uv will upload the file. If the filename does exist and the file on the index is the exact same as the local file (hash match), we skip the upload as there's nothing to do. If the file exists and mismatches with different content, we error: There is an inconsistency, and uv does not upload in this case.

The upload itself may then error or succeed. If it errors, we check the index URL again: Maybe there was a parallel upload of the same file, and the other upload was quicker (avoid TOCTOU errors). We then apply the same logic: If the filename does exist and the file on the index is the same (hash match), it's ok, there was an upload race. If the file exists and mismatches, we error: There is an inconsistency, and uv does not upload in this case. I'm not sure yet if we should do the second check only if the error is one of the twine-recognized file-already-existed conditions; i tend to think we can just do the second check on any error.

This enables re-trying publish, but it still requires the author of a package to ensure that they only try to publish each version from a specific commit. We impose this requirement to avoid situations where different files in a version were built from different sources, which breaks a fundamental assumption in uv.
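
The check-upload-recheck flow described above can be sketched in Python as follows. It is only an illustration of the design, not uv's code: `publish_file`, `InconsistentFileError`, and the injected `lookup` (returns the index's hash for a filename, or None) and `upload` callables are all hypothetical. The second check runs on any upload error, which is one of the two options mentioned above.

```python
from typing import Callable, Optional


class InconsistentFileError(Exception):
    """Same filename on the index, but different content."""


def publish_file(
    filename: str,
    local_hash: str,
    lookup: Callable[[str], Optional[str]],
    upload: Callable[[str], None],
) -> str:
    def already_on_index() -> bool:
        remote = lookup(filename)
        if remote is None:
            return False
        if remote == local_hash:
            return True
        raise InconsistentFileError(f"{filename} exists with different content")

    # First check: nothing to do if the identical file is already there.
    if already_on_index():
        return "skipped"
    try:
        upload(filename)
    except Exception:
        # Second check: a parallel upload of the same file may have won the race.
        if already_on_index():
            return "skipped"
        raise  # a genuine upload failure
    return "uploaded"


# Usage with an in-memory stand-in for the index.
index = {}
print(publish_file("foo-1.0.tar.gz", "abc", index.get, lambda name: index.update({name: "abc"})))
print(publish_file("foo-1.0.tar.gz", "abc", index.get, lambda name: index.update({name: "abc"})))
```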