daniperez / vale-python-package

Install πŸ“ Vale (grammar & style check tool ) in 🐍 Python environments. Package available at πŸ“¦ https://pypi.org/project/vale/
MIT License

[CD advice] Publishing Python packages #30

Open webknjaz opened 9 months ago

webknjaz commented 9 months ago

Hey, I noticed that `python-publish.yml` uses a number of outdated practices that are best to fix:

1. A long-lived API token is used β€” get rid of it and replace it with secretless publishing; my PyPUG guide is already updated to walk you through this: https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/
2. Building in the publishing job is a security concern as it enlarges the attack surface; the same guide shows how to fix this with reduced privileges; this is especially important with secretless publishing.
3. Using `--sdist --wheel` together causes pypa/build to make the dists separately, from the Git checkout; the best practice, smoke-testing what pip would actually do, is to drop them β€” this will make it build the sdist from Git, but the wheel from that sdist.
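A sketch of the split build/publish workflow layout that the linked PyPUG guide describes (job names, the `pypi` environment name, and the artifact name are illustrative, not the repository's actual configuration):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python3 -m pip install build
      # No --sdist --wheel flags: the wheel is then built from the sdist,
      # which smoke-tests what pip would actually do.
      - run: python3 -m build
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/

  publish:
    needs: build        # publishing job never builds, shrinking the attack surface
    runs-on: ubuntu-latest
    environment: pypi
    permissions:
      id-token: write   # required for secretless (trusted) publishing via OIDC
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/
      # No password/API token input: trusted publishing is used instead.
      - uses: pypa/gh-action-pypi-publish@release/v1
```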

daniperez commented 9 months ago

Hi @webknjaz ! Thanks for the advice. It's very important to secure the supply chain. Just allow me some time to implement those points :innocent:

daniperez commented 9 months ago

Done! Pardon the lack of professionalism; the changes can be found in the following commits: https://github.com/daniperez/vale-python-package/compare/0bdb82f...main

@webknjaz Let me know if you have any feedback. If it's not anywhere close to being good, I can create a PR and move the discussion there.

webknjaz commented 8 months ago

Hey, I didn't mean to make any implications about your professionalism or criticize you in any way. Don't be hard on yourself! This was mostly a drive-by comment β€” I tend to post these when I see something that I'm an expert in, to help others improve.

Your initial implementation wasn't inherently insecure β€” API secrets are still considered secure (and that's what people not using GitHub will end up being forced to use for uploads in 2024, since 2FA is going to be enforced for this action). Secretless publishing is just a bit more convenient (and uses short-lived tokens under the hood) β€” you're using my pypi-publish action, where I was helping to implement support for trusted publishing very early, before this feature was even available to the wider public. So it's something I feel like I can educate people about, sometimes.

I only found your repository while looking into starting to use Vale β€” it was suggested that PyPUG (https://github.com/pypa/packaging.python.org) integrates it at some point, so I got curious ;)

That said, there are a few things that you can improve:

1. You have a GitHub Environment called `production` β€” in the repo settings, I recommend enabling required reviews (also, I usually standardize on using `pypi` as the environment name, because words like `release` / `prod` aren't really accurate in this context).
2. You have a step running `python3 -m build --sdist --wheel`; it should be replaced with `python3 -m build` β€” it'll still make both dists, but will introduce a free smoke test.
3. Now that you dropped the `skip-existing` hack β€” previously, one of your commits would cause a successful upload and it'd be hard to track down which commit specifically. You can integrate something like `setuptools-scm` to derive versions from tags β€” this way, you'd have tags on GitHub corresponding to the releases on PyPI.
4. Additionally, triggering the release can be shifted to tag pushes rather than pushes of arbitrary commits. I personally like to use the `workflow_dispatch:` trigger instead of `push:` β€” this allows running the workflow with a button click in the GH UI (and such a workflow typically pushes a new tag to GitHub after a successful release to PyPI).
5. Distributing a script that downloads things from the internet at runtime, as opposed to install time, can be seen as dangerous. I recommend updating the packaging to bundle all the binary artifacts in the sdist, and making a bunch of platform-specific wheels instead. It's going to be tricky, but not very complicated β€” you'd need to loop over the wheel creation, leaving just one artifact per platform inside each wheel. And there's a `wheel tags` command now that allows you to change the tags of an existing wheel β€” you'd need to mark them with manylinux/macos/win64/aarch etc. correspondingly. Then, the console script would just `os.execv()` that single artifact that's installed, and there would be no need to access the internet from within the end users' machines.
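The `os.execv()` console-script idea from point 5 can be sketched as follows; the `bin/` layout inside the package and the entry-point function name are assumptions for illustration, not the project's actual code:

```python
# Hypothetical console-script entry point for a wheel that bundles one
# platform-specific `vale` binary next to this module.
import os
import sys
from pathlib import Path
from typing import Optional


def main(bin_dir: Optional[Path] = None) -> None:
    if bin_dir is None:
        # By default, look for the bundled binary inside the installed package.
        bin_dir = Path(__file__).resolve().parent / "bin"
    exe = bin_dir / ("vale.exe" if os.name == "nt" else "vale")
    # Replace the current Python process with the bundled binary,
    # forwarding all command-line arguments unchanged.
    os.execv(str(exe), [str(exe), *sys.argv[1:]])
```

Because the process is replaced outright, exit codes and signals behave as if the user had invoked the binary directly, and no network access is needed at run time.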

So that's my general feedback, I think... Oh, and you could implement the sigstore signing from the guide β€” together with bundling the upstream binaries into Python distributions, this should address the supply chain-level security concern.

daniperez commented 8 months ago

> Hey, I didn't mean to make any implications about your professionalism or criticize you in any way. Don't be hard on yourself! This was mostly a drive-by comment β€” I tend to post these when I see something that I'm an expert in, to help others improve. Your initial implementation wasn't inherently insecure β€”

I was apologizing for the lack of professionalism of the fix itself ☺️ Since I'm publishing to PyPI on "merge to main" only, I made all the changes directly on main, without a PR and without letting you properly review the code (hence the "lack of professionalism").

But I got your comment exactly as you intended: someone who knows better than me spent some time to improve something in this little project (and it's so easy to implement that it would be a shame if I didn't!).

> I only found your repository while looking into starting to use Vale β€” it was suggested that PyPUG (https://github.com/pypa/packaging.python.org) integrates it at some point, so I got curious ;)

Thanks also for the background. That adds some perspective to the initial piece of advice :+1: appreciated!

> That said, there are a few things that you can improve:

> 1. You have a GitHub Environment called `production` β€” in the repo settings, I recommend enabling required reviews (also, I usually standardize on using `pypi` as the environment name, because words like `release` / `prod` aren't really accurate in this context).

You are right! :+1:

> 2. You have a step running `python3 -m build --sdist --wheel`; it should be replaced with `python3 -m build` β€” it'll still make both dists, but will introduce a free smoke test.

My bad, I thought I pushed that change but it was sitting on my computer uncommitted :facepalm:

> 3. Now that you dropped the `skip-existing` hack β€” previously, one of your commits would cause a successful upload and it'd be hard to track down which commit specifically. You can integrate something like `setuptools-scm` to derive versions from tags β€” this way, you'd have tags on GitHub corresponding to the releases on PyPI.

:+1:
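The `setuptools-scm` wiring mentioned above amounts to a few lines of `pyproject.toml`; this is a minimal sketch, assuming the default setuptools backend (the `[project]` section is truncated to the version-related fields):

```toml
[build-system]
requires = ["setuptools>=64", "setuptools-scm>=8"]
build-backend = "setuptools.build_meta"

[project]
name = "vale"
# The version is no longer hard-coded; it's derived from Git tags.
dynamic = ["version"]

[tool.setuptools_scm]
# Defaults are fine: the nearest vX.Y.Z tag becomes the package version.
```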

> 4. Additionally, triggering the release can be shifted to tag pushes rather than pushes of arbitrary commits. I personally like to use the `workflow_dispatch:` trigger instead of `push:` β€” this allows running the workflow with a button click in the GH UI (and such a workflow typically pushes a new tag to GitHub after a successful release to PyPI).

:+1: makes sense! I didn't have much time to spend on this when I created the project, so there are corners cut everywhere.
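For reference, the two trigger styles being discussed look like this in the workflow file (either one would replace the current push-to-main trigger; the `v*` tag pattern is an assumption):

```yaml
on:
  workflow_dispatch:   # adds a "Run workflow" button in the Actions tab

# ...or, alternatively, release only on tag pushes:
# on:
#   push:
#     tags:
#       - "v*"
```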

> 5. Distributing a script that downloads things from the internet at runtime, as opposed to install time, can be seen as dangerous.

That one is an obvious one :+1: When I created the project I was thinking how wasteful (in terms of storage, for example) it would be if this little package of mine had to "wrap" every vale release. It also spared me from having to properly handle multi-platform publication (I'm only publishing one wheel). Anyhow, that one is probably important enough that I'll do it sooner rather than later, perhaps when I figure out how to handle publication for several platforms.

> It's going to be tricky, but not very complicated β€” you'd need to loop over the wheel creation, leaving just one artifact per platform inside each wheel. And there's a `wheel tags` command now that allows you to change the tags of an existing wheel β€” you'd need to mark them with manylinux/macos/win64/aarch etc. correspondingly. Then, the console script would just `os.execv()` that single artifact that's installed, and there would be no need to access the internet from within the end users' machines.

Thanks for the comprehensive info and for the sketch of the solution! How to deliver properly for several platforms wasn't totally clear to me, but what you suggest is simple.

> So that's my general feedback, I think... Oh, and you could implement the sigstore signing from the guide β€” together with bundling the upstream binaries into Python distributions, this should address the supply chain-level security concern.

:+1: yep, that definitely goes on the TODO list!
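The Sigstore signing step from the PyPUG guide is a single step added before the upload; a sketch, assuming the dists live in `./dist/`:

```yaml
      - name: Sign the dists with Sigstore
        uses: sigstore/gh-action-sigstore-python@v2
        with:
          inputs: >-
            ./dist/*.tar.gz
            ./dist/*.whl
```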

webknjaz commented 8 months ago

> How to deliver properly for several platforms wasn't totally clear to me, but what you suggest is simple.

The publishing is simple β€” just have all the wheels and the sdist in the same directory and my action will call twine, which uploads everything.

The hard part is producing the packages β€” it's non-trivial. What I had in mind is wrapping the setuptools PEP 517 build backend, similar to what's outlined @ https://setuptools.pypa.io/en/latest/build_meta.html#dynamic-build-dependencies-and-other-build-meta-tweaks. The problem is that the version in the metadata isn't exposed to the hooks because that's their output, not input β€” and setuptools-scm would do that within setuptools land. So the wrapper would probably need to call setuptools-scm or dunamai to get the version in `build_sdist` and, using that version, download the binaries into the tree, followed by invoking the original hook from setuptools. The version could also be supplied via a PEP 517 config setting, on the command line. As for the wheel creation, something similar is needed β€” only keeping the matching binary, calling setuptools' `build_wheel` and post-processing it with `wheel tags`.
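The wrapper backend described above could look roughly like this; everything that isn't delegated straight to `setuptools.build_meta` (the module name, the `_vendor_binaries` helper, the hard-coded version stand-in) is a hypothetical placeholder, not working download logic:

```python
# _custom_backend.py -- sketch of an in-tree PEP 517 backend wrapping setuptools.
from setuptools import build_meta as _orig

# Hooks that don't need changes are re-exported unchanged.
get_requires_for_build_sdist = _orig.get_requires_for_build_sdist
get_requires_for_build_wheel = _orig.get_requires_for_build_wheel
prepare_metadata_for_build_wheel = _orig.prepare_metadata_for_build_wheel


def _vendor_binaries(version):
    """Placeholder: download the upstream binaries for `version` into the tree."""


def build_sdist(sdist_directory, config_settings=None):
    # The version is an *output* of PEP 517 hooks, not an input, so derive it
    # here, e.g. via setuptools_scm.get_version() or dunamai, before delegating.
    version = "0.0.0"  # stand-in for setuptools_scm.get_version()
    _vendor_binaries(version)
    return _orig.build_sdist(sdist_directory, config_settings)


def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # Keep only the binary matching the target platform, then delegate;
    # the resulting wheel is retagged afterwards with `wheel tags`.
    return _orig.build_wheel(wheel_directory, config_settings, metadata_directory)
```

The project would then point `build-backend` at this module (with `backend-path` set in `pyproject.toml`), so `pypa/build` picks it up transparently.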

So this is all doable at the PEP 517 interface level, but needs a lot of experimentation. And I'm sure there will be corner cases along the way.