Closed legoktm closed 9 months ago
Sounds like a great plan! Having the ability to e.g. make deb in each of the respective component repos will make it a lot easier to iterate.

The main thing is that each git repo now mostly (minus some verification stuff and localwheels) contains everything needed for packaging
The reproducible-wheels logic would need to be copy/pasted around to maintain. Instead of committing to do that, we should consider whether we need the reproducible wheel logic going forward. We certainly want reproducible debian packages, but we could achieve that by using the wheels from PyPI, and pinning those hashes directly. Doing so would also ease our transition between stable distros, since we'd have access to a breadth of precompiled wheels for various Python versions.
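To make the "pinning those hashes directly" idea concrete, here is roughly what it looks like in a pip requirements file (package name and digests below are placeholders, not real pins):

```text
# requirements.txt (illustrative; name and hashes are placeholders)
# Installed with: pip install --require-hashes -r requirements.txt
# pip will refuse to install anything whose digest doesn't match a pin.
example-lib==1.2.3 \
    --hash=sha256:<wheel sha256> \
    --hash=sha256:<sdist sha256>
```

With `--require-hashes`, every requirement must carry at least one `--hash`, so a substituted artifact from PyPI fails the install rather than slipping through.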
That seems reasonable to me. The one thing I'm not sure about is whether the manylinux wheels on PyPI are statically linked to things like libssl, or whether it's dynamic linking (I'm not actually sure what our current localwheels are either tbh).
I've also been thinking about shipping the sdk as its own Debian package rather than as a wheel, see https://github.com/freedomofpress/securedrop-debian-packaging/issues/203#issuecomment-1109309502.
However in our case, there is no distinction between the "upstream" project and the packaging - they're done by the same people at the same time.
Good point. Thanks for the details around the different use cases. It makes a lot of sense to move all package-specific information inside each corresponding package repo, rather than inside the packager tool repo.
The reproducible-wheels logic would need to be copy/pasted around to maintain. Instead of committing to do that, we should consider whether we need the reproducible wheel logic going forward. We certainly want reproducible debian packages, but we could achieve that by using the wheels from PyPI, and pinning those hashes directly. Doing so would also ease our transition between stable distros, since we'd have access to a breadth of precompiled wheels for various Python versions.
I could be missing something from the original reproducible wheels debate, but as I recall, the strongest argument for keeping our own local wheels was that some of our dependencies on PyPI are not built reproducibly, therefore we cannot programmatically verify that the binary was built from the exact source tarball that we diff-reviewed. I also recall the list of non-reproducible wheels shrinking, so this might need to be re-reviewed. At this point, I'm not sure if it's worth the trouble of building and maintaining our own local wheels since I'm not sure it's an actual security risk where malicious binary wheels are released alongside source tarballs that do not contain the malicious code (are there actual known cases of this happening?). @kushaldas might remember something else I'm missing in the arguments for or against keeping local wheels, but as I recall, our goal is to eventually remove this logic once there's no longer a security risk (/cc security engineers @l3th3 and @lsd-cat).
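For concreteness, the verification that non-reproducible wheels make impossible is just a digest comparison: if a wheel builds reproducibly, we can rebuild it from the diff-reviewed tarball and check that our build hashes the same as the published binary. A minimal sketch (function and file names here are hypothetical, not the repo's actual tooling):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifact(path: Path, pinned_sha256: str) -> None:
    """Refuse a downloaded wheel/tarball whose hash isn't the pinned one."""
    actual = sha256_of(path)
    if actual != pinned_sha256:
        raise ValueError(f"{path.name}: expected {pinned_sha256}, got {actual}")
```

pip's hash-checking mode performs the same comparison at install time; the point is only that a hash pin binds the reviewed source to the exact binary that gets installed.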
Relevant (though down) for those who may not have seen it before: https://github.com/redshiftzero/reproduciblewheels
https://github.com/freedomofpress/securedrop-debian-packaging/pull/315 updates the script to move us in this direction, there are a few more back-compat issues I just noticed that I'll fix on Monday.
I also posted PRs for updating securedrop-log and modernizing the packaging. I can pretty quickly do the rest, but if people are interested I think it would be a good pairing/knowledge-sharing opportunity.
However in our case, there is no distinction between the "upstream" project and the packaging - they're done by the same people at the same time.
One big reason to do this was to make sure that we can keep releasing the code as we wish and fix the packaging issues in a separate place. Even though we are the same group who writes the code and also does the packaging, we always have to make a new release of the source + everything when we make mistakes (or try new things) in the packaging part. I saw this happen enough times that I suggested this flow.
At this point, I'm not sure if it's worth the trouble of building and maintaining our own local wheels since I'm not sure it's an actual security risk where malicious binary wheels are released alongside source tarballs that do not contain the malicious code (are there actual known cases of this happening?).
I will just say that there is a reason all big companies/projects maintain their own verified repository of dependencies and install only from that (entirely skipping downloading or installing anything from upstream PyPI): supply chain attacks. And most of them also do dependency code review internally, just like we do.
Right now in the Python upstream we are working to make reproducibility the default for various packaging/release parts so that it becomes normal. But even then, making sure that we only use verified dependencies from our own repository is essential.
At this point, I'm not sure if it's worth the trouble of building and maintaining our own local wheels since I'm not sure it's an actual security risk where malicious binary wheels are released alongside source tarballs that do not contain the malicious code (are there actual known cases of this happening?).
This is something that to my knowledge we cannot entirely rule out and we should probably keep guarding against.
I don't recall the discussion on this in any great detail, but as @kushaldas pointed out there were two goals here: to separate packaging issues from code issues, and to protect from the (non-academic) risk of supply chain attacks against dependencies.
IMO the former made more sense during early development, but as we move to a more regular and frequent release cadence it does simplify things to move packaging logic for applications back into their respective repos.
But I'm :100: in agreement that supply chain attacks are still a real possibility, and we need to be doing a better job here - fixing up the server builds to use securedrop-debian-packaging has been on the TODO list for a lonnng time.
Like I said, I can't remember if it came up originally, but what would the downside be to hosting our own package repo? (Or possibly even just slimming down securedrop-debian-packaging such that it only contains localwheels and pulling that in as a submodule during builds?)
However in our case, there is no distinction between the "upstream" project and the packaging - they're done by the same people at the same time.
One big reason to do this was to make sure that we can keep releasing the code as we wish and fix the packaging issues in a separate place. Even though we are the same group who writes the code and also does the packaging, we always have to make a new release of the source + everything when we make mistakes (or try new things) in the packaging part. I saw this happen enough times that I suggested this flow.
I think that might've been true when the packaging was being set up, but today packaging tweaks that don't come along with other code changes are rare enough that I think we should optimize for the common workflow, at the cost of rare cases requiring potentially unnecessary source releases.
FTR I don't actually want to change anything w/r/t localwheels in this ticket; the goal here was to make each repo's packaging self-contained in the repo itself, except for localwheels.
IMO the former made more sense during early development, but as we move to a more regular and frequent release cadence it does simplify things to move packaging logic for applications back into their respective repos.
When I started working on SecureDrop, one of my initial comments was that we are working on a distribution, which even includes our own kernel. It feels like the projects are stable and we should all move the packaging into their own repos etc., but I still feel this will become trouble again someday. Especially when there are only such a small number of folks working on the project. Keeping the packaging in its own repository really allows us to keep doing proper code releases and doing the packaging work as required (for example when we build for 2 different versions of base Ubuntu etc.).
Like I said, I can't remember if it came up originally, but what would the downside be to hosting our own package repo? (Or possibly even just slimming down securedrop-debian-packaging such that it only contains localwheels and pulling that in as a submodule during builds?)
We initially had our own package repository hosted on S3, but maintaining that infrastructure along with all the signing and verification tooling was a pain to update properly. Having all of this maintained in this repository along with signed metadata (sha256sums of the sources/wheels) helps, and also reduces the need to fetch things over the network. In an ideal build system, the build boxes should not be allowed to pull anything over the network; everything should already be part of the repository for builds.
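The "signed metadata plus networkless builds" idea can be sketched as a build-time gate over a sha256sum-style manifest of the local wheels. This is an illustrative sketch, assuming a hypothetical manifest layout, not the repo's actual scripts:

```python
import hashlib
from pathlib import Path


def load_manifest(manifest: Path) -> dict:
    """Parse sha256sum-style lines: '<hexdigest>  <filename>'."""
    pins = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        pins[name.strip()] = digest
    return pins


def verify_localwheels(wheel_dir: Path, manifest: Path) -> None:
    """Every wheel present must match its pinned digest; fail the build otherwise."""
    pins = load_manifest(manifest)
    for wheel in wheel_dir.glob("*.whl"):
        digest = hashlib.sha256(wheel.read_bytes()).hexdigest()
        if pins.get(wheel.name) != digest:
            raise SystemExit(f"hash mismatch or unpinned wheel: {wheel.name}")
```

The manifest itself would be what gets signed; once it verifies, the build box never needs the network for Python dependencies.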
We were supposed to start working (after the Focal update) on making sure that the actual securedrop-app-code also uses the same structure and the reproducible dependencies etc.
FTR I don't actually want to change anything w/r/t localwheels in this ticket; the goal here was to make each repo's packaging self-contained in the repo itself, except for localwheels.
I think mostly people are reacting to:
The reproducible-wheels logic would need to be copy/pasted around to maintain. Instead of committing to do that, we should consider whether we need the reproducible wheel logic going forward. We certainly want reproducible debian packages, but we could achieve that by using the wheels from PyPI, and pinning those hashes directly. Doing so would also ease our transition between stable distros, since we'd have access to a breadth of precompiled wheels for various Python versions.
If all of our dependencies are one day reproducible, then we could build each dependency once, verify the hash, and pin to that hash (and continue to do diff-reviews of the source code; no one is suggesting we stop that practice). Until then, we can't safely use the wheels on PyPI that are not built reproducibly, because of the security risk that malicious binary wheels are released alongside source tarballs that do not contain the malicious code.
Update: Even if all our dependencies are reproducible, there's an argument that we still would want copies of these wheels hosted locally so that we can do "networkless" builds (networkless except for pulling down from apt and git lfs localwheels).
Right now in the Python upstream we are working to make reproducibility the default for various packaging/release parts so that it becomes normal. But even then, making sure that we only use verified dependencies from our own repository is essential.
@kushaldas, thanks for chiming in! It sounds like your recommendation is to continue not using PyPI wheels (only downloading the source tarballs from there) even in the case where all the PyPI wheels could be verified, because they can be reproducibly built and verified from our own build system? This must have to do with a different supply chain attack than what I described above, but I'm failing to come up with a concrete example. (The action item here could be that I go off and do a bunch of research on my own and update our docs, but if a maintainer is already able to explain this scenario, I think it would make sense to add details to our threat model around why we must use our own wheels rather than verified wheels on PyPI.)
At this point, I feel like @legoktm is rightfully trying to drive the discussion away from this and towards just moving the debian/ directories into their respective project code repositories. So I'll plan to follow up on the supply chain mitigations topic outside of this issue.
When I started working on SecureDrop, one of my initial comments was that we are working on a distribution, which even includes our own kernel. It feels like the projects are stable and we should all move the packaging into their own repos etc., but I still feel this will become trouble again someday. Especially when there are only such a small number of folks working on the project.
Taking a step back, my underlying goal is to get us to use a more standard Debian packaging workflow (git-buildpackage), which will allow us to embrace and utilize more standard tooling. My analysis was that using native packages (where source + packaging releases are the same) was the optimal solution, but it's not the only way to get there. If we want to keep source releases and packaging releases separate, the standard workflow is to have the packaging exist in a separate debian branch.
But I do think that using a native package is the way to go here: it makes all the normal release/packaging changes easy and convenient, and it improves the reliability of packaging CI since it's not split across two repos. Using a separate branch and keeping source and packaging repositories separate helps a bit, but still keeps a gap between the two that I'd like to cut down. I am curious what trouble you're worried about in the future; I'm not super familiar with all the problems that have happened in the past.
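For reference, in a standard Debian source package this choice is declared in debian/source/format; a native package contains exactly:

```text
3.0 (native)
```

whereas the separate-upstream workflow uses `3.0 (quilt)` together with a distinct orig tarball, which is the split this proposal is trying to avoid.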
Keeping the packaging in its own repository really allows us to keep doing proper code releases and doing the packaging work as required (for example when we build for 2 different versions of base Ubuntu etc.).
A bit off-topic, but in my ideal world we will be able to re-use the exact same packaging across multiple Debian/Ubuntu versions (and just have the build script auto-add a new changelog entry). For some of the more complicated packages like kernel stuff, we'll need separate branches for each version, which I've half-demoed at https://github.com/freedomofpress/securedrop-grsec (the branch is called ubuntu/focal; in the future we could have a ubuntu/jammy or debian/bullseye, etc.)
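The "auto-add a new changelog entry" step amounts to prepending one generated stanza per target distribution. In practice dch (from devscripts) does this; a minimal sketch of what the build script would generate (all argument values are hypothetical):

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def changelog_entry(package: str, version: str, dist: str,
                    author: str, email: str, message: str) -> str:
    """Render one debian/changelog stanza for the given target distribution."""
    # debian/changelog dates use RFC 2822 format, e.g. "Mon, 02 May 2022 10:00:00 +0000"
    date = format_datetime(datetime.now(timezone.utc))
    return (
        f"{package} ({version}) {dist}; urgency=medium\n\n"
        f"  * {message}\n\n"
        f" -- {author} <{email}>  {date}\n"
    )
```

A build for a second distro would call this again with a different dist (and typically a version suffix), prepending the result to the shared debian/changelog.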
Re-upping this for discussion, as in recent experience I've found that:
IMO keeping the build scripts and wheel logic in a single repo makes sense, but application-specific config should live in the application repo.
Repo consolidation has since invalidated this: there is now a single debian/ dir in securedrop-client that builds all .debs for SDW.
Progress
Rationale
For the main Python projects (securedrop-client, securedrop-log, etc.), we are currently maintaining the debian/ directory separate from the main project. This is similar to the workflow used when you have a distinct upstream that releases software, and someone in Debian who then packages it and distributes it.
However in our case, there is no distinction between the "upstream" project and the packaging - they're done by the same people at the same time. We make our releases in the format of Debian packages - if there's a code change a new package will be built and if there's a packaging change, a new release will be issued! In Debian terminology this is called a "native" package.
This leads to extra work split across multiple repos. E.g. to release 0.7.0 of securedrop-client, it required two PRs: one to bump the version (https://github.com/freedomofpress/securedrop-client/pull/1483) and then another in this repo to bump the version in d/changelog (https://github.com/freedomofpress/securedrop-debian-packaging/commit/5c71d68f59bfe7b09f212bbbe3bdca12c45264f9). Or consider if you wanted to add a new file to be installed: you'd have to make your PR and then do another PR in this repo to add it to d/<package>.install. And we also don't test new changes against the packaging until the nightly runs, which is subpar.

My proposal is that we just move the debian/ directory for these projects into the corresponding Git repos (WIP example: https://github.com/freedomofpress/securedrop-log/commits/debian-dir). For now, this repo will still be the main entrypoint for building packages. The immediate benefit will be that making releases and packaging changes doesn't require two coordinated PRs. In the build logic, we can skip everything about building an sdist tarball and then repacking it; we'll just use the git repo tree as our build directory (see my WIP: https://github.com/freedomofpress/securedrop-debian-packaging/commit/1f9c4c2942a020cd43a2531c184f7f1a35edc5ed).
This change also unlocks more possible future steps. The main thing is that each git repo now mostly (minus some verification stuff and localwheels) contains everything needed for packaging, which would allow us to use git-buildpackage workflows. We could also run CI for building a package on every commit/PR, and rather than doing nightlies (or we keep nightlies too? idk), just build new packages after every commit pushed to main.