Motivation
My idea for this project is to solve the current infrastructure problem around binary artifacts. There are a few main points I'd like to fix:
Building binary artifacts for architectures the maintainers don't have access to
Building binary artifacts of old project versions for new Python versions when those are released (e.g., rebuilding NumPy 1.21 for Python 3.10 after 3.10 is released)
Improving build security and mitigating supply chain attacks
Currently, many projects build their distribution artifacts on remote machines, which could be compromised
Plan
To solve these issues, I want to build a service that automates the building of binary distributions.
The service would be triggered by new PyPI releases and would build binary artifacts for the set of supported architectures and platform ABIs (manylinux, musllinux, etc.).
The service would be triggered by new Python releases and would rebuild older versions of projects for the new Python version.
The service would optionally upload the built artifacts to PyPI.
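A minimal sketch of how the two triggers could map to build jobs, assuming the service keeps a fixed list of supported Python and platform tags. The names BuildJob, SUPPORTED_PYTHONS, SUPPORTED_PLATFORMS, jobs_for_project_release and jobs_for_python_release are hypothetical, and the tag lists are only example values, not a proposal for which platforms to support.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product

# Example values only (hypothetical): the real supported sets would be service configuration.
SUPPORTED_PYTHONS = ["cp39", "cp310", "cp311"]
SUPPORTED_PLATFORMS = [
    "manylinux_2_17_x86_64",
    "manylinux_2_17_aarch64",
    "musllinux_1_1_x86_64",
]


@dataclass(frozen=True)
class BuildJob:
    project: str
    version: str
    python_tag: str
    platform_tag: str


def jobs_for_project_release(project: str, version: str) -> list[BuildJob]:
    """New PyPI release: build it for every supported Python and platform pair."""
    return [
        BuildJob(project, version, py, plat)
        for py, plat in product(SUPPORTED_PYTHONS, SUPPORTED_PLATFORMS)
    ]


def jobs_for_python_release(
    python_tag: str, releases: list[tuple[str, str]]
) -> list[BuildJob]:
    """New Python release: rebuild existing (project, version) pairs for that Python only."""
    return [
        BuildJob(project, version, python_tag, plat)
        for (project, version), plat in product(releases, SUPPORTED_PLATFORMS)
    ]
```

A new project release fans out over every (Python, platform) pair, while a new Python release only adds one column to the matrix for each older project version that should be rebuilt.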
Security Considerations
Running these builds on remote servers (the ones provided by our service) raises concerns about supply chain attacks. Although securing the service would be a critical task, the security model should not depend on that single point of failure. The infrastructure must be designed to stay secure even if this weak point is breached, and for our purposes we must assume that it eventually will be.
So the service should be designed around one key property: build reproducibility.
Projects should be expected to produce reproducible artifacts: for each platform and ABI, builds from the same source must yield identical artifacts. We could offer an escape hatch for this, but it should be penalized. For example, if we (or someone else) offer this service to the community for free, the escape hatch should be a paid feature, adding friction for projects that automate their builds this way and motivating them to fix the reproducibility of their builds.
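To illustrate what "identical given the same source" requires, here is a minimal sketch of deterministic packaging. Wheels are ZIP archives, and a ZIP records each entry's modification time and the order entries were written, so two builds of the same source can differ byte-for-byte unless those are normalized. The sketch pins timestamps and permissions and sorts paths; build_archive and digest are hypothetical helpers, and a real wheel build has further sources of non-determinism (compiler output, embedded build paths) that this does not address.

```python
import hashlib
import io
import zipfile

# Every entry gets the same timestamp; ZIP cannot represent dates before 1980.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def build_archive(files: dict[str, bytes]) -> bytes:
    """Pack {archive path: contents} into a ZIP deterministically (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Sorted order removes dependence on how the build enumerated files.
        for path in sorted(files):
            info = zipfile.ZipInfo(path, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed Unix permission bits
            zf.writestr(info, files[path])
    return buf.getvalue()


def digest(data: bytes) -> str:
    """SHA-256 hex digest used to compare artifacts across builders."""
    return hashlib.sha256(data).hexdigest()
```

Because the output depends only on the file contents, independent builders can compare the SHA-256 digests of their artifacts instead of exchanging the artifacts themselves.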
Building on top of reproducibility, we could have independently managed build nodes on our network verify that builds have not been tampered with. We could also give those nodes distribution-revocation tokens, which they would use if a distribution uploaded to PyPI could not be reproduced. This would mitigate attacks such as an attacker gaining control of the central service and abusing its upload tokens to publish malicious artifacts.
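The core check on a verification node could look like the following minimal sketch. It assumes the node can fetch the uploaded artifact and run the same build itself; verify_upload is a hypothetical function, and rebuild and revoke are hypothetical stand-ins for the node's build step and for whatever revocation mechanism the node's token enables, since neither is specified here.

```python
import hashlib
from typing import Callable


def verify_upload(
    uploaded: bytes,
    rebuild: Callable[[], bytes],
    revoke: Callable[[str], None],
) -> bool:
    """Rebuild independently and compare against the artifact uploaded to PyPI.

    `rebuild` runs the build on this independent node; `revoke` is a hypothetical
    hook that uses the node's revocation token on the artifact with the given digest.
    Returns True if the upload was reproduced, False if it was revoked.
    """
    uploaded_digest = hashlib.sha256(uploaded).hexdigest()
    rebuilt_digest = hashlib.sha256(rebuild()).hexdigest()
    if rebuilt_digest != uploaded_digest:
        # The uploaded artifact could not be reproduced from source: revoke it.
        revoke(uploaded_digest)
        return False
    return True
```

Since the build is expected to be reproducible, any digest mismatch is treated as evidence of tampering rather than as normal build noise.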