david-a-wheeler opened this issue 2 years ago
I think it would be a bad idea to water down the reproducibility criterion by permitting certain classes of differences. To put it glibly: either a checksum matches or it doesn't.[^1]
Furthermore, eliminating timestamp pollution from the artifacts of a complex build process is often the absolute lowest-hanging fruit on the way to a stable reproducible build. There are many more tedious aspects to getting all the bits lined up just right in a way that can be replicated perfectly by others. So explicitly excluding timestamps from the matching criteria wouldn't do much to help projects claim build reproducibility.
I do sympathize with projects that don't, themselves, publish binary artifacts for various reasons[^2]. However, I think this could be addressed in other ways, such as by allowing community-driven reproducibility projects to confer gold status through some sort of trusted consensus mechanism. Free GitHub Actions minutes for OSS projects would go a long way towards providing ready-made infrastructure for collaborative "build verification" services for various platforms.
Build reproducibility is becoming a cornerstone of security (see the recent US DoD publication *Securing the Software Supply Chain: Recommended Practices for Developers*). I think it should remain part of the 🥇 gold standard of this project, or else be bumped up to some new 💎 "diamond standard".
Also, I do think that a watered-down goal of "an attempt at reproducibility", with some exceptions, might make a good addition to the 🥈 silver criteria.
[^1]: RE: "bit-for-bit results except for timestamps": a seemingly-random array of timestamp values in an executable binary could also be crafted as a series of operations that trigger a buffer overflow.

[^2]: Although Apache generally does, at least for all the Java projects I've contributed to there over the years, such as these downloads that include binary artifacts and published checksums.
Many projects don't release built software at all (Linux kernel, Apache Software Foundation). In those cases this can be marked as N/A. The project may still explain how to do verified reproducible builds in those cases (and get credit for it), but they should be allowed to say N/A in those cases.
I think we should differentiate between `builds reproducible` for software that only distributes source code and `can be reproduced` for software that also distributes binaries. If the project distributes binaries (for example attached to their GitHub release, but also Docker images), they should also publish documentation on how to build this exact binary, bit-for-bit identical, from source. Projects like i-probably-didnt-backdoor-this, Tails and bitcoin-core are doing it successfully, but there are very few standard tools available for this (lots of talk around SBOMs, but very few tools to set up a build environment based on an SBOM, for example).

For projects that only distribute source code, there is no binary that `can be reproduced`, but the build should still need to provide a stable, deterministic output to be considered `builds reproducible`. A downstream Linux distribution can then take care that the binary they build and ship `can be reproduced`. The Linux kernel specifically is currently very non-trivial to build reproducibly; Arch Linux and Debian are both struggling with it, so I don't think it should be considered `build_reproducible` or N/A, for example.
I don't think upstream projects that are known to produce binaries/artifacts that are difficult to secure further down the supply-chain should be allowed in Gold Tier.
> Many projects struggle only because of timestamps being different when there's a rebuild. Those produce differences in bit-for-bit comparisons, but I don't see how such date/timestamp differences by themselves lead to subverted software (there would have to be something else to act as a trigger). Timestamp differences are one of the most common causes for non-reproducibility, and since those differences by themselves aren't security-relevant, it seems overly harsh to demand that they match. I think it's good to do because it makes later comparison easier, but as I said, it's probably overly harsh.
Normalizing timestamps in a build is a fairly trivial issue; it's much easier to just fix the differences there than to write programs that try to tell benign differences and underhanded backdoors apart with 100% reliability. Needing manual intervention to inspect diffs should be the exception, not the norm, for software with the `build_reproducible` gold criterion.
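To illustrate how little machinery timestamp normalization needs, here is a minimal Python sketch of writing a zip/jar archive deterministically, following the SOURCE_DATE_EPOCH convention documented at reproducible-builds.org. The archive name and contents are placeholders, not any project's actual build code:

```python
import os
import time
import zipfile

# Honor the SOURCE_DATE_EPOCH convention from reproducible-builds.org:
# clamp every archive entry to a caller-supplied timestamp instead of
# the current clock. (Zip cannot store dates before 1980, so default
# to 1980-01-01.)
epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "315532800"))
stamp = time.gmtime(epoch)[:6]  # (year, month, day, hour, minute, second)

def add_deterministic(zf: zipfile.ZipFile, arcname: str, data: bytes) -> None:
    """Write one entry with a fixed timestamp and fixed permissions."""
    info = zipfile.ZipInfo(arcname, date_time=stamp)
    info.external_attr = 0o644 << 16
    zf.writestr(info, data)

# Sort the inputs so the archive's entry order is deterministic too.
with zipfile.ZipFile("out.jar", "w", zipfile.ZIP_DEFLATED) as zf:
    for name in sorted(["B.class", "A.class"]):
        add_deterministic(zf, name, b"...compiled bytes here...")
```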
@marcprux hit a lot of important points, thanks!
The problem of doing something "bit-for-bit identical except for ..." presents pragmatic challenges.
In short, it is trivial to compare two artifacts, but it presents a whole world of difficulties to compare only parts of two artifacts.
I would strongly caution against using "reproducible builds" in any way other than https://reproducible-builds.org/docs/definition/ which really comes down to bit-for-bit reproducible without exception.
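To make that asymmetry concrete, here is a minimal sketch (in Python, with file paths taken from the command line) of why bit-for-bit verification is the easy case: it reduces to comparing two digests, whereas any "identical except for X" rule would force a verifier to parse and mask each artifact format first:

```python
import hashlib
import sys

def sha256sum(path: str) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# Bit-for-bit verification reduces to comparing two digests; any
# "identical except for X" policy would instead require parsing each
# artifact format and masking fields before the comparison.
a, b = sys.argv[1], sys.argv[2]
print("REPRODUCIBLE" if sha256sum(a) == sha256sum(b) else "DIFFERS")
```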
The projects that I use from the Apache Software Foundation, such as Apache Maven and Apache NetBeans, publish a "convenience binary" along with the source release, all under the Apache name. Apache NetBeans even publishes a Snap package binary. The Maven build is reproducible, but the NetBeans build has quite some way to go before being reproducible.
I have found it surprisingly, and frustratingly, difficult to get changes related to reproducible builds accepted by upstream projects. Holding the Apache Software Foundation to a different standard than other open-source projects just makes that even more difficult. It would remove my incentive to make the changes and one of their incentives to accept them.
I would prefer the meaning of reproducible builds to remain bit-for-bit identical, including timestamps. Even for organizations that truly publish only a source release, one could argue that they should have the gold badge only if that source can be built in a reproducible manner.
> - a more complicated verification process becomes a software development project unto itself
I will second this, @vagrantc. My own App Fair process creates verifiably reproducible iOS apps, but it is a constant struggle against ever-changing versions of the tools generating indeterminate output in insidious new ways (e.g., due to changes in Xcode's compiler parallelization). My main motivation for spending all these hours on tedious devops debugging is the stretch goal that these apps eventually achieve gold certification, and thereby serve as paragons of trust to the mobile community.
As a strong supporter of bit-for-bit integrity without compromises: when developers report challenges producing software to adhere to some evaluation criteria (be that testability, security, performance or other), often it helps to introduce tooling improvements so that flaws (and opportunities) can be detected earlier in the assembly process (for example: the "shift security left" mantra, and similarly with continuous integration in general).
I have a sense that the challenges experienced by many developers are a result of the fact that we generally have to inspect the output of builds (diffs - sometimes binary) to identify where non-reproducible elements have appeared, and then perform sometimes-mentally-challenging detective work to theorize and evaluate what could have caused those artifacts to appear.
Hermetic build environments that can detect changes as soon as they're introduced during assembly could - I think - be an area where improved tooling might help to stem the introduction of non-reproducible elements early during development, in a way that could be largely-ecosystem-agnostic, and help to win developer mindshare.
There could be practical challenges implementing fail-fast hermetic builds (are filesystem reads/writes the unit of integrity? is language-level and/or IDE-level support required? how would ephemeral and tempfiles be handled?) but I think they're manageable. And similar to test-driven-development: not everyone will want to adopt early-detection since it would add development friction - but for those who understand the value of it as an investment, the benefits should be clear.
(sorry for sidetracking a bit: again I would reiterate that there shouldn't be exceptions for timestamps, because it's not clear what even counts as "just a timestamp" at the binary level, it would open the door to the very risks that reproducible builds are intended to solve, and there are unanswered questions about how integrity verification could be performed on content that fundamentally differs -- all points that others have alluded to. However, I want to state both my support and my suggestion that there may be solutions to address these concerns.)
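As a rough sketch of the fail-fast idea described above (not a real hermetic build system; the baseline file, step command, and output names are invented for illustration), a wrapper around each build step could hash the declared outputs and abort the moment one diverges from a previously recorded baseline:

```python
import hashlib
import json
import subprocess
import sys

def digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def run_step(cmd, outputs, baseline):
    """Run one build step, then fail fast if any declared output
    diverges from the digest recorded on a previous build."""
    subprocess.run(cmd, check=True)
    for out in outputs:
        want = baseline.get(out)
        seen = digest(out)
        if want is not None and seen != want:
            sys.exit(f"non-reproducible output at {cmd!r}: {out}")

# baseline.json maps output paths to digests from an earlier build.
with open("baseline.json") as f:
    baseline = json.load(f)
run_step(["javac", "Main.java"], ["Main.class"], baseline)
```

This points the developer at the first divergent step instead of leaving them to reverse-engineer a binary diff of the final artifact.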
> Many projects struggle only because of timestamps being different when there's a rebuild. Those produce differences in bit-for-bit comparisons, but I don't see how such date/timestamp differences by themselves lead to subverted software (there would have to be something else to act as a trigger). Timestamp differences are one of the most common causes for non-reproducibility, and since those differences by themselves aren't security-relevant, it seems overly harsh to demand that they match.
I think this isn't a good idea, because it can lead to the dangerous and false assumption that a rebuild with only a different embedded timestamp can be considered identical in behaviour. But any binary could change its behaviour when there's some specific embedded timestamp. Yes, that would be visible in the source, but it might be intentionally hidden as well.
So while, when we get to the "only timestamp is different" level of almost-reproducibility, it's easy to go the last step (easiest: replace the timestamp in the binary you just built with the other one), this step is just as crucially important as all the other ones, so it can't get any special treatment.
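To make the trigger concern concrete, here is a contrived Python sketch (the constant and the condition are invented for illustration): two builds that differ "only" in an embedded stamp need not behave identically.

```python
# Contrived sketch: an embedded build timestamp acting as a trigger.
# Two builds that differ "only" in this constant do not behave the same.
BUILD_STAMP = 1735689600  # hypothetical value injected at build time

def serve() -> str:
    # An innocuous-looking check that flips behaviour only for
    # attacker-chosen stamp values:
    if BUILD_STAMP % 1000 == 0:
        return "hidden code path taken"
    return "normal operation"

print(serve())
```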
(The Android app world has a related problem with its embedded signatures, which you can never reproduce except by copying the signature from the original binary into your rebuild as a last step. But without this step, and with a different signature, the app is expected to behave differently in many scenarios.)
> Hermetic build environments that can detect changes as soon as they're introduced during assembly
An option that should work until such environments are available would be to compare the two build trees after a build. Those will have files that aren't copied into the build artefacts, but the comparison could be restricted to files that typically end up in the build artefacts. That should help narrow down the source of non-determinism.
> practical challenges implementing fail-fast hermetic builds
Essentially this would mean instrumenting all the tools used during a build to record data and related metadata, capturing the chain of transformations connecting the source files to the build artefacts.
The program receiving the instrumentation would then terminate the build when it receives data different to a previous build.
That would have a lot of false positives though, since input that is non-deterministic might not get represented in the build artefacts.
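A rough sketch of that kind of instrumentation, assuming a hypothetical wrapper around each tool invocation rather than changes inside the tools themselves: record input and output digests per step, so traces from two builds can be compared record by record.

```python
import hashlib
import json
import subprocess

def digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def traced(cmd, inputs, outputs, log_path="build-trace.jsonl"):
    """Run one tool and append a trace record linking input digests
    to output digests."""
    subprocess.run(cmd, check=True)
    record = {
        "cmd": cmd,
        "inputs": {p: digest(p) for p in inputs},
        "outputs": {p: digest(p) for p in outputs},
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")

# Traces from two independent builds can then be diffed record by
# record to find the first step whose outputs disagree.
traced(["gcc", "-c", "main.c", "-o", "main.o"], ["main.c"], ["main.o"])
```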
The build tracing feature would be useful for other situations too.
I think that I have seen a paper about a build tracing tool, but that used ptrace rather than build instrumentation within tools and I haven't been able to track down the paper.
-- bye, pabs
While I understand the desire for reproducible builds, in cases such as Java, where timestamps are introduced into the zip-file archives (.jar, .war, and .ear files, for those not familiar with Java), blasting the timestamps so they are all set deterministically can actually cause useful information to be lost.

Case in point: soon after Oracle acquired Sun Microsystems, pretty much every one of Oracle's patch release notes for Java (including some of their corresponding CVE descriptions) was of the intentionally vague form "multiple unspecified vulnerabilities were patched", or some such BS. My management would ask me, "Would you please analyze the patches and tell us if there's anything we urgently need to patch?" (This was way before SCA tools, BTW.) So I would extract all the .class files from (typically) the rt.jar and look at the modification timestamps to see which had been updated since the last patch release we were using. Then I'd decompile those .class files, do the same for the .class files from the corresponding jar of the previous patch release, and finally diff the two versions to see what Oracle had actually fixed. (Don't miss that work at all!) However, had those timestamps all been identical because of deterministic reproducible builds, it would have made that task take a hundredfold or so longer.
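For what it's worth, the comparison step of that workflow is only a few lines with Python's zipfile module (the jar names here are hypothetical); it lists the .class entries whose stored timestamp or CRC changed between two releases:

```python
import zipfile

def class_entries(jar_path):
    """Map each .class entry to its stored (timestamp, CRC)."""
    with zipfile.ZipFile(jar_path) as zf:
        return {i.filename: (i.date_time, i.CRC)
                for i in zf.infolist() if i.filename.endswith(".class")}

old = class_entries("rt-previous.jar")  # hypothetical jar names
new = class_entries("rt-patched.jar")
for name in sorted(old.keys() & new.keys()):
    if old[name] != new[name]:
        print(f"changed: {name}  {old[name][0]} -> {new[name][0]}")
for name in sorted(new.keys() - old.keys()):
    print(f"added:   {name}")
```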
So while there are times when deterministic, reproducible builds might be useful (they never will be unless people decide to verify all of that in their CI/CD pipelines, which I think most companies will be reluctant to do because of the build time resource commitment involved), IMO, for most cases, it brings very little added value.
Just my $.02.
> ... blasting the timestamps so they are all set deterministically can actually cause useful information to be lost.
Having reproducible builds does not preclude incremental updates to Java archives. It's just that the dates of the old and new class files would be meaningful, such as their separate release dates. OpenJDK builds don't use such incremental updates anymore, but they could, and they could do so in a reproducible manner, allowing your detective work to go on as before.
Reproducible builds are about blasting away all the useless, meaningless differences: the timestamps of files created during the build, the unsorted order of files in their directories, or the random build paths used in a transient container. When the useless differences are removed, the meaningful differences can be found.
> ... IMO, for most cases, it brings very little added value.
Oh, but its value to OpenJDK is already apparent, even though its build has been reproducible only since May. For just one example, this old Javadoc bug, only tangentially related to reproducible builds, would have been impossible to find, and its fix impossible to verify, without the easy ability to create bit-for-bit identical builds.
> While I understand the desire for reproducible builds, in cases such as Java, where timestamps are introduced into the zip-file archives (.jar, .war, and .ear files, for those not familiar with Java), blasting the timestamps so they are all set deterministically can actually cause useful information to be lost.
>
> Case in point: soon after Oracle acquired Sun Microsystems, pretty much every one of Oracle's patch release notes for Java (including some of their corresponding CVE descriptions) was of the intentionally vague form "multiple unspecified vulnerabilities were patched", or some such BS. My management would ask me, "Would you please analyze the patches and tell us if there's anything we urgently need to patch?" (This was way before SCA tools, BTW.) So I would extract all the .class files from (typically) the rt.jar and look at the modification timestamps to see which had been updated since the last patch release we were using. Then I'd decompile those .class files, do the same for the .class files from the corresponding jar of the previous patch release, and finally diff the two versions to see what Oracle had actually fixed. (Don't miss that work at all!) However, had those timestamps all been identical because of deterministic reproducible builds, it would have made that task take a hundredfold or so longer.
If the timestamps are not deterministic, they could very well be entirely arbitrary: you might end up with the timestamps of whatever checkout the build of those class files happened to be performed on, whatever timestamp the developer happened to use at the time, or whatever wonky clock was used, any of which would actually prevent you from comparing the timestamps in the way you described...
Clamping the timestamps to the last source change or some other meaningful timestamp will more reliably get you the feature you described, presuming the other files actually retain meaningful timestamps (last modification in VCS, for example, rather than whatever happened to be the on-disk time), and prevents files generated during the build from needlessly differing. And if they don't preserve meaningful timestamps, then you're no worse off than you were.
No need to blindly reset them if the process otherwise maintains meaningful timestamps; embedding the current clock time will nearly always require a maximally detailed process of comparison.
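One common way to pick such a meaningful timestamp, as reproducible-builds.org documents for SOURCE_DATE_EPOCH, is the date of the last commit rather than the build clock. A minimal sketch, assuming the build runs from a git checkout:

```python
import os
import subprocess

# Derive SOURCE_DATE_EPOCH from the last commit's committer date instead
# of the wall clock, so rebuilds of identical source get identical stamps.
epoch = subprocess.check_output(
    ["git", "log", "-1", "--format=%ct"], text=True).strip()
os.environ["SOURCE_DATE_EPOCH"] = epoch
print(f"SOURCE_DATE_EPOCH={epoch}")
```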
> So while there are times when deterministic, reproducible builds might be useful (they never will be unless people decide to verify all of that in their CI/CD pipelines, which I think most companies will be reluctant to do because of the build time resource commitment involved), IMO, for most cases, it brings very little added value.
Nothing is useful unless people actually try to do it, true.
If we are talking about a best practice gold standard, well, let us not set the sights too low either. Recognizing that some things are harder and take more effort, and demonstrating that "this project follows all known best practices" vs. "this project follows many best practices" vs. "this project follows some best practices" should be reflected in the levels.
Just chiming in here to discourage any relaxation of the gold standard. The gold standard should be clear: bit-for-bit identical reproducibility. Please do not carve out subtle exceptions for variable timestamps.
For a project that distributes only source code artifacts, I still think it's worth asking during the review whether the generated artifacts used by end users can be built reproducibly. Obviously, we don't want to require source-only software projects to distribute binaries, but presumably the developers do actually have some practice in building some user-facing artifacts. Such a project should be able to concisely describe a particular toolchain and set of compilation/configuration options and dependencies that are known to provide a reproducible build that covers a substantial portion of the codebase.
Some projects have raised concerns about challenges meeting the `build_reproducible` gold criterion. The purpose of this criterion is to counter malicious builds, as happened in SolarWinds' Orion, by enabling verifiable reproducible builds. We still want to counter the attack, but we may be able to relax the requirement slightly while still countering the attack. So, under the `build_reproducible` gold criterion, modify:

- Change the second sentence to read:
- Change "result" to "built result", and replace the final period with: