tildelowengrimm opened this issue 10 years ago (status: Open)
Note that SecureDrop is not compiled; it is a Python web application (at the moment, this may change somewhat for 1.0, but it probably still won't be natively compiled). What constitutes a test of determinism in that case? It seems like we would need to compare directory layouts, possibly of the whole system, and compare file hashes (similar to file integrity checkers like Tripwire) to have a solid guarantee that no component of the system has been compromised.
You could take a hash of the output of `find . -type f -print0 | LC_ALL=C sort -z | xargs -0 cat`. The output should be byte-for-byte identical if the directory structure and contents of files are identical.
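As a minimal sketch, piping that output through a hash tool yields a single digest that two parties can compare (`sha256sum` is one arbitrary choice of tool; the demo tree below is made up):

```shell
# Hash every file's contents in a stable, locale-independent order.
# A single differing byte anywhere in the tree changes the digest.
hash_tree() {
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 cat | sha256sum | cut -d' ' -f1)
}

# Demo: two identical trees hash identically.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
printf 'secure\n' > "$tmp/a/app.py"
printf 'secure\n' > "$tmp/b/app.py"
hash_tree "$tmp/a"
hash_tree "$tmp/b"
```

Note that this hashes file contents in a stable order but not the filenames themselves, so a rename that preserves sort order would go undetected; hashing the `find` output list as well would close that gap.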
Note that if you have non-ASCII filenames, this will fail when you do the comparison across filesystems that have different Unicode normalization forms (http://en.wikipedia.org/wiki/Unicode_equivalence#Normalization). I ran into this when trying to make reproducible builds of HTTPS Everywhere between OSX and GNU/Linux systems; solved it by converting to NFD before sorting files.
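The mismatch is easy to demonstrate in Python (a minimal illustration; the filename is arbitrary):

```python
import unicodedata

name = "caf\u00e9.txt"  # an arbitrary non-ASCII filename
nfc = unicodedata.normalize("NFC", name)  # "é" as a single codepoint
nfd = unicodedata.normalize("NFD", name)  # "e" plus a combining accent

print(nfc == nfd)         # False: same visible name, different codepoints
print(len(nfc), len(nfd)) # 8 9
print(unicodedata.normalize("NFD", nfc) == nfd)  # True once both are normalized
```

This is why converting all filenames to one form (NFD here) before sorting and hashing makes the comparison filesystem-independent.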
Well there's the .pyc byte code. I guess the hash thing could be akin to telling Rootkit Hunter to check the SecureDrop path for modified files.
We don't ship any .pyc byte code. If people want to check the integrity of their securedrop install at runtime, it'd be slightly tricky because we would have to ignore the randomly generated keys anyway. But I think Tom's issue here is about defending against malicious tampering with the SecureDrop code before it reaches the end user, not about defending the end user against compromise of their machines.
If literally everything is interpreted rather than compiled, could this proposal be completely implemented through the use of gpg-signed git tags for releases?
I think this is useful in addition to checking signed git tags.
Could this proposal be completely implemented through the use of gpg-signed git tags for releases?
Possibly. We already do this (last release's signed tag). There are some compiled components of SecureDrop - for example, some of the Python dependencies compiled their own shared libraries (I know scrypt does this). So we might need to take a mixed approach.
It also seems like a large part of this is dependent on the rest of the system being verifiable. What does it matter if the web application is good, if the web server binary is compromised?
Say that in the future, we want people to be able to use SecureDrop by downloading a release as a .zip file instead of having to use git clone.
We already do this. There's a signed tar.gz on the Freedom of the Press Foundation's SecureDrop site.
It seems like the goal is to be able to compare the software that goes into production with a publicly auditable copy (like this GitHub repo). So maybe what we need to do is develop a SecureDrop .deb package, and have an automated process for comparing downloads of that package to what can be built from the latest release on GitHub.
Beyond that, we'll have to rely on other projects (Debian, Tor, etc.) to get the rest of our dependencies in a verifiable state. We already host some binaries ourselves (e.g. OSSEC) to reduce the risk of tampering by third parties. I'm not sure if it would be better or worse to host copies of all of our dependencies.
I think it's definitely a bad idea to host all the dependencies. It's completely reasonable to rely on the integrity of dependencies (and to choose dependencies based on their perceived integrity). SecureDrop should focus on making SecureDrop verifiable. Otherwise, where does it end? Should SecureDrop be in the business of verifying the microcode updates on my BIOS and CPU?
That said, if SecureDrop has a particular target platform, it might be reasonable for the installation procedure first to check the integrity of the platform against various known sorts of problems. Obviously, if the platform is well-and-truly pre-owned that won't work, but it might protect against misconfiguration, and improve SecureDrop's ability to detect future attacks or compromise.
SecureDrop targets 64-bit Ubuntu 12.04 so the same packages should always work. I think we should trust the integrity of official repositories. I wonder if it would be possible to develop a .deb that accomplishes everything that production_installation.sh does... sounds like a fun task.
@garrettr wrote:

> We already do this. There's a signed tar.gz on the Freedom of the Press Foundation's SecureDrop site.
Let's make that tar.gz file deterministic as well as signed (in case the machine that is doing the packaging is compromised with malware that modifies local files before signing).
I played around with timestamp modification for a while and couldn't get deterministic archives with GNU tar; can we just use zip instead? https://github.com/devrandom/gitian-builder/blob/master/bin/canon-zip
(The size difference between `tar czf` and `zip -9` is not too bad: 23.4M vs 23.5M.)
> can we just use zip instead?

Sure!
I had no idea that Tar wasn't deterministic!
gzip's default behavior is to include the timestamp of the uncompressed file, which for tarballs is the timestamp of the intermediate tar archive. That means that your tarball is including the timestamp of the build run, which should explain why your bytes are different on each build even if the source is unchanged. Try this instead if you still want to use tar+gz:
```
tar cf foobar.tar foobar/
gzip -n foobar.tar
```
also see http://superuser.com/questions/705877/compressing-compressed-tar-gz-files-deterministically
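For what it's worth, newer GNU tar versions (1.28 and later; this did not exist when the original comparison was attempted) can also pin the other sources of archive nondeterminism directly. A sketch, assuming GNU tar and a made-up demo tree:

```shell
# Work in a scratch directory with a tiny demo tree.
cd "$(mktemp -d)"
mkdir -p foobar && printf 'data\n' > foobar/f

# GNU tar (>= 1.28) can pin everything that normally varies between
# builds: member order, timestamps, and ownership.
tar --sort=name \
    --mtime='2014-01-01 00:00:00 UTC' \
    --owner=0 --group=0 --numeric-owner \
    -cf foobar.tar foobar/
gzip -n -9 foobar.tar    # -n drops the timestamp gzip embeds by default
sha256sum foobar.tar.gz  # same digest on every rebuild of the same tree
```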
So this post was from a while ago and obviously things have changed since this was last being discussed. Whatever the past, it makes sense now to work on making our `.deb` packages reproducible. Everything else is already deterministic in the sense that our repository contains no executable binaries. Here is Debian's guide to doing so: https://wiki.debian.org/ReproducibleBuilds/Howto. Since we have automated the process of building the packages we ship by using a 'build' VM provisioned by Ansible, it should be easy to standardize package versions for our toolchain and make the other changes necessary so that anyone can build SecureDrop packages reproducibly from the comfort of their beds.
Also, note at some point down the road we would like to migrate to Debian. This may not be for years due to the massive migration and support costs, but perhaps sometime before April 2019, when the current Ubuntu LTS hits EOL. The amd64 testing repository of Debian (i.e., Debian 9/Stretch) is already at 89.5% reproducibility (https://tests.reproducible-builds.org/reproducible.html). Thus, a down-to-the-kernel reproducible SD app and mon server is possible in the coming years. I could also say we should go further and recommend the ASUS KGPE-D16 for which we could reproducibly build a binary blobless (i.e., no Intel ME, FSP, VBIOS, or CPU microcode updates) version of coreboot called libreboot for a down-to-the-boot-firmware reproducible system (but I won't go there, I'll just link it for fun and interest https://libreboot.org/docs/hcl/kgpe-d16.html).
Okay, so I really want to figure out how to do this and am going to make it happen by the 0.4 milestone. I've been making some small steps towards getting this working, and wheel has recently been patched to support reproducible `.whl` builds: https://bitbucket.org/pypa/wheel/pull-requests/52/apply-the-debian-patch-for-reproducible/diff. From some tests and analysis with diffoscope (which I don't really know what I'm doing with yet), there still seem to be a number of things we'll need to change to get reproducible builds working.
Re-evaluated whether this is a good use of time to work on in the coming months and decided it is not. SD might never reach a 0.4 release. We are in the early planning phase of SD 1.0, which will be a huge rearchitecture of the system, and even the application code itself may be rewritten in another language.
I still think that reproducibility is important for both stability and security, so I will not close this issue or un-assign myself. Rather, I think this issue should just be left open and re-addressed many months from now when 1.0 is out.
Looping back on this, as I just finished developing a deterministic build environment for another project (take a look if you're curious) and have become familiar with Gitian. I think the same tools can be adapted for SecureDrop. For those that aren't familiar with how it works, you have a list of inputs (dependencies, OS packages, SecureDrop source code) which are all hashed. You then do a build and you get some outputs, e.g. the SecureDrop .deb packages, which are also hashed. You sign the resulting "manifest" and push it to a public GitHub repository which contains all of the builder's signatures for each release. The builders compare their manifests, and if there is a difference, you know there is some indeterminism or a modification somewhere.
What you need: `faketime`. Examples: Bitcoin, Tor Browser.

The question is, what are your outputs that you want to build deterministically? I assume it is the SecureDrop application code .debs. The best way to find out if there are any determinism issues in your package or the Python dependencies is to just dive in. This is not that hard: what you will require is a new Vagrant VM for Gitian building, plus an Ansible role for provisioning it. We could do this in a new branch here, or in a separate source code repository.
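As a heavily simplified sketch of the manifest idea described above (every path and filename here is illustrative, not the project's actual layout):

```shell
# Fake build tree standing in for a real Gitian build.
cd "$(mktemp -d)"
mkdir inputs outputs
printf 'source tarball\n' > inputs/securedrop-0.3.tar.gz
printf 'built package\n'  > outputs/securedrop-app-code-0.3-amd64.deb

# Hash every input and output into a single manifest file:
{ sha256sum inputs/* ; sha256sum outputs/* ; } > manifest.txt
cat manifest.txt

# Each builder then signs and publishes the manifest; any difference
# between two builders' manifests pinpoints nondeterminism or tampering:
#   gpg --armor --detach-sign manifest.txt
#   diff manifest.txt other-builder/manifest.txt
```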
@conorsch @garrettr maybe a regression on this topic is affecting the latest release?
I see that the current release, 0.3.10, includes .pyc files.

I was just fixing this in GlobaLeaks, in relation to the Debian guidelines to perform compileall in the postinst script, and identified this change between 0.3.9 and 0.3.10.
@evilaliv3 The .pyc files shouldn't have made it into the deb package at all. Prior to making headway on deterministic builds, we need to spend some time improving the linting around the current package building logic to conform to best practices. Then we'll be in a much better position to seriously attempt deterministic building.
https://github.com/freedomofpress/securedrop/issues/1472 should also simplify deterministic builds.
Just following up here: the `*.pyc` files in 0.3.10 were definitely a regression. Looking back at older deb packages:

```
Found .pyc files in securedrop-app-code-0.3-amd64.deb
Found .pyc files in securedrop-app-code-0.3.1-amd64.deb
Found .pyc files in securedrop-app-code-0.3.2-amd64.deb
Found .pyc files in securedrop-app-code-0.3.3-amd64.deb
Found .pyc files in securedrop-app-code-0.3.10-amd64.deb
```
From the Debian ReproducibleBuilds FAQ: "Will Ubuntu use reproducible builds as Debian is planning to do?" Response circa 2013:

> With very few exceptions, nearly all of Debian's work on this will just be going into the packages that form part of the package build toolchain, and as such Ubuntu will inherit it over the natural course of merging and syncing packages from Debian. The possible exceptions are things like the proposed libfaketime etc. preloads that we might insert into builds; I'd certainly be keen to keep up to date with things Debian does in this area, not just to protect against intrusion but also because there are immediate practical benefits to doing so (safer multiarch handling).
This came up in a chat on Gitter, but it might be worth using Docker containers for reproducible builds. A single Dockerfile looks simple (basically just shell) and gives us the advantage of caching build steps; the lack of such caching is hugely annoying for testing anything related to deployment. The current turnaround for testing a change to `postinst` to know if it did the right thing is... 5 minutes? Maybe longer. Unless things have changed, Signal does this. Docker is already a part of our dev cycle, and we're trying to phase out Vagrant.
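As a sketch of what that could look like (everything here is illustrative: the base image, the package list, and the `build-deb.sh` entry point are assumptions, not the project's actual build script):

```dockerfile
# Illustrative build container; pin the base image to the target platform.
FROM ubuntu:14.04

# Toolchain layers are cached, so they only rerun when these lines change.
RUN apt-get update && apt-get install -y devscripts dpkg-dev

# Copying the source last means a change to postinst only invalidates
# the layers from here down, keeping the edit/rebuild loop short.
COPY . /srv/securedrop
WORKDIR /srv/securedrop
RUN ./build-deb.sh   # hypothetical entry point that produces the .deb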
Thanks to @redshiftzero's work on https://reproduciblewheels.com/, all wheels used by the securedrop-app-code package can now be reproducibly built using the following change to the build configuration: setting `SOURCE_DATE_EPOCH` (see https://github.com/redshiftzero/reproduciblewheels/blob/main/check.py#L38). This now unblocks the ability to provide reproducible builds of the securedrop-app-code Debian package from source files.
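For context on why `SOURCE_DATE_EPOCH` is the fix: wheels are zip archives, and every zip member records a timestamp, so an unclamped build bakes the build time into the bytes. A minimal illustration under made-up names and contents (clamping `date_time` here stands in for the effect of honoring `SOURCE_DATE_EPOCH`):

```python
import hashlib
import io
import zipfile

def wheelish_digest(date_time):
    """Build a tiny zip in memory with a pinned member timestamp and
    return its SHA-256 (the archive name and contents are made up)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("pkg/module.py", date_time=date_time)
        info.compress_type = zipfile.ZIP_DEFLATED
        zf.writestr(info, "print('hello')\n")
    return hashlib.sha256(buf.getvalue()).hexdigest()

clamp = (2020, 2, 2, 0, 0, 0)
print(wheelish_digest(clamp) == wheelish_digest(clamp))                  # True
print(wheelish_digest(clamp) == wheelish_digest((2021, 1, 1, 0, 0, 0)))  # False
```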
Thanks to @conorsch for pointing out that passing a constant `--build` dir removes the final source of non-determinism for our wheel builds!
It should be possible for independent folks to compile SecureDrop from source and achieve exactly the same binary. Once this is possible, SecureDrop's normal release process should rely on multiple independent builders.
For added safety, the normal release process should also expect that others are secretly building SecureDrop the same way, and provide a mechanism for issuing alerts if the official builds are incorrect. This mechanism should be tested regularly (but not often). In a test, all the official builders collude to make an innocuous change (perhaps the addition of whitespace), and publish their build. Then you see how long it takes for someone to sound the alarm.