QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Save and publish info about build environment needed to reproduce package builds #4611

Open HW42 opened 5 years ago

HW42 commented 5 years ago

With the recent changes (1, 2, 3), rpm and deb packages are reproducible when built with the same dependencies.

For users to be able to check published packages, we need to save and publish the information about the build environment that is needed to reproduce a package (currently only the packages installed to satisfy the build dependencies, I think).

For debs, dpkg generates .buildinfo files which contain that information. There's https://buildinfo.debian.net/ where we can upload them. But since this is currently in a PoC state, we should also save a copy somewhere else. I think another directory on deb.qubes-os.org should be good enough.
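To illustrate what such a file gives a rebuilder, here is a minimal sketch that pulls the pinned build dependencies out of a .buildinfo file. It assumes the usual deb822 layout in which the `Installed-Build-Depends` field lists one `package (= version)` entry per indented continuation line; the function name is hypothetical and not part of any Qubes or Debian tool.

```python
import re


def parse_installed_build_depends(buildinfo_text):
    """Return {package: version} from the Installed-Build-Depends field.

    Assumes one "name (= version)," entry per indented continuation line.
    """
    deps = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Installed-Build-Depends:"):
            in_field = True
            # Keep whatever follows the colon (usually nothing).
            line = line.split(":", 1)[1]
        elif in_field and not line.startswith((" ", "\t")):
            # An unindented line starts the next field: stop here.
            break
        elif not in_field:
            continue
        match = re.match(r"\s*([^\s(]+)\s*\(=\s*([^)]+)\),?\s*$", line)
        if match:
            deps[match.group(1)] = match.group(2).strip()
    return deps
```

The resulting mapping is exactly the "packages installed to satisfy the build dependencies" that a rebuilder would have to install again at the same versions.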

For rpms there's no upstream file format yet. mock generates an installed_pkgs.log which can be used as a base.
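As a rough sketch of how that log could serve as a base, the snippet below turns it into a structured package list. It assumes that each line of installed_pkgs.log starts with a package NEVRA (`name-[epoch:]version-release.arch`) followed by further whitespace-separated fields; that layout is an assumption and should be checked against the mock version in use. Both function names are hypothetical.

```python
def split_nevra(nevra):
    """Split 'name-[epoch:]version-release.arch' into its parts."""
    rest, arch = nevra.rsplit(".", 1)
    rest, release = rest.rsplit("-", 1)
    name, version = rest.rsplit("-", 1)
    epoch = "0"  # rpm treats a missing epoch as 0
    if ":" in version:
        epoch, version = version.split(":", 1)
    return {
        "name": name,
        "epoch": epoch,
        "version": version,
        "release": release,
        "arch": arch,
    }


def parse_installed_pkgs(log_text):
    """Parse the first column of each non-empty log line as a NEVRA."""
    packages = []
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        packages.append(split_nevra(fields[0]))
    return packages
```

Only the first column is used here; any remaining columns (build time, size, digest and so on, if present) would be worth recording too, since they help identify the exact binary that was installed.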

HW42 commented 5 years ago

Another option is to make the build not depend on the state of the package repositories during builds. For this we would need to commit the list of installed packages used to build a package.

This would be a potentially controversial change since it breaks a bit with how "packages" work in classic distros. Also, we would have those package lists lying around in the sources, and they would need to be updated (which would of course be scripted).

OTOH this approach would also have some advantages:

@marmarek: What do you think?

marmarek commented 5 years ago
  • Build and rebuild would be the same operation.

  • The build infrastructure will generate the same packages as has been tested locally (if the package has no reproducibility bug).

Those two are really cool.

But there is also a disadvantage: since the list will be different for each distribution, you need all of them to commit a new version. This makes development more cumbersome (even if it's scripted, it will take time and disk space), but also enlarges the attack surface of the development environment: having any of the supported distributions compromised would now allow influencing the source code (even if that's limited to the dependencies list). Additionally, it makes it harder to build for not officially supported distributions (like @unman's builds for Ubuntu, or @fepitre's builds for not yet supported Fedora versions). This basically means that the old approach (collect the newest packages from the repo) would still need to be around, and that those not-officially-supported distros would not benefit from reproducible builds.

Anyway, we can still use something like this for templates and ISO. Those two are more focused on specific distribution. For Debian templates, IMO it should be enough to specify snapshots to use (we need snapshots in qubes repositories too). Then log packages used to build anyway (as we currently do).

HW42 commented 5 years ago

But there is also a disadvantage: since the list will be different for each distribution, you need all of them to commit a new version. This makes development more cumbersome (even if it's scripted, it will take time and disk space), but also enlarges the attack surface of the development environment: having any of the supported distributions compromised would now allow influencing the source code (even if that's limited to the dependencies list).

Right.

If we just refer to a snapshot of the normal repository instead of having a list of all used packages, this would be much less of a problem. An update would be much simpler and would not need the distro to be installed. Something like this:

-20181203T153211Z
+20181215T041424Z
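A small sketch of how such a pinned timestamp could be turned into an APT source, assuming the timestamps refer to snapshot.debian.org, which serves archives under `/archive/debian/<timestamp>/`. The function name and the default suite and components are hypothetical.

```python
from datetime import datetime

SNAPSHOT_BASE = "https://snapshot.debian.org/archive/debian"


def snapshot_sources_line(timestamp, suite="stretch", components=("main",)):
    """Build a sources.list line pinned to one snapshot timestamp."""
    # Validate the YYYYMMDDTHHMMSSZ form; raises ValueError if malformed.
    datetime.strptime(timestamp, "%Y%m%dT%H%M%SZ")
    return f"deb {SNAPSHOT_BASE}/{timestamp}/ {suite} {' '.join(components)}"
```

With this, bumping the build environment is a one-line change of the timestamp, which is the point of the diff above.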

Snapshots also have a few disadvantages compared to a full list:

The latter two points are probably not so important.

Additionally, it makes it harder to build for not officially supported distributions (like @unman's builds for Ubuntu, or @fepitre's builds for not yet supported Fedora versions). This basically means that the old approach (collect the newest packages from the repo) would still need to be around.

My plan would be to build a local repo with the downloaded packages. So for non-official or test builds it would be easy to replace the link to the local repo with the normal online repo.

And that those not-officially-supported distros would not benefit from reproducible builds.

Since those are AFAIK not released, reproducible builds are probably less interesting there.

marmarek commented 5 years ago

Since those are AFAIK not released, reproducible builds are probably less interesting there.

Well, if I could verify the integrity of packages built by external contributors (for example https://qubes.3isec.org/), that would be very useful. From my perspective, this has even more value than for official packages, since I know and somehow trust the official environment, but have no idea about the security of others. Generally, reproducible builds would allow more distributed trust in build environments, especially when local verification at install time is implemented (#2535).

My plan would be to build a local repo with the downloaded packages. So for non-official or test builds it would be easy to replace the link to the local repo with the normal online repo.

Makes sense. But this means "Build and rebuild would be the same operation" is less valuable. You simply shift the difference into version commit time, which gets more similar to "non-official or test build", as you need to collect build dependencies there for various targets (at least for non-Debian case). So, basically you generate "buildinfo" file at version commit time. This means we still need exactly the same elements (producing that file, distribute it, a tool to reproduce build environment etc), regardless if the "official builder" would or would not be a "rebuilder".

* For Fedora we don't get snapshots easily.

Yes. I think snapshots are supported for Debian only. Others include Fedora+CentOS, Arch, Ubuntu (not sure about snapshots of the upstream repo?), and probably Gentoo soon too (here maybe the commit id of the portage repo will work?). Anyway, that's still different things for different distributions.

HW42 commented 5 years ago

Since those are AFAIK not released, reproducible builds are probably less interesting there.

Well, if I could verify the integrity of packages built by external contributors (for example https://qubes.3isec.org/), that would be very useful.

As I wrote, I wasn't aware that there are published binaries. This seems* to be indeed a rather strong argument against putting the resolved dependencies into the source code repo, since incorporating external contributors would be complex.

*: Maybe there's a simple solution I'm not currently thinking of.

From my perspective, this has even more value than for official packages, since I know and somehow trust the official environment, but have no idea about the security of others. Generally, reproducible builds would allow more distributed trust in build environments, especially when local verification at install time is implemented (#2535).

FWIW: I consider reducing the trust required in the official environment to be more important than being able to check builds from third parties. Anyway with reproducible builds we should get both.

Note that verifying third-party contributions isn't that easy. You just get the source+buildinfo -> package part. You still need to check the source and the buildinfo (mostly verify that the installed package list is sane).
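The source+buildinfo -> package part boils down to rebuilding and comparing digests. A minimal sketch, assuming the published SHA-256 of the package is already known (for example taken from a signed buildinfo file); both function names are hypothetical:

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(rebuilt_path, published_sha256):
    """True if the locally rebuilt package matches the published digest."""
    return sha256_of(rebuilt_path) == published_sha256.strip().lower()
```

A match only shows that the binary follows from the given source and buildinfo. It says nothing about whether the source or the listed build dependencies themselves are trustworthy, which is the remaining manual check.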

My plan would be to build a local repo with the downloaded packages. So for non-official or test builds it would be easy to replace the link to the local repo with the normal online repo.

Makes sense. But this means "Build and rebuild would be the same operation" is less valuable. You simply shift the difference into version commit time, which gets more similar to "non-official or test build", as you need to collect build dependencies there for various targets (at least for non-Debian case). So, basically you generate "buildinfo" file at version commit time. This means we still need exactly the same elements (

producing that file,

Well, yes, we need that info, otherwise this ticket would be moot. So at some point we will need to generate it.

distribute it,

Not really. Yes it's distributed, but with the normal source, so no need to think of where to upload/mirror, how to sign, etc.

a tool to reproduce build environment

Yes indeed this is quite similar to what a "rebuilder" program needs to do. But I think the build-local-repo + upstream-pkgbuild would be easier to customize than integration of upstream-rebuilder-tools for every distro. But of course that's just guesswork, I implemented neither.

etc), regardless if the "official builder" would or would not be a "rebuilder".

* For Fedora we don't get snapshots easily.

Yes. I think snapshots are supported for Debian only. Others include Fedora+CentOS, Arch, Ubuntu (not sure about snapshots of the upstream repo?), and probably Gentoo soon too (here maybe the commit id of the portage repo will work?). Anyway, that's still different things for different distributions.

FWIW, Arch Linux has a snapshot service quite similar to Debian's.


To summarize: Based on the "third-party packages" argument, I agree that it's probably better to keep the dependency resolution as-is for now.

marmarek commented 5 years ago

I've enabled the reprepro option to include buildinfo files in the repository.

marmarek commented 5 years ago

I've plugged https://github.com/woju/rpmbuildinfo into builder-rpm, and signed buildinfo files are now uploaded to the yum repository (but not included in the yum/dnf index). This is a rather experimental solution. The next step would be to try sketching/writing a rebuild tool and see what else is needed. If the rpm buildinfo format works, then we should upstream it to be part of rpmbuild. As for distribution of buildinfo files, some solution is needed, and including them in the yum/dnf repository is probably not the best one.