fosslinux / live-bootstrap

Use of a Linux initramfs to fully automate the bootstrapping process

RFC: Using VCS snapshots #177

Open fosslinux opened 2 years ago

fosslinux commented 2 years ago

Despite our best efforts, pregenerated files sometimes slip through the cracks, as happened with coreutils 5.0 today. There have been a couple of other cases too.

Perhaps it would be a better idea to source all of our distfiles from VCS (normally git) snapshots?

Git snapshots are generally less likely to contain pregenerated files. Some will obviously still have them, but there is less room for a manual audit to miss pregenerated files.

Thoughts?

schierlm commented 2 years ago

I'd agree. You can also use a script similar to https://github.com/schierlm/FullSourceBootstrapFromGit/blob/main/check-swh.sh to check that these git/svn repos are indeed archived by softwareheritage.org. So even if the git repo goes away, there will be a secondary source. (NB: the script contains a few bashisms such as associative arrays, and it would probably be easier to rewrite it in Python than to remove them.)

Another option would be to generally consult a diff against VCS when manually auditing new packages.
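A minimal sketch of such a diff, assuming the release tarball has already been unpacked and the matching tag checked out side by side. The function names and the directory layout are illustrative, not part of live-bootstrap:

```python
import os

def list_files(root):
    """Return the set of file paths under root, relative to root, skipping .git."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found

def only_in_tarball(tarball_dir, vcs_dir):
    """Files present in the unpacked tarball but absent from the VCS checkout."""
    return sorted(list_files(tarball_dir) - list_files(vcs_dir))
```

Anything this returns is only a candidate for review: it may be pregenerated (configure, Makefile.in, parser output), or it may be a legitimate file that upstream adds at release time.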

Yet another option would be to generate the Makefiles and check whether any Makefile targets name files that already exist, or to check what is deleted by targets like make maintainer-clean (e.g. by invoking them and diffing the result).
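A rough sketch of the first idea, assuming a generated Makefile that uses simple explicit rules. The regex is a deliberate simplification (it does not expand variables, pattern rules, or includes), and the function names are made up for illustration:

```python
import os
import re

# Matches simple explicit rule lines such as "parser.c: parser.y".
# Assumption: variables, pattern rules, and includes are NOT expanded.
RULE = re.compile(r"^([A-Za-z0-9_./+-]+(?:[ \t]+[A-Za-z0-9_./+-]+)*)[ \t]*:(?!=)")

def explicit_targets(makefile_text):
    """Collect target names from explicit rules, ignoring recipe lines."""
    targets = set()
    for line in makefile_text.splitlines():
        if line.startswith("\t"):
            continue  # recipe line, not a rule
        match = RULE.match(line)
        if match:
            targets.update(match.group(1).split())
    return targets

def prebuilt_targets(source_dir, makefile_text):
    """Targets the Makefile knows how to build that already exist as files."""
    return sorted(
        target for target in explicit_targets(makefile_text)
        if os.path.isfile(os.path.join(source_dir, target))
    )
```

A real Makefile will defeat this regex in many ways, so asking make itself to dump its rule database would likely be more robust. The sketch only shows the shape of the check.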

stikonas commented 2 years ago

I don't have a strong opinion. If we decide to go with git snapshots, I guess it's fine. That said, the main pregen file offenders are usually GNU tarballs; the rest are usually fairly good (mostly they just need autoreconf -fi).

With git snapshots there is a risk of non-content changes, e.g. after a remote server upgrade, git snapshots might be generated with a newer gzip/tar and potentially have a different checksum (not sure if that happens in practice).
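To illustrate the mechanism only (this says nothing about how often it happens on real servers), here is a small sketch where the same payload is compressed twice with different gzip settings. The different timestamp and compression level stand in for a hypothetical server-side gzip upgrade:

```python
import gzip
import hashlib

def sha256(data):
    return hashlib.sha256(data).hexdigest()

# Stand-in for an unchanged tar stream.
payload = b"identical tar stream\n" * 100

# Same payload, different gzip header timestamp and compression level.
old_archive = gzip.compress(payload, compresslevel=6, mtime=0)
new_archive = gzip.compress(payload, compresslevel=9, mtime=1)

# The checksum of the compressed archive changes...
archive_hashes_differ = sha256(old_archive) != sha256(new_archive)

# ...while the checksum of the decompressed content does not.
content_hashes_match = (
    sha256(gzip.decompress(old_archive)) == sha256(gzip.decompress(new_archive))
)
```

Hashing the decompressed stream would sidestep gzip differences, but not changes in how tar itself lays out the archive.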

But at the very least, it's probably a good idea to run every build step from a git snapshot at least once (manually, if we don't end up switching). That way we should catch most of the remaining pregen issues.

nanonyme commented 1 year ago

As long as git snapshots are obtained through git, sounds fine. In general, relying on systems like GitHub to generate downloadable archives is really fragile: as we have noticed, they want to preserve the right to change their git archive generation algorithms, which can change checksums without much warning.

nanonyme commented 9 months ago

There's also the concern of whether it will still be possible to handle the "this needs to be downloaded without HTTPS" cases if there's a switch to VCS snapshots.

fosslinux commented 3 months ago

https://www.openwall.com/lists/oss-security/2024/03/29/4

xz was compromised by a file added in the release tarball. This was not an "autogenerated" file. VCS tends to be better audited... perhaps a point towards VCS.

nanonyme commented 3 months ago

That said, the XZ tarball was compromised through a maintainer adding code into VCS that made the tarball creation code generate compromised tarballs. Perhaps that's more of a point towards adding a second layer of hashing, where you hash every single file inside the tarball so you know exactly which files change, and towards ensuring all used source code is reviewed downstream.
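A minimal sketch of that second layer, assuming the tarball is available as bytes. The manifest format (a plain dict of name to SHA-256) and the function names are illustrative only:

```python
import hashlib
import io
import tarfile

def file_manifest(tar_bytes):
    """Map each regular file in a tarball to the SHA-256 of its contents."""
    manifest = {}
    # "r:*" lets tarfile detect the compression (gzip, bzip2, xz, or none).
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                manifest[member.name] = hashlib.sha256(data).hexdigest()
    return manifest

def changed_files(old, new):
    """Names that were added, removed, or whose hash differs between manifests."""
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

Comparing the manifest of a new release against the last reviewed one narrows the audit to the files that actually changed. It does not by itself say whether a change is benign.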

nanonyme commented 3 months ago

The fundamental lesson of the XZ story is not to trust upstreams where security matters.