Open Micket opened 1 year ago
If you do this, you are vulnerable to malformed tarball CVEs (like nix is), and EB does not have a sandboxed environment to mitigate this risk (somewhat) like nix does.
So, please don’t do this in EB.
see https://github.com/orgs/community/discussions/45830#discussioncomment-4824886
If you do this, you are vulnerable to malformed tarball CVEs (like nix is), and EB does not have a sandboxed environment to mitigate this risk (somewhat) like nix does.
So, please don’t do this in EB.
see https://github.com/orgs/community/discussions/45830#discussioncomment-4824886
That's a valid point of course, since we basically need to start playing with an unchecked tarball (unzipping/unpacking it) before we can actually obtain a checksum for it...
Does that mean we'll just need to suffer the pain if the tarballs served by GitHub start changing (again, since this also happened in Sept'17, see https://github.com/easybuilders/easybuild-easyconfigs/issues/5151 + https://github.com/spack/spack/issues/5411)? :-/
Please also know that it the way we've done it in Nix isn't amazing. It would be better to not do it this way. We would much rather know that the archives produced by GitHub are hash-stable over time.
checksums = ['nar:a_narred_sha256sum', 'a_normal_sha_256sum']
Side note: This syntax would required both checksums to match. So we'd need a tuple and likely better invert the order, so the tar checksum is checked first and only on failure the NAR is used.
I understand Todd's point about a vulnerability, but regardless of how the Github issue evolves, having a way to check whether the content has changed seem to go through NAR hashes. Implementing it as a second step, after having checked the tarball's checksum, would get us one step closer to being ready for a possible future change in Git.
I don't see the downfall (other than a bit more time to build stuff) to add NAR hashes to reduce our work in the future. If checksums from github change down the line, we will be one step closer to being able to validate that new archives are indeed the same as previous ones.
Note that I would not automatically use the NAR one if the tarball's hash fails. I would error out, provide an expert flag to do check the NAR and suggest a way to add a secondary checksum for the tarball through a PR if the NAR matches.
Also, moving checksums outside of EasyConfigs, into checksums.json
(which admittedly would need to become a bit more complete), would make an eventual future massive update of checksums a lot easier to handle (because it would require updating a lot fewer files).
checksums = ['nar:a_narred_sha256sum', 'a_normal_sha_256sum']
Side note: This syntax would required both checksums to match. So we'd need a tuple and likely better invert the order, so the tar checksum is checked first and only on failure the NAR is used.
That doesn't seem wise taking into account @tgamblin's very valid remark above, to put it mildly.
If we get the wrong SHA256 checksum, either it's a benign change (like GitHub suddenly serving a different source tarball, but with the same contents), or something malicious is going on (someone manages to serve you a zip bomb via MITM). In the latter case, you most definitely don't want to unpack the tarball to compute the NAR hash, I think it's clear now that that would be a mistake.
That doesn't mean a NAR hash is entirely useless: if we have two source tarballs that have a different SHA256 checksum, and we're pretty sure they're both safe (both served by GitHub), we could use a function that computes the NAR hash (in an isolated environment) to easily check whether the contents of the tarball are still the same.
I'm not sure that's a good enough reason to implement support for checking a NAR hash though, since the output produced by diff -ru
is usually more informative, since it not only tells you whether there is a change, but also what has changed (see https://github.com/easybuilders/easybuild-easyconfigs/issues/17233 for example).
I'm not sure that's a good enough reason to implement support for checking a NAR hash though, since the output produced by
diff -ru
is usually more informative, since it not only tells you whether there is a change, but also what has changed (see easybuilders/easybuild-easyconfigs#17233 for example).
diff -ru
implies that you still have the "original" copy, which may not be the case. Computing the NAR hash and storing it would allow you to validate with confidence without needing to have the old archive.
Computing the NAR hash and storing it would allow you to validate with confidence without needing to have the old archive.
I think this is a nice benefit because you don't have to maintain a mirror (or fetch from it). I added number 5 here after @mboisson mentioned it. You still have to keep around two hashes though, and you have to train your contributors to NAR stuff (or automate that in CI)
The way I could see NAR hash used is:
--inject-checksums
& co)diff -ru
to figure out what has changedOne potential safe (?) use of NAR hashes could be stuff that we download via git_config
, since then a SHA256 checksum of the tarball is not possible (cfr. #2726)
Summarizing discussion from slack:
Background: Changing datestamps in auto-generated tarballs. Change in compression level (like github did recently, but reverted). Change in tar.gz vs tar.xz. Sources from
git_config
lacking stable checksums. Our checksum implementation is lacking.NAR Hash would solve this by serializing the file tree and its contents ( https://gist.github.com/jbeda/5c79d2b1434f0018d693 )
Ref. https://nixos.wiki/wiki/Nix_Hash
I assume this could even be made opt-in? If sha256sum is used, then apply to binary (still relevant in many cases, where we e.g. have iso-files for MATLAB etc.). i.e.
Ref. https://github.com/easybuilders/easybuild-easyconfigs/issues/5151