devops-works / binenv

One binary to rule them all. Manage all those pesky binaries (kubectl, helm, terraform, ...) easily.
MIT License

Add checksum validation of downloaded archives #205

Open ppetr opened 2 years ago

ppetr commented 2 years ago

While downloading from GitHub via HTTPS gives a reasonable level of security, I'd still prefer to have the binaries verified against their respective checksum files.

I propose adding a field with the URL of a checksum file, together with a checksum of that checksum file itself. Example:

  foo_binary:
    fetch:
      url: https://github.com/...
    checksums:
      type: sha256  # Hash type used in checksums.txt as well as in `hash` below.
      url: https://github.com/foo/bar/releases/.../checksums.txt
      hash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

After downloading an archive for foo_binary, binenv would also download the checksums.txt file, verify its integrity, and then verify the integrity of the archive against the appropriate checksum in checksums.txt.
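
The two-step check described above can be sketched as follows. This is a minimal Python illustration, not binenv code (binenv is written in Go); the names `sha256_hex` and `verify_archive` are hypothetical, and the sketch assumes `checksums.txt` uses the `sha256sum` layout of one `<hex digest>  <file name>` pair per line.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_archive(archive_name: str, archive: bytes,
                   checksums_txt: bytes, pinned_hash: str) -> None:
    """Raise ValueError unless both links of the chain check out."""
    # Step 1: the checksum file itself must match the hash pinned in the config.
    if sha256_hex(checksums_txt) != pinned_hash.lower():
        raise ValueError("checksums.txt does not match the pinned hash")

    # Step 2: find the archive's entry in the checksum file.
    expected = None
    for line in checksums_txt.decode("utf-8").splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == archive_name:
            expected = parts[0].lower()
            break
    if expected is None:
        raise ValueError(f"no checksum listed for {archive_name}")

    # Step 3: the downloaded archive must match that entry.
    if sha256_hex(archive) != expected:
        raise ValueError(f"checksum mismatch for {archive_name}")
```

The `hash` value in the example above happens to be the SHA-256 digest of empty input, so it is only a placeholder; a real entry would carry the digest of the published `checksums.txt`.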

An alternative would be to include all the hashes in distributions.yaml itself, but I think it'd be way too verbose.

I'm happy to contribute a PR once an agreement is reached on the details.

leucos commented 2 years ago

Hello @ppetr

That would be great.

However, I've left this aside for now because I fear there will be a lot of nitty-gritty details (checksums for compressed binaries, checksums inside tarballs, non-standard checksum file formats, ...).
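
To give a feel for the file-format part of that concern, here is a small Python sketch (hypothetical helper, not binenv code) that accepts two common layouts: the GNU coreutils style (`<hex>  name` or `<hex> *name`) and the BSD style (`SHA256 (name) = <hex>`). Any other layout a project uses would need its own rule, which is exactly the kind of detail that adds up.

```python
import re

# GNU coreutils style: "<hex>  name" (text mode) or "<hex> *name" (binary mode).
GNU_LINE = re.compile(r"^([0-9a-fA-F]+) [ *](.+)$")
# BSD style: "SHA256 (name) = <hex>".
BSD_LINE = re.compile(r"^([A-Za-z0-9-]+) \((.+)\) = ([0-9a-fA-F]+)$")


def parse_checksums(text: str) -> dict[str, str]:
    """Map file name -> lowercase hex digest; unrecognized lines are skipped."""
    digests = {}
    for line in text.splitlines():
        line = line.strip()
        gnu = GNU_LINE.match(line)
        bsd = BSD_LINE.match(line)
        if gnu:
            digests[gnu.group(2)] = gnu.group(1).lower()
        elif bsd:
            digests[bsd.group(2)] = bsd.group(3).lower()
    return digests
```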

If you want to tackle this, please go ahead. The above proposal seems fine to me. Keep us informed!

ppetr commented 2 years ago

Great! Let's keep it simple and just verify the checksums of tarballs where they're already provided. Later we can think about expanding it, if needed.

I'll then start working on a prototype and I'll keep you updated 🙂.

ppetr commented 2 years ago

After some experimenting I came up with the following ideas, which can be implemented relatively independently.

Provide a checksum of a checksum file for each released version. This is the simplest solution and probably easiest to work with for authors/maintainers, but it's a bit more verbose in the distribution file.

fzf:
  description: fzf is a general-purpose command-line fuzzy finder.
  url: https://github.com/junegunn/fzf/
  list:
    type: github-releases
    url: https://api.github.com/repos/junegunn/fzf/releases
  fetch:
    url: https://github.com/junegunn/fzf/releases/download/{{ .Version }}/fzf-{{ .Version }}-{{ .OS }}_{{ .Arch }}.tar.gz
  integrity:
    url: https://github.com/junegunn/fzf/releases/download/{{ .Version }}/fzf_{{ .Version }}_checksums.txt
    checksums:
      - url: https://github.com/junegunn/fzf/releases/download/0.30.0/fzf_0.30.0_checksums.txt
        type: sha256
        checksum: 43cc37783e0bf4ed775109379b3e2073ea2bb29c9e4811d07907c868435e1b7e
      # Other versions follow.
  install:
    type: tgz
    binaries:
      - fzf
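
One way such an `integrity` section could be consumed, sketched in Python for illustration only. The helpers `render` and `pinned_checksum` are hypothetical, the placeholder handling covers only plain `{{ .Field }}` substitutions rather than full Go templates, and matching a pinned entry by its rendered URL is an assumption about how the lookup might work.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Substitute Go-template-style '{{ .Name }}' placeholders (fields only)."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def pinned_checksum(integrity: dict, version: str):
    """Return (type, checksum) pinned for this version's checksum file, or None."""
    url = render(integrity["url"], {"Version": version})
    for entry in integrity.get("checksums", []):
        if entry["url"] == url:
            return entry["type"], entry["checksum"]
    return None


# The integrity section from the fzf example, as it would look once parsed.
integrity = {
    "url": "https://github.com/junegunn/fzf/releases/download/"
           "{{ .Version }}/fzf_{{ .Version }}_checksums.txt",
    "checksums": [{
        "url": "https://github.com/junegunn/fzf/releases/download/"
               "0.30.0/fzf_0.30.0_checksums.txt",
        "type": "sha256",
        "checksum": "43cc37783e0bf4ed775109379b3e2073ea2bb29c9e4811d07907c868435e1b7e",
    }],
}
```

A version with no pinned entry returns `None`, which leaves the caller to decide whether to refuse the install or fall back to an unverified download.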

Use OpenPGP to sign checksum files. This is often done by authors that already use PGP.

A random example: https://github.com/orgrim/pg_back/releases/tag/v2.1.0. The release includes the checksums.txt and its signature checksums.txt.asc. Then it'd be enough to provide the public key(s) of the author(s) once and use it to verify any of their releases:

  integrity:
    url: https://github.com/junegunn/fzf/releases/download/{{ .Version }}/fzf_{{ .Version }}_checksums.txt.sig
    public_key:
      # This would require https://gopenpgp.org/, probably a more heavy-weight library.
      openpgp:
        - |
          -----BEGIN PGP PUBLIC KEY BLOCK-----
          ...

ppetr commented 2 years ago

So my questions are:

  1. Are these options (or one of them) reasonable to implement?
  2. For OpenPGP, is it acceptable to add this non-trivial dependency? We could also resort to calling gpg externally, but that feels to me a bit against the spirit of binenv, which is otherwise very self-contained.

    Another interesting alternative could be saltpack, but its drawback is that it's very new, so its adoption rate would probably be much smaller.
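
For comparison, the "call gpg externally" option could look roughly like the Python sketch below. It assumes a `gpg` binary on the PATH and a keyring file that already holds the authors' public keys; the function names and the keyring handling are hypothetical. It relies on gpg's machine-readable status output (`--status-fd`), where a `VALIDSIG` line carries the signing key's fingerprint.

```python
import subprocess


def validsig_fingerprints(status_output: str) -> list[str]:
    """Collect fingerprints from '[GNUPG:] VALIDSIG <fpr> ...' status lines."""
    fingerprints = []
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "[GNUPG:]" and fields[1] == "VALIDSIG":
            fingerprints.append(fields[2].upper())
    return fingerprints


def gpg_verify(signature_path: str, data_path: str, keyring_path: str) -> list[str]:
    """Run gpg on a detached signature; return fingerprints of valid signatures."""
    result = subprocess.run(
        ["gpg", "--batch", "--no-default-keyring", "--keyring", keyring_path,
         "--status-fd", "1", "--verify", signature_path, data_path],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []  # gpg missing keys, bad signature, or other failure
    return validsig_fingerprints(result.stdout)
```

The trade-off is visible here: the code is short, but it only works where gpg is installed, which is the self-containment concern raised above.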

leucos commented 2 years ago

Well, I think I did not understand your initial proposal.

I thought you wanted to grab the checksums (when they existed) from the released artifacts and compare to what has been downloaded by binenv install.

But it seems you'd like to add checksums for all versions in distributions.yaml.

I do not think we should be the custodians of fingerprints. This is too much responsibility (and also quite a chore to maintain; try a make e2e in the repo and you'll feel the pain).

So I am not convinced we're heading the right way here.

ppetr commented 2 years ago

I see your point.

From a reliability perspective, checking against checksum files in releases might help a bit, but I guess modern HTTPS is already very good at ensuring reliable transmission. And since the files are always compressed, their internal integrity is verified by mechanisms such as the CRCs built into decompression algorithms.

My concern is rather security. Imagine, say, that the GitHub account of a very popular binary becomes compromised. An attacker can replace or create a release with corrupted binaries as well as matching hashes. Then thousands of computers will become infected by malware.

I agree that keeping checksums of individual files up to date is just not maintainable.

Then how about authors' PGP public keys and/or fingerprints? This means adding just one string, once, for every eligible project, and that string won't change over time (or only extremely rarely). This information could even be scraped automatically, for example from project README.md files (if present), with the important property that it would never be changed by automation once present. Then:

That way a reasonable level of security can be reached and hopefully with little intrusion.
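
The "never changed by automation once present" rule is essentially a write-once pin. A minimal Python sketch of that idea, with hypothetical function names and a plain dict standing in for wherever the fingerprints would actually be stored:

```python
def normalize(fingerprint: str) -> str:
    """Compare fingerprints without regard to spacing or letter case."""
    return "".join(fingerprint.split()).upper()


def record_fingerprint(pins: dict[str, str], project: str, fingerprint: str) -> bool:
    """Pin a scraped fingerprint only if the project has none yet."""
    if project in pins:
        return False  # automation never overwrites an existing pin
    pins[project] = normalize(fingerprint)
    return True


def is_trusted(pins: dict[str, str], project: str, signer_fingerprint: str) -> bool:
    """Accept a release only if its signer matches the pinned fingerprint."""
    pinned = pins.get(project)
    return pinned is not None and pinned == normalize(signer_fingerprint)
```

With this shape, a compromised account that publishes a release signed by a new key fails `is_trusted`, and changing the pin requires a deliberate manual edit.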

leucos commented 2 years ago

Interesting. Do you have examples of such signed releases?

ppetr commented 2 years ago

Let me give a couple of examples:

The checksums are either given as a single file (usually .txt.asc) which contains both the original checksums as well as the signature, or as a pair of files (usually .txt and .txt.asc), where the latter contains a detached signature of the former. More details can be found here: https://www.gnupg.org/gph/en/manual/x135.html
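
Telling the two layouts apart is straightforward for ASCII-armored files, since a clearsigned document starts with a `-----BEGIN PGP SIGNED MESSAGE-----` line while a detached signature starts directly with `-----BEGIN PGP SIGNATURE-----`. A small Python sketch (hypothetical helper; binary, non-armored signatures are not handled):

```python
CLEARSIGN_HEADER = "-----BEGIN PGP SIGNED MESSAGE-----"
SIGNATURE_HEADER = "-----BEGIN PGP SIGNATURE-----"


def signature_layout(text: str) -> str:
    """Classify an armored .asc file as 'clearsigned', 'detached' or 'unknown'."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    # Clearsigned: signed text first, followed by the signature block.
    if lines[0] == CLEARSIGN_HEADER and SIGNATURE_HEADER in lines:
        return "clearsigned"
    # Detached: the file holds only the signature block.
    if lines[0] == SIGNATURE_HEADER:
        return "detached"
    return "unknown"
```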

I also found out that the fingerprint of the signer's key can be extracted from a signature (https://security.stackexchange.com/q/62916/12485), which means it'd be possible to collect these fingerprints from all projects that contain such a signature without requiring the authors to publish the key somewhere, making the process even more seamless.

leucos commented 5 months ago

I understand, and that's really interesting. However, there are a lot of devilish details to handle (gpg availability on the system being one of them, fetching hashes, keys, ...).

It is quite a piece of work, and it will only apply to a very small subset of distributions.

If someone wants to take this, that's fine with me. But I am not sure that, with the time I have on my hands, I can do this right now.

I feel that some kind of autodiscover feature for repos should be written (after having to add and debug almost 300 entries in the distributions file, you really feel the urge for this tool). Maybe it could handle this and we could adapt the sources struct to handle signatures.

But again, time is lacking on my side, sorry.