HadrienG2 / hwlocality

Rust bindings to Open MPI Portable Hardware Locality "hwloc" library, covering version 2.0 and above.
MIT License

Consider downloading vendored library from a specific commit hash #89

Closed nazar-pc closed 8 months ago

nazar-pc commented 8 months ago

Supply chain attacks are real; it is always better to download a known-good commit hash rather than a moving target.

HadrienG2 commented 8 months ago

It is indeed a good idea to harden the hwloc vendoring feature against supply chain attacks, but I think it's important to consider the full implications before committing to a particular technical choice:

  1. Correct me if I'm wrong, but as far as I understand it, the precise intent is to protect your app against a backdoor landing in the hwloc repository, either as a result of maintainer malice or a compromised GitHub account. We should be confident that any adopted technical answer correctly mitigates this threat.
    • What gives you confidence that a backdoor isn't already present in today's hwloc version?
    • This commit hash will eventually have to be kept up to date as new hwloc functionality lands and I end up relying on it. What gives you confidence that I will notice a backdoor landing when updating the hardcoded commit hash? I certainly have neither the secure code expertise nor the maintainer bandwidth to carefully review hwloc changes for security when updating the hash, so I can only notice very obvious things like the whole repo being deleted and replaced with malicious code. But a hacker who has the expertise to get in the hwloc repo will likely also have the expertise to write much more subtle backdoors that I won't notice.
    • Are you confident that the extra protection this gives you against supply chain attacks, which are rare today, is worth the cost of being more vulnerable to attacks on unpatched vulnerabilities, which are much more common? As far as I know, hwloc does not provide a security announcements mailing list, so I personally have no reliable way to know when the hash should be urgently updated for security reasons. Without such an official hwloc security notification channel, I would personally more readily trust Linux distro package managers to get this point right, as they have resources that I do not have, like security experts ready to carefully review submitted package updates, and direct notifications from maintainers. So in my opinion a vendored release will always be a net loss from a security perspective...
    • All these things being considered, who do you think would be in the best position to decide which hwloc version should be vendored? If the answer is "me", a hardcoded hash that I maintain is indeed a reasonable technical answer. If the answer is "you/your company", then the goal might be more efficiently achieved by you picking your own hwloc release and taking care of the security-sensitive update process yourself, rather than going through me as a slow middleman. If you end up concluding that you are best placed to pick the right hwloc release, then I can easily provide the required infrastructure for this (static linking to an hwloc release that you built yourself).
  2. Let us assume that point 1 is resolved by concluding that hardcoding the target vendored release is indeed a good idea and we want to do it. In this case, we must then consider whether a hardcoded git commit hash is the right technical approach. Intuitively, I would think that targeting a numbered project release tarball would be a better approach because...
    • As we discussed elsewhere, release tarballs have fewer build dependencies and thus give us faster builds and happier users.
    • Release numbers are the language that most of the security community speaks. If we decide to use them, rather than commit hashes, then we can more easily use security community resources like CVEs to answer questions like "is my vendored release vulnerable to vulnerability X or not".
    • While it might seem like git hashes provide extra download integrity checking, this can trivially be replicated in a tarball-based workflow by checksumming the tarball. And doing things that way gives us access to modern checksums that are more hardened against collision attacks than git's SHA-1, which has already shown several worrying warning signs of being insufficiently secure in this respect.

nazar-pc commented 8 months ago

What I'm trying to achieve is simple: when I install version X of the package and have it in the lock file, I want to know that every time I install it, I get exactly the same code and exactly the same result.

The fact that I potentially get a different version every single time I compile it in CI is a strictly bad situation to be in. It may break in unexpected ways, or it may contain a backdoor. I don't want to worry about that; I want to know that I install exactly the same stuff every time.

Switching to tarballs is fine, but you need to remember to implement a checksum check to make sure the package wasn't swapped with something unexpected.

HadrienG2 commented 8 months ago

If the main goal is build reproducibility, then I agree that a checksummed tarball is the right way to do it :)
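
As a rough illustration of the checksummed-tarball idea, here is a minimal sketch of how a build script might compare a downloaded tarball against a pinned digest. Everything in it is an assumption made for the example: `PinnedTarball`, `ChecksumMismatch`, `to_hex` and `verify_tarball` are hypothetical names, not part of hwlocality or hwloc, and the digest function is passed in as a parameter so that the sketch stays independent of any particular hashing crate. In practice it would be a collision-resistant hash such as SHA-256.

```rust
use std::fmt;

/// A pinned release: a version number plus the expected digest of its tarball.
/// Hypothetical type for illustration; real values would be hardcoded next to
/// the download logic and updated together.
pub struct PinnedTarball {
    pub version: &'static str,
    pub expected_digest_hex: &'static str,
}

/// Error returned when the downloaded bytes do not match the pin.
#[derive(Debug, PartialEq, Eq)]
pub struct ChecksumMismatch {
    pub version: String,
    pub expected: String,
    pub actual: String,
}

impl fmt::Display for ChecksumMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "checksum mismatch for release {}: expected {}, got {}",
            self.version, self.expected, self.actual
        )
    }
}

/// Lowercase hex encoding of a raw digest.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Hash the downloaded tarball with the caller-supplied `digest` function
/// (assumed to be a cryptographic hash in real use) and compare the result
/// with the pinned value. The build should abort on `Err`.
pub fn verify_tarball<D>(
    pin: &PinnedTarball,
    tarball: &[u8],
    digest: D,
) -> Result<(), ChecksumMismatch>
where
    D: Fn(&[u8]) -> Vec<u8>,
{
    let actual = to_hex(&digest(tarball));
    if actual.eq_ignore_ascii_case(pin.expected_digest_hex) {
        Ok(())
    } else {
        Err(ChecksumMismatch {
            version: pin.version.to_string(),
            expected: pin.expected_digest_hex.to_string(),
            actual,
        })
    }
}
```

With this shape, bumping the vendored release means changing the version and the digest in one place, and any tarball that differs from the pinned one fails the build instead of being compiled silently.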