NuGet / NuGetGallery

NuGet Gallery is a package repository that powers https://www.nuget.org. Use this repo for reporting NuGet.org issues.
https://www.nuget.org/
Apache License 2.0
1.54k stars 643 forks source link

[NuGet.org Bug]: NuGet.org tampers with the uploaded package, making attestation difficult #10026

Open Smaug123 opened 3 months ago

Smaug123 commented 3 months ago

Impact

It bothers me. A fix would be nice

Describe the bug

I recently implemented GitHub Attestations for the NuGet packages I produce. This was pretty annoying to do, because NuGet.org inserts a signature file into any(?) nupkg you upload to it. So the only flow I can make work is:

This is awkward!

Repro Steps

Upload a package to NuGet. Download it again. Observe that its contents have changed.

Expected Behavior

It surprised me greatly that NuGet.org tampers with uploaded packages. (The obvious thing to do would have been to have NuGet serve a certificate separately from the package, but that ship has presumably long since sailed. "NuGet.org attests that this package was uploaded by someone capable of impersonating ThisParticularUser" is not information that belongs in the package; other package repositories do exist and it would be nice not to require each one to tamper incompatibly with the packages that get uploaded to it to attest to it!)

No solution comes to mind; it may be possible for NuGet.org not to tamper with uploaded packages, but I can't see how without either duplicating large swathes of API or breaking compatibility.

Screenshots

No response

Additional Context and logs

It is of course possible to sign NuGet packages, but this has exactly the same secrets-management problems that NuGet API keys have. The whole point is to attest meaningfully that this package was produced from this source; having to trust that package producers are correctly handling their certificates is a very big and unnecessary ask. (Of course, better would be to have reproducible NuGet packs, because then we could attest verifiably that a package was built from a particular source tree without a bunch of custom logic.)

Smaug123 commented 3 months ago

(This isn't really a bug, but I didn't think any of the labels were appropriate. "Design decisions are making things much harder than they needed to be".)

joelverhagen commented 3 months ago

Thanks for your comment @Smaug123. As you state, the ship has sailed on this decision and any deviation from this would be breaking one on sense or another.

In my opinion the package is not "tampered". It is "signed". Perhaps I am thinking of intent, i.e. accidental or malicious intent, for a word like tampering. For this discussion I think it would be important to separate intentional modification of an artifact from unintentional or malicious modification (which in fact is what our current package signing implementation mitigates the risk of). But this is semantics and I understand your point, I believe.

Would it be possible to re-use the hashing approach used by NuGet client that disregards the signature file during signature validation for your attestation purposes? Today, NuGet client needs to look at the signature file and confirm the package content is untampered. It of course can't use the entire .nupkg byte stream as the "bytes to hash". Instead it skips the signature file. Additionally, the signing process operates in a way such that the "content hash" (as it is called) is the same both before and after the signing operation.

Said another way, NuGet client already has an answer to this problem. A package that is packed, then author signed, then later repository signed by NuGet.org is modified two times in its life. The hash of the package content that is signed and then counter signed needs to be the same throughout this journey.

Perhaps you can attest to the content hash instead of the file hash? I understand this would be special casing NuGet but it could provide you a workaround and conceptual alignment with NuGet's signing design.

Reference code: https://github.com/NuGet/NuGet.Client/blob/a79a26abfe103fa7d75eba8c714a900c3a5bbc43/src/NuGet.Core/NuGet.Packaging/Signing/Archive/SignedPackageArchiveUtility.cs#L591

Additionally, it might help us understand your goal for attestation. What attestation tools or ecosystem are you using?

Smaug123 commented 3 months ago

My purpose in attestation here is to credibly claim that an open-source package has the contents it appears to have. I can claim that package Foo.Bar was built from Git source abc123 without modification, but (as far as I know) the NuGet ecosystem provides no way for me to make this claim credibly. It is possible in principle for me to use the existing NuGet signature mechanism to assert that I produced the package contents, but as far as I can tell there is no existing way for me to assert that the package contents are what I claim they are.

However, with GitHub's artifact attestations, I can credibly claim that this package was produced by this automated pipeline run from this git source tree which anyone can audit. GitHub's artifact attestations are of the form "an artefact with this SHA256 has this provenance", and as far as I can tell it does not admit special-casing for NuGet; it would take bespoke tooling to phrase "these packages are equal modulo bundled signature files" in this form.

Smaug123 commented 3 months ago

To be extremely concrete, I have some packages that I wrote myself which I wish to use in a context where the integrity of the code we're running is important: it's unacceptable for us to run packages for whose contents we have only my word. We could clone the packages from source, run parallel publishing pipelines in contexts that we trust, and so on; or we could remove the need to trust me, and instead only trust GitHub (which is much simpler).

joelverhagen commented 3 months ago

Thanks for the additional information.

In the NuGet signing "parallel universe" this credible claim you mention is analogous to NuGet author signatures. If you, the package author, use dotnet nuget sign I can credibly claim that the package was signed by the owner of that author signing certificate. To avoid trusting you, your organization (presumably the "we" in your last message) could entrust a code signing certificate to GitHub Actions (however you choose) and author sign the package. The package will be modified on NuGet.org but you can still verify the original author signature is valid. Of course you need to trust NuGet client tooling in this case.

A workaround for this difference in approach could be to simply not publish the NuGet package to a repository which supports repository signing (NuGet.org is the only I know of which does). For your concrete case, this could take the form of publishing the packages once to NuGet.org (if desired) for external/OSS consumption. Then you would publish the original packages to another feed, perhaps Azure Artifacts or GitHub Package Registry, and consuming from there for your highly sensitive context.

What's not clear to me is how you are performing the "verify" step of the consumed artifacts. Are you using both NuGet restore from NuGet.org and additionally gh attestation verify (per docs)?

Where is the verify happening in your consumption workflow? I assume you are using actions/attest-build-provenance@v1 against the .nupkg after dotnet pack in your GitHub Actions workflow.

Smaug123 commented 3 months ago

The problem with author certificates is that the organisation then needs to control the source repository. If I am administrator of the source repository, I can obtain the author certificate by reconfiguring the repository (e.g. by making the build environment temporarily available to ephemeral branches, which can't be meaningfully audited). All these problems are solvable with extra overhead (e.g. forking the repo into a controlled environment), but alternatively we could just not solve these problems!


Consumption workflow: nothing is yet set in stone, because GitHub Attestations themselves only went public a month ago and I found out about their existence yesterday. But the idea is that our privileged environments maintain a NuGet cache such that we can do a gh attestation verify as we populate the cache, and check various conditions implied by the gh attestation verify (e.g. "this new version is from the same source repo that we were expecting", "this was produced from a signed commit on a known release branch", etc). I haven't finished building the tooling that will actually perform this verification, but I expect it to be trivial: it's just a single curl to NuGet to download the package, an API call to GitHub (perhaps with gh attestation verify, but sha256sum | curl would probably do too) to get the attestation for that binary blob, and a bit of jq to verify the attestation against our expectations. Then later on we'll probably layer in some more GitHub calls on top to verify authorship/signing-of-commits etc, but that's much less immediately important to us than simply knowing the contents of what we're running.

It hadn't occurred to me to use GitHub Package Registry or similar - I'll investigate that, thanks. (But I will certainly continue attesting to the contents of what I put on NuGet.org now that I've worked out how to do it, because I find it extremely aesthetically pleasing that anyone can trivially verify them just with a command line similar to curl https://api.github.com/orgs/Smaug123/attestations/sha256:$(sha256sum ~/.nuget/packages/blah/1.2.3/blah.1.2.3.nupkg).)

joelverhagen commented 3 months ago

If you are populating and then consuming from an intermediate cache, then I imagine the source to populate the cache is less important.

If you don't consume directly from your cache, there is no guarantee the content fetched from the original source has the same attestation state at the time of package restore (dotnet restore). Therefore you'll need to set your NuGet package source as the cache.

This leaves just the matter of populating the cache, in which case NuGet.org doesn't seem to be unique. You just need some tooling (perhaps NuGet itself with an overridden NUGET_PACKAGES environment variable) to download your dependency graph from somewhere. Then the attest step. Then the nuget restore from your cache dir.

It seems like there are some open questions on how the end-to-end will work. It appears to me that non-trivial custom tooling is needed as well as new mappings (e.g. associating repository X with .nupkg Y). It's also not clear how transitive dependencies that you don't own will be attested.

I'll leave this issue open since it is indeed true, attestation based on trivial file hash is indeed hard to do with repository-signed NuGet packages.

Smaug123 commented 3 months ago

Yeah, the obvious proof-of-concept for this is the standard .NET workflow on NixOS, which fetches every package from a lockfile into an isolated environment and sets the NuGet source to that environment before build. This also asserts the contents of the package as a side-effect.

Transitive dependencies that I don't own are certainly important, and it would be great to see an attestation solution across the ecosystem! Some dependencies we're happy to accept on faith - e.g. FSharp.Core is already author-signed, and we are happy to assume its author is taking care of the certificate - but most dependencies are currently only dubiously acceptable or (e.g. in my case, an insider who is the sole author of an opaque binary blob) are completely unacceptable. That's why I was so excited to learn about attestations: it finally gives me the technology to prove that my binary blob corresponds to a given reviewable source, which makes it in principle acceptable to consume.

If I had god perms to the universe, I'd allow NuGet.org to associate packages with repos. Packages already can declare their source, and NuGet.org provides links to the source; it would (for example) be trivial for NuGet to check with GitHub at upload time that the uploaded package has been attested at the version it claims to have. (But for this feature to be useful enough to implement, it would require massive vendor lock-in to GitHub - I'm not aware of other ecosystems which allow such trivial attestation - and would require "most" package authors to have unusually secure/trustworthy publishing workflows. My impression is that most packages are released by hand, even, which by definition is a process that can't be meaningfully attested to.)

Smaug123 commented 3 months ago

The instability of NuGet package SHAs, by the way, is aesthetically displeasing for Nix as well. It should, morally, be the case that I can build a package from source and have the result be equal to the package I obtain from NuGet; if that were true and if dotnet pack were reproducible, I could build the package into my Nix store from source and then immediately consume it without waiting for a round-trip to NuGet.org to find a SHA to update.

joelverhagen commented 3 months ago

Even without package signing, I don't believe building .NET packages from source is byte-for-byte reproducible, but this is not an area I have investigated very much. I found https://github.com/dotnet/reproducible-builds via a Google search so perhaps you can give that a try. In general, I think byte-for-byte (i.e. hash equivalent) building of a package from source is not something that many customers have asked for so I don't have the experience of trying it and seeing what works and what doesn't.

erdembayar commented 3 months ago

Even without package signing, I don't believe building .NET packages from source is byte-for-byte reproducible, but this is not an area I have investigated very much.

In my opinion, this byte-for-byte reproducible approach may work in the short term, but most likely will not work after a few months. Most NuGet package dependencies are version ranges, so the package dependencies you have today may change at any time when older dependencies are unlisted and a new patched version is published. Of course, you can set a specific version and use a lock file to ensure you're using the same upstream package, but it requires quite an effort to make it happen for an end-to-end scenario. Many dotnet repositories that are a few years old are not rebuildable at all nowadays from source code.

Smaug123 commented 3 months ago

Yeah, NixOS's setup gives that stuff for free. I guarantee the "isolated" property of the Dotnet.ReproducibleBuilds.Isolated package by packaging with Nix, and I lock the SDK and all NuGet packages by hash at the Nix level for the build; it works pretty seamlessly. ContinuousIntegrationBuild=true and Deterministic=true appear in my experience to be enough to keep the output stable under those conditions, so it's only the timestamps in the .nupkg file that are nondeterministic.

Smaug123 commented 2 months ago

EDIT: I suspect I've found it, it's https://github.com/NuGet/NuGet.Client/blob/128a5066b1438627ac69a2ffe9de564b2c09ee4d/src/NuGet.Core/NuGet.Packaging/Signing/Archive/SignedPackageArchiveIOUtility.cs#L518 , although how NuGetGallery ends up calling this is a mystery to me.

Is there anywhere I can see NuGet.org's logic for adding this signature file? Empirically I have observed it to be the case that zip -d "$downloaded_package" .signature.p7s recovers precisely the original file, but there doesn't seem to be any particular reason to expect that would be true; in principle the whole thing could be recompressed when the .signature.p7s is added. I've completely failed to find the relevant logic in NuGetGallery. (It would be very helpful if we could reliably obtain the original package, given the one NuGet serves us.)

joelverhagen commented 2 months ago

Empirically I have observed it to be the case that zip -d "$downloaded_package" .signature.p7s recovers precisely the original file, but there doesn't seem to be any particular reason to expect that would be true

This can be true. zip -d would need to implement file removal in a specific way, as to not disturb the other file entries in a way that changes the NuGet content hash. If you are seeing that behavior, it is a fortunate coincidence.

For packages that have an existing .signature.p7s file, this would not work. This most often occurs when the package is author signed. In this case the .signature.p7s file is modified to contain a repository counter signature.

in principle the whole thing could be recompressed when the .signature.p7s is added.

This would be a breaking change because the stable "content hash" that is included in the signature (and is part of the NuGet package signing contract) would change.

I've completely failed to find the relevant logic in NuGetGallery.

NuGetGallery open source code will not show this part. We have a small closed source integration with Microsoft internal signing systems which provides repository signing. This closed source integration does internally use the code you linked (via MSSignCommand I think), but there are a couple of layers of abstraction between NuGet.org and the code link you found.

It would be very helpful if we could reliably obtain the original package, given the one NuGet serves us.

This is an interesting request and, if you want it, should probably be its own issue. Feel free to open this other issue and cross link it to this one! If I remember correctly, we do store the original artifact before repository signing so we could theoretically make this available for both existing and new packages. But there may be caveats that I don't know of right now, not found until such a feature request is designed and implemented.

Smaug123 commented 2 months ago

Thanks, that comment clarifies something that in hindsight was fuzzy in my mind. I care about determinism (a non-NuGet.org entity can recreate the package exactly), not about retrieving the original package per se; so stripping all signatures is fine for my purposes.

I dug through zip and, as long as the original package was unsigned, zip -d does seem to be simply inverting what NuGet does (although I can't say I fully understand that code). I think for nixpkgs we can work with what we have! (But only because in nixpkgs we get the chance to intervene before any consumer sees the nupkg; that's not really true in other contexts.)