NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
17.44k stars 13.65k forks source link

Strip NuGet.org signatures from downloaded nupkg files #326345

Open Smaug123 opened 2 months ago

Smaug123 commented 2 months ago

Issue description

NuGet.org alters packages during upload by unzipping, adding a signature file to the zip, and re-zipping. This means no local Nix build of a NuGet package can possibly match the version pulled from nuget.org.

https://github.com/NixOS/nixpkgs/blob/8be517b5c9817f2122f27cf8000dda859747adc0/pkgs/build-support/dotnet/make-nuget-deps/default.nix defines how we pull NuGet dependencies.

Would you accept a PR that verifies the downloaded signature if possible (I don't yet know what dotnet nuget verify does, but it may be possible to get it to work in the sandbox), and then strips the signature file signature.p7s from NuGet.org-downloaded packages before writing the nupkg to the store? That way, in principle in a content-addressed Nix world, it would become possible to locally (reproducibly) build a NuGet package (after adjusting for the fact that NuGet packs are nondeterministic due to zip file timestamps) and have the result be equal to the one that fetchNuget gives you.

(This is step 1 of my plan; step 2 would be "make dotnet pack reproducible, either here or upstream", and step 3 would be "demonstrate that a content-addressed Nix can skip downloading from NuGet if I've already built my nupkg locally".)

Maintainers

@corngood is listed in CODEOWNERS; @tie has a bunch of commits that look .NET-specific and therefore might be interested.

corngood commented 2 months ago

I have a big change I've been working on that is essentially doing your step 3, and maybe some of step 2?

What I have:

The branch is here if you want to take a look. It's still a bit rough: https://github.com/corngood/nixpkgs/tree/dotnet-unpacked-packages

I've learned quite a bit about manually packing and unpacking packages, package directories vs. sources, restore vs tool restore vs paket, etc, etc.

Ideally I'd like to get to the point where we can just automatically configure something like NUGET_FALLBACK_PACKAGES and not have any .nupkg files in the nix store at all, but there are various quirks in the tools that are blocking that.

One consideration with repacking nupkgs is that unless they are uncompressed, runtime dependencies will be obfuscated. In my branch I've just kept the .nupkg files in the same outputs as the unpacked packages, which is now NUGET_PACKAGES is normally structured. It's also an annoying bit of redundancy.

As for your step 1:

then strips the signature file signature.p7s from NuGet.org-downloaded packages before writing the nupkg to the store

I haven't done any work on this yet, but it's something that's been on my mind. If we could get that done, I think it would simplify and speed up nuget-to-nix considerably, since it wouldn't have to search for a deterministic package source. It could even allow mirrors to be determined.

Smaug123 commented 2 months ago

Cool - it would cause mass rebuilds, but naively I'd just edit make-nuget-deps to do that manual unzip-and-rezip dance to strip out the nondeterministic signature files. I'll give that a go and see what it looks like.

corngood commented 2 months ago

Do you know if it's just the .signature.p7 file you need to strip out for it to be deterministic? I had to strip those out when patching packages because it's not valid when the package is repacked, so it would be nice if they were already gone in the fixed-output derivation.

Do you have any real examples of the same package existing on multiple sources? In my experience it happens most on the packages with lots of sources in nuget.config. roslyn might be a good one to try. You could modify nuget-to-nix to always check all the sources for a package (instead of stopping on the first that has it), and compare them all.

Smaug123 commented 2 months ago

dotnet pack is nondeterministic because of zip file timestamps; in my experience that's its only nondeterminism. I am fairly sure dotnet nuget push to NuGet.org is nondeterministic solely because NuGet.org adds that signature file.

I don't have any examples of packages existing simultaneously in multiple sources other than the trivial "I built this package locally" vs "I uploaded this package to nuget.org". (I don't think I've ever actually operated with multiple NuGet sources except in rather contrived situations where I was fully in control of the contents of most of the sources.)

corngood commented 2 months ago

If you're not able to find one by the time you have something to test, I'll dig something up.

dotnet pack is nondeterministic because of zip file timestamps;

In my branch I'm planning on using strip-nondeterminism for that. I'd actually like to make a general purpose hook, which could be used on both fetched and built packages.

This is a bit off topic, but my idea for that is something along the lines of:

  1. install (e.g. dotnet pack) puts .nupkg file in $out/share/nuget/packages
  2. preFixup hook unpacks all .nupkgs into NUGET_PACKAGES compatible structure
  3. regular fixup can run and patch shebangs, shared libs etc
  4. postFixup hook (optionally?) re-packs the nupkg
Smaug123 commented 2 months ago

By the way, the reason dotnet pack is nondeterministic is that when they made it so, they hit an absolutely hilarious error case which even I, jaded after six years of dealing with NuGet, would never have imagined. https://github.com/NuGet/Home/issues/8603

Basically, we should check empirically that the nondeterminism fix doesn't break dotnet restore for computers which are in time zones at a positive offset from UTC.

Smaug123 commented 2 months ago

dotnet nuget verify is using a frankly hilarious amount of CPU when I put it immediately before the "strip the signature file" operation. I imagine it'll also fail to verify in a sandboxed environment. Think it's OK to just skip any signature verification?

corngood commented 2 months ago

it could be done in nuget-to-nix. once the hash is locked there, there should be no need to check it again. nuget-to-nix is going to need to figure out the deterministic hash of the package one way or another anyway.

It might work offline if it has embedded public keys. I'm not sure how it works exactly. I'm not sure it's even reliably checking signatures currently. When I stripped the signatures I was able to build all of dotnet 8 VMR without any problem, but dotnet 9 complained about an invalid signature somewhere late in the build.

Smaug123 commented 2 months ago

In my mind the obvious place to put this would be straight in fetchNuget (which is defined in make-nuget-deps/default.nix). No existing hashes in deps files would change (I think we'll need the original hashes anyway for fetchurl?). I'd be relying on content-addressed derivations to short-circuit the "download from NuGet and remove signature file" work if you've built the package locally; I need to check that this does in fact short-circuit.

corngood commented 2 months ago

Sorry, I'm not all that familiar with content-addressed nix, so I think I slightly misunderstood how problems relate to each other.

I'll try to explain the problem I've been trying to fix in some more detail:

In this situation dotnet restore during fetch-deps will non-deterministically (see e.g. https://github.com/dotnet/source-build/issues/4206) use the package from either A or B, so the unpacked package in $NUGET_PACKAGES may vary.

nuget-to-nix looks at $NUGET_PACKAGES and tries to write a fetchNuget for each one. Even though it has the .nupkg for Foo, and knows which source it came from, it still has to attempt to download the package from each source, and use the first one (deterministically) that contains it.

If we could make fetchNupkg output source-independent (perhaps by just removing the signature?), we could make it a fixed-output derivation. Then nuget-to-nix would need to determine the fixed-output hash, but it could do so with the .nupkg from $NUGET_PACKAGES. It would likely still need to check which sources contain the package in order to write the url/mirror list, but it wouldn't have to download anything.

I guess this is more complex than what you were thinking, but it should also solve the CA problem.

Smaug123 commented 2 months ago

Right, I see. I believe my proposed change (see the draft #326785) does make fetchNuget source-independent (though I haven't tested that empirically yet), so what you're looking for could be built on top of it. We'll find out!

Smaug123 commented 2 months ago

Yup, just tested that empirically:

Then consumed it using the nixpkgs at https://github.com/NixOS/nixpkgs/pull/326785 (commit a6e32ac9bf59628fb17359d8f448161213d47cde), and observed that indeed the version that made it through fetchNuget is identically equal to the one I pushed:

> sha256sum /nix/store/llilh0vaf5n20ir22znb5zpaqab5y82a-WoofWare.Test.Thing.0.0.3.nupkg
f503214402c5a73e6e95811b39b58c06d9a189fd4dc06f1237a4708fb9236aff   /nix/store/llilh0vaf5n20ir22znb5zpaqab5y82a-WoofWare.Test.Thing.0.0.3.nupkg

So as far as I have been able to test so far, that PR does indeed make the NuGet package source-independent.

corngood commented 2 months ago

I still haven't been able to find any practical examples in nixpkgs of packages that are available from multiple sources. I think some of the places where it used to happen are now using packageSourceMappings.

corngood commented 2 months ago

Ah, here we go. In case this proves useful for something:

  (fetchNuGet { pname = "Microsoft.Azure.WebJobs"; version = "3.0.39"; hash = "sha256-Ndym81F4BFIHsH4NF/weySuRdWgzQQM8hHsjTW8c1vs="; })
  (fetchNuGet { pname = "Microsoft.Azure.WebJobs"; version = "3.0.39"; hash = "sha256-DLFcCFeOQRMnZzPjJlQZEyVXX1iey/QTQKcT1br2q1I="; url = "https://azfunc.pkgs.visualstudio.com/e6a70c92-4128-439f-8012-382fe78d6396/_packaging/21e57804-e42a-44f4-a801-493faaf56251/nuget/v3/flat2/microsoft.azure.webjobs/3.0.39/microsoft.azure.webjobs.3.0.39.nupkg"; })
  (fetchNuGet { pname = "Microsoft.Azure.WebJobs.Core"; version = "3.0.39"; hash = "sha256-R8WFHImU5uctfCCF0cTzBlu+4QORk14Wku5tutxVzyI="; })
  (fetchNuGet { pname = "Microsoft.Azure.WebJobs.Core"; version = "3.0.39"; hash = "sha256-n1yL6u3IYK82N1zWCt9g5oqIRteX/tOI67z0FH8gR/k="; url = "https://azfunc.pkgs.visualstudio.com/e6a70c92-4128-439f-8012-382fe78d6396/_packaging/21e57804
corngood commented 2 months ago

I also confirmed that the first package there has the same content when you remove the signature file from both versions.

Smaug123 commented 2 months ago

Next up: