NuGet / Home

Repo for NuGet Client issues

[Feature]: Add tracing to NuGet to identify all possible sources of package downloads #11782

Open ericstj opened 2 years ago

ericstj commented 2 years ago

NuGet Product(s) Involved

NuGet.exe, Visual Studio Package Management UI, Visual Studio Package Manager Console, MSBuild.exe, dotnet.exe, NuGet SDK

The Elevator Pitch

Component Governance is important to help identify potential vulnerabilities in a repository.

Today, Component Governance detects direct references as well as side effects of restore: the assets file, downloaded packages, and binaries in the output. These are all good indicators, but they are sometimes hard to trace back to the root cause.

One example we've noticed that's exceptionally hard for folks to identify is eclipsed packages. NuGet will download packages in the graph if it cannot "know", at the time of the download, whether that package will eventually win. These packages will appear in a local packages folder and could be used by the repository, so they are flagged by Component Governance.

Take one example that we hit in some dotnet repositories. Imagine package C1 is vulnerable and flagged by Component Governance.

Project > A > B1 > C1
Project > D > B2 > C2

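Here B1 and B2 are two versions of package B, and C1 and C2 are two versions of package C. B2 wins version selection, so C1 is eclipsed and never appears in the assets file, yet NuGet has already downloaded it. Nothing inside the repository's intermediates tells the repo owner "why" NuGet downloaded C1. They could find the reference in other downloaded packages and try to walk back up to some package that does appear in an intermediate assets file, but that's a ton of work and not really feasible for the majority of our users.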

Can we make it crystal clear why a package was downloaded? Potentially through opt-in tracing? NuGet knows this information and can make it available through some simple logging. Doing so would save repo owners everywhere many hours of time and make the ecosystem better by helping folks make more educated decisions in their repos around dismissing issues, working around them, or asking dependencies to update.
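To make the request concrete, here is a minimal sketch of the kind of "why was this downloaded?" answer being asked for. It assumes a hypothetical trace, `EXPLORED_EDGES`, listing every parent to child edge NuGet walked during resolution, including edges into versions that later lose selection (such as B1 to C1 in the example above). NuGet does not emit such a trace today; the name and shape are illustrative only. Given that trace, every path from the project to a flagged package falls out of a depth-first walk.

```python
from typing import Dict, List

# Hypothetical trace of every edge walked during graph resolution, including
# edges into versions that later lose selection (e.g. B1 -> C1 above).
EXPLORED_EDGES: Dict[str, List[str]] = {
    "Project": ["A", "D"],
    "A": ["B1"],
    "D": ["B2"],
    "B1": ["C1"],
    "B2": ["C2"],
}


def paths_to(target: str, root: str = "Project") -> List[List[str]]:
    """Return every root -> target path in the explored graph (depth-first)."""
    results: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == target:
            results.append(path)
            return
        for child in EXPLORED_EDGES.get(node, []):
            if child not in path:  # guard against dependency cycles
                walk(child, path + [child])

    walk(root, [root])
    return results


for p in paths_to("C1"):
    print(" > ".join(p))
```

Because the walk uses the explored graph rather than the resolved one, it can explain C1 even though C1 never reaches the assets file.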

Additional Context and Details

No response

nkolev92 commented 2 years ago

Adding partner labels as some of the dotnet partners have been affected by this.

wtgodbe commented 2 years ago

👀

nkolev92 commented 2 years ago

For reference, this is where the unresolved packages are downloaded: https://github.com/NuGet/NuGet.Client/blob/2d93510efe2537fb26b3d1057a05e0cc9a6f07c5/src/NuGet.Core/NuGet.Commands/RestoreCommand/RestoreTargetGraph.cs#L166-L169

nkolev92 commented 2 years ago

A different solution to the original problem could be an opt-in to avoid installing rejected packages. A way to avoid even considering rejected packages would be to use lock files.

aortiz-msft commented 2 years ago

Another alternative would be to create a separate folder, or a subfolder, for packages that are downloaded to determine the graph but are not actually used to build the code.
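Short of a separate folder, the split can be approximated after the fact. The sketch below lists packages present in a global packages folder but absent from the "libraries" section of every given project.assets.json. It assumes the default `<id>/<version>/` layout with lowercase directory names, and that the folder is dedicated to the repo (for example via the `NUGET_PACKAGES` environment variable); against a shared, machine-wide folder the result also includes packages from unrelated repos. Like the folder idea, it says which packages went unused, not why they were downloaded.

```python
import json
from pathlib import Path
from typing import Iterable, Set, Tuple

Package = Tuple[str, str]  # (lowercased id, lowercased version)


def on_disk(global_packages: Path) -> Set[Package]:
    """Packages extracted under <root>/<id>/<version>/ in the packages folder."""
    return {
        (id_dir.name.lower(), ver_dir.name.lower())
        for id_dir in global_packages.iterdir() if id_dir.is_dir()
        for ver_dir in id_dir.iterdir() if ver_dir.is_dir()
    }


def in_assets(assets_files: Iterable[Path]) -> Set[Package]:
    """Packages recorded in the "libraries" section of project.assets.json files."""
    found: Set[Package] = set()
    for path in assets_files:
        libraries = json.loads(path.read_text(encoding="utf-8")).get("libraries", {})
        for key, info in libraries.items():
            if info.get("type") == "package":  # skip project-to-project entries
                pkg_id, _, version = key.partition("/")
                found.add((pkg_id.lower(), version.lower()))
    return found


def downloaded_but_unused(global_packages: Path, assets_files: Iterable[Path]) -> Set[Package]:
    """Packages on disk that no assets file ended up using."""
    return on_disk(global_packages) - in_assets(assets_files)


# Example usage (paths are assumptions about a typical layout):
# downloaded_but_unused(Path.home() / ".nuget" / "packages",
#                       Path(".").rglob("project.assets.json"))
```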

ericstj commented 2 years ago

Not looking for alternatives unless those are easy-to-use diagnostics. The goal here is to help folks easily reason about where packages are coming from. It should be a simple flag to use and should tell them all the possible projects a package might have come from.

mfwilson commented 1 year ago

I've put together an implementation we're using internally that relies on "project.assets.json" to determine dependency paths to projects as described above.

Example of back-tracing "System.Memory" to projects: [screenshot]

I'm curious whether this feature is in progress, or whether this is worth putting up a PR for.
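A minimal sketch of the assets-file approach described here (the actual internal implementation isn't shown in the thread). It assumes the usual project.assets.json shape: a "targets" section keyed by framework, then by "Id/Version", with each entry's "dependencies" map, and a "projectFileDependencyGroups" section whose entries look like "A >= 1.0.0". Because it walks only the resolved graph, it can trace packages the project uses but cannot see eclipsed ones.

```python
from typing import List

# Hypothetical, trimmed-down parsed project.assets.json for illustration.
SAMPLE_ASSETS = {
    "targets": {"net6.0": {
        "A/1.0.0": {"type": "package", "dependencies": {"System.Memory": "4.5.4"}},
        "B/2.0.0": {"type": "package", "dependencies": {"System.Memory": "4.5.3"}},
        "System.Memory/4.5.4": {"type": "package", "dependencies": {"System.Buffers": "4.5.1"}},
        "System.Buffers/4.5.1": {"type": "package"},
    }},
    "projectFileDependencyGroups": {"net6.0": ["A >= 1.0.0", "B >= 2.0.0"]},
}


def dependency_paths(assets: dict, framework: str, package_id: str) -> List[List[str]]:
    """Return every path from a direct reference to package_id in the resolved graph."""
    target = assets["targets"][framework]
    # Lowercased package id -> the "Id/Version" key of its resolved entry.
    resolved = {key.split("/")[0].lower(): key for key in target}
    direct = [entry.split(" ")[0] for entry in assets["projectFileDependencyGroups"][framework]]
    wanted = package_id.lower()
    paths: List[List[str]] = []

    def walk(dep_id: str, path: List[str]) -> None:
        key = resolved.get(dep_id.lower())
        if key is None or key in path:
            return  # not in the resolved graph, or a cycle
        path = path + [key]
        if dep_id.lower() == wanted:
            paths.append(path)
            return
        for child_id in target[key].get("dependencies", {}):
            walk(child_id, path)

    for dep in direct:
        walk(dep, [])
    return paths


for p in dependency_paths(SAMPLE_ASSETS, "net6.0", "System.Memory"):
    print(" > ".join(p))
```

Note that B's entry asks for System.Memory 4.5.3, but the path reports the resolved 4.5.4, since the assets file records only the winning version.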

zivkan commented 1 year ago

@mfwilson, unfortunately, it won't solve the problem from the original issue:

One example we've noticed that's exceptionally hard for folks to identify is eclipsed packages.

During graph resolution, NuGet will download every version of every package listed as a dependency by any package (but only for the relevant TFMs that the project will use). Afterwards, NuGet performs version selection and graph trimming; the result is the final resolved graph that gets written to the assets file.

Here's an example scenario (the original issue explains a slightly different variation):

Project > A 1.0.0 > B 1.0.0 > C 1.0.0
Project > B 2.0.0

As described above, during the first stage of graph resolution, NuGet will download A 1.0.0, B 1.0.0, C 1.0.0, and B 2.0.0. During the second stage, version 2.0.0 of package B will be selected, and since no package lists package C any longer (and it's not directly referenced by the project), C will be trimmed out of the graph, so the assets file will only list A 1.0.0 and B 2.0.0.

Package C 1.0.0, with a known vulnerability, will be in the global packages folder, so tools like Component Governance will flag potential security issues. But package C isn't listed in the assets file at all.
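The two stages can be illustrated with a toy model. This is a deliberately simplified sketch, not NuGet's actual algorithm: it applies plain nearest-wins selection (ties at the same depth take the highest version) and ignores version ranges, frameworks, and the other resolution rules. The graph assumes the project references A 1.0.0 and B 2.0.0 directly, A 1.0.0 depends on B 1.0.0, and B 1.0.0 depends on C 1.0.0.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str]  # (package id, version)

# Assumed scenario: Project > A 1.0.0 > B 1.0.0 > C 1.0.0, and Project > B 2.0.0.
GRAPH: Dict[Node, List[Node]] = {
    ("A", "1.0.0"): [("B", "1.0.0")],
    ("B", "1.0.0"): [("C", "1.0.0")],
    ("B", "2.0.0"): [],
    ("C", "1.0.0"): [],
}
PROJECT: List[Node] = [("A", "1.0.0"), ("B", "2.0.0")]


def version_key(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def download_stage(direct: List[Node]) -> Set[Node]:
    """Stage 1: everything reachable in the unresolved graph is downloaded."""
    seen: Set[Node] = set()
    queue = deque(direct)
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(GRAPH[node])
    return seen


def resolve_stage(direct: List[Node]) -> Set[Node]:
    """Stage 2 (simplified): the nearest occurrence of each id wins, and only
    winning versions are walked further, so losers' dependencies are trimmed."""
    chosen: Dict[str, str] = {}
    frontier = list(direct)
    while frontier:
        level: Dict[str, Set[str]] = {}
        for pkg_id, version in frontier:
            if pkg_id not in chosen:  # a nearer occurrence already won
                level.setdefault(pkg_id, set()).add(version)
        frontier = []
        for pkg_id, versions in level.items():
            chosen[pkg_id] = max(versions, key=version_key)
            frontier.extend(GRAPH[(pkg_id, chosen[pkg_id])])
    return set(chosen.items())


downloaded = download_stage(PROJECT)
resolved = resolve_stage(PROJECT)
print(sorted(downloaded - resolved))  # [('B', '1.0.0'), ('C', '1.0.0')]
```

The difference between the two sets is exactly what ends up in the global packages folder without any trace in the assets file.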

mfwilson commented 1 year ago

@zivkan Thanks for the detailed reply.

OK, I think I may be looking for a slightly different feature from the one described above.

My usage pattern of interest:

  1. Starting from the command: dotnet list <SOLUTION> package --include-transitive --vulnerable
  2. The next step is to trace the paths from package vulnerabilities back to impacted projects

The difference boils down to whether the usage is package-cache-centric or code-centric. This feature is package-cache-centric, which makes sense from a NuGet CLI perspective.

I need to go look for tracing features for dotnet list, I suppose. I'm more interested in knowing what my projects ARE using, not what they aren't. I'm no security expert, but if my projects are not using any packages with known vulnerabilities, yet packages with vulnerabilities are downloaded during dependency resolution, what exactly is the potential harm?

zivkan commented 1 year ago

You're making an assumption that your projects are only using the packages/assets listed in the assets file. Tools like Component Governance don't, because the assets file doesn't track when a project file, or any MSBuild file (including an MSBuild file shipped in a package you are using), does something like <ItemGroup><Reference Include="..\..\VulnerablePackage\lib\**\*.dll" /></ItemGroup>. Component Governance also isn't limited to NuGet: it tries to gather component information from package managers across all language ecosystems, from components (binaries? code?) checked into source control, and from packages brought in by custom/non-standard tooling. It's all about risk tolerances.
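To illustrate the point, here is a minimal sketch of how a scanner might surface hand-written references like the one above, which never show up in project.assets.json. It only parses the XML of a single MSBuild file; it does not evaluate MSBuild (imports, conditions, properties, or wildcards), so it is a rough approximation and not a description of how Component Governance actually works.

```python
import xml.etree.ElementTree as ET
from typing import List


def reference_includes(msbuild_xml: str) -> List[str]:
    """Return the Include value of every <Reference> item in an MSBuild file.

    Handles SDK-style files (no XML namespace) and legacy files that use the
    http://schemas.microsoft.com/developer/msbuild/2003 namespace.
    """
    root = ET.fromstring(msbuild_xml)
    includes: List[str] = []
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments/processing instructions if a parser keeps them
        local_name = element.tag.rsplit("}", 1)[-1]  # strip "{namespace}" if present
        if local_name == "Reference" and "Include" in element.attrib:
            includes.append(element.attrib["Include"])
    return includes


PROJECT_XML = r"""<Project Sdk="Microsoft.NET.Sdk">
  <ItemGroup>
    <PackageReference Include="A" Version="1.0.0" />
    <Reference Include="..\..\VulnerablePackage\lib\**\*.dll" />
  </ItemGroup>
</Project>"""

print(reference_includes(PROJECT_XML))
```

The PackageReference is ignored here on purpose: restore already records it, while the raw Reference into a package folder is invisible to the assets file.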

mfwilson commented 1 year ago

You're making an assumption that your projects are only using the packages/assets listed in the assets file.

I don't think I am, or at least that's not the intent. Component Governance sounds nice, but it sounds like it's getting hamstrung by the partial information NuGet generates here.

nkolev92 commented 1 year ago

Package C 1.0.0, with a known vulnerability, will be in the global packages folder, so tools like Component Governance will flag potential security issues. But package C isn't listed in the assets file at all.

I think this is something that will likely change. Re: risk tolerance, it will be flagged differently.

JonDouglas commented 1 year ago

Just catching up here.

It seems there are a couple of vehicles here.

  1. Extending dotnet list package further (gives a better high-level tree view)
  2. Finalizing the dotnet nuget why work (gives specific provenance from the assets file, but may be missing some of the functionality requested here)

https://github.com/NuGet/Home/pull/11875 / https://github.com/NuGet/Home/blob/dev/proposed/2022/dotnet-nuget-why-proposal.md

Is there another proposal / idea for tracing?

zivkan commented 1 year ago

@JonDouglas, similar to my previous comments in this thread, neither dotnet list package nor the proposed dotnet why will address the concern that Eric pointed out in the original comment in this issue:

One example we've noticed that's exceptionally hard for folks to identify is eclipsed packages.

The feature request here goes a step further than what those features provide/propose.