NuGet / Home

Repo for NuGet Client issues
Other
1.49k stars 250 forks source link

No-op restore caches and replays NU1900 #13615

Open KirillOsenkov opened 1 month ago

KirillOsenkov commented 1 month ago

If you have a restore that failed with NU1900: Error occurred while getting package vulnerability data: Unable to load the service index for source...

You can then make a change to the NuGet Credential Provider, add the proper token, and re-run restore, it will still fail with the same error NU1900.

This is because the cache file is valid (since nothing changed on disk), so it doesn't attempt to re-restore.

This was an absolute nightmare to investigate (because when NuGet prints 401 you still think the problem is with auth, but really it doesn't even attempt to re-auth because it just plays back the no-op restore).

We also have warnings as errors, so the warning is logged as an error and fails the build.

KirillOsenkov commented 1 month ago

I think we shouldn't be caching failed restores at all

nkolev92 commented 1 month ago

No-op should not be caching failed restores: https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Commands/RestoreCommand/RestoreCommand.cs#L806C21-L806C30, which leads to https://github.com/NuGet/NuGet.Client/blob/38ab39a6691e6e5d54e29055d9e258b8b5d57220/src/NuGet.Core/NuGet.ProjectModel/CacheFile.cs#L43, so I'm guessing there's probably something else going.

How did you conclude it was a no-op caching issue?

KirillOsenkov commented 1 month ago

I debugged and saw that it considers everything up-to-date and doesn't attempt to rerun the restore

KirillOsenkov commented 1 month ago

When you say noop should be caching failed restores, that's what I think is happening, but this is wrong behavior. If the restore failed, we should delete the cache.

nkolev92 commented 1 month ago

When you say noop should be caching failed restores, that's what I think is happening, but this is wrong behavior. If the restore failed, we should delete the cache.

My bad, typo. It should not. The cache file has a status in it, which tells us not to cache it.

I debugged and saw that it considers everything up-to-date and doesn't attempt to rerun the restore

Might need help getting to this ourselves, since I'm wondering why the code I linked wasn't working.

KirillOsenkov commented 1 month ago

Try injecting a network error (like 401 access denied to an azdo feed with a wrong PAT), then restore again. It will play back the 401 but won't even attempt to hit the network.

nkolev92 commented 1 month ago

cc @zivkan in case anything rings a bell.

I'm wondering if there's something we're messing up on the audit front, since Audit is not supposed to fail restore by default (hence NU1900), but I don't think we're doing anything custom. it's originally a warning getting logged.

KirillOsenkov commented 1 month ago

we may have WarningsAsErrors set

Audit and check for vulnerabilities does ring a bell, I think I saw something while debugging

KirillOsenkov commented 1 month ago

And I did see the NU1900 warning being converted to error after it's been played back, so maybe the restore is considered successful because it doesn't know the warning will later be converted to an error by MSBuild? This would explain what we're seeing