NuGet / NuGetGallery

NuGet Gallery is a package repository that powers https://www.nuget.org. Use this repo for reporting NuGet.org issues.
https://www.nuget.org/
Apache License 2.0
1.54k stars 643 forks source link

[NuGet.org Bug]: many catalog package details leaves are redundant (duplicate) because of improper deprecation no-oping #9921

Closed joelverhagen closed 1 month ago

joelverhagen commented 5 months ago

Impact

I'm unable to use NuGet.org

Describe the bug

As part of https://github.com/NuGet/NuGetGallery/issues/8873, I was interested in implementing proper no-oping for deprecation. This is to ensure that redundant API calls do not easily cause many unnecessary V3 catalog leaves and therefore load on the DB.

I did a data investigation with NuGet.Insights to see how often duplicate catalog leaves occurred. It turns out... a lot!

Of catalog leaves with deprecation information, over 15% are duplicate, i.e. their mutable metadata (listed, vulnerabilities, and deprecation) are the same as the previous leaf for that same package.

let SameAsPreviousLeaves = NiCatalogLeafItems
| where isnotempty(Deprecation)
| order by Identity asc, CommitTimestamp asc
| project LowerId, Identity, CommitTimestamp, IsListed, Deprecation = tostring(Deprecation), Vulnerabilities = tostring(Vulnerabilities)
| extend SameAsPrevious = Identity == prev(Identity) and Deprecation == prev(Deprecation) and IsListed == prev(IsListed) and Vulnerabilities == prev(Vulnerabilities)
| project-reorder Identity, CommitTimestamp, SameAsPrevious, Deprecation, LowerId;
let LeafCount = toscalar(NiCatalogLeafItems | count);
SameAsPreviousLeaves
| summarize LeavesWithDeprecation = count(), DuplicateLeaves = countif(SameAsPrevious)
| extend TotalLeaves = LeafCount
| extend ['Duplicate % of total'] = round(100.0 * DuplicateLeaves / TotalLeaves, 2)
| extend ['Duplicate % of deprecated'] = round(100.0 * DuplicateLeaves / LeavesWithDeprecation, 2)
LeavesWithDeprecation DuplicateLeaves TotalLeaves Duplicate % of total Duplicate % of deprecated
684887 103619 12725635 0.81 15.13

I believe this is made even worse for users that have access to the deprecation API private preview. It's easy to re-request deprecation of a growing set of versions with the API and cause a growing wave of duplicate deprecations. azure-sdk is at the top of the deprecation list because it is enabled for the deprecation API.

let SameAsPreviousLeaves = NiCatalogLeafItems
| where isnotempty(Deprecation)
| order by Identity asc, CommitTimestamp asc
| project LowerId, Identity, CommitTimestamp, IsListed, Deprecation = tostring(Deprecation), Vulnerabilities = tostring(Vulnerabilities)
| extend SameAsPrevious = Identity == prev(Identity) and Deprecation == prev(Deprecation) and IsListed == prev(IsListed) and Vulnerabilities == prev(Vulnerabilities)
| project-reorder Identity, CommitTimestamp, SameAsPrevious, Deprecation, LowerId;
SameAsPreviousLeaves
| summarize DeprecationCount = count(), DuplicateCount = countif(SameAsPrevious) by Identity, LowerId
| join kind=inner NiPackageOwners on LowerId
| project LowerId, Identity, Owners, DeprecationCount, DuplicateCount
| mv-expand Owner = Owners to typeof(string)
| summarize DeprecationCount = sum(DeprecationCount), DuplicateCount = sum(DuplicateCount) by Owner
| order by DuplicateCount desc
| extend ['Duplicate %'] = round(100.0 * DuplicateCount / DeprecationCount, 2)
| take 10
Owner DeprecationCount DuplicateCount Duplicate %
Microsoft 112632 55381 49.17
azure-sdk 74116 52725 71.14
monk.soul 73453 21262 28.95
chinadotnet 61946 17637 28.47
dotnetchina 61946 17637 28.47
Thinkka 3860 3497 90.6
nugetservicebus 4296 3030 70.53
AppInsightsSdk 4168 2770 66.46
RabbitFoot 15833 1990 12.57
furion.net 6501 1847 28.41

We should implement proper no-oping in PackageDeprecationService. Currently the code updates the LastEdited field on ALL packages in the deprecation batch if there is ANY change (rather than no-oping on a package-by-package basis). https://github.com/NuGet/NuGetGallery/blob/f58a360a3e081f633eb7b0791b0de625d2bb9399/src/NuGetGallery/Services/PackageDeprecationService.cs#L103-L110

Repro Steps

  1. Deprecate version X and Y of a package via the UI.
  2. Deprecate version X, Y, and Z of a package via the UI, carefully selecting the same deprecation settings as the previous step.

Expected Behavior

Version Z should only have it's LastEdited value updated.

Actual: X, Y, and Z have their LastEdited value updated.

Screenshots

No response

Additional Context and logs

No response

joelverhagen commented 1 month ago

This is fixed in PROD.