As part of https://github.com/NuGet/NuGetGallery/issues/8873, I was interested in implementing proper no-oping for deprecation. This is to ensure that redundant API calls do not easily cause many unnecessary V3 catalog leaves and therefore load on the DB.
I did a data investigation with NuGet.Insights to see how often duplicate catalog leaves occurred. It turns out... a lot!
Of catalog leaves with deprecation information, over 15% are duplicate, i.e. their mutable metadata (listed, vulnerabilities, and deprecation) are the same as the previous leaf for that same package.
let SameAsPreviousLeaves = NiCatalogLeafItems
| where isnotempty(Deprecation)
| order by Identity asc, CommitTimestamp asc
| project LowerId, Identity, CommitTimestamp, IsListed, Deprecation = tostring(Deprecation), Vulnerabilities = tostring(Vulnerabilities)
| extend SameAsPrevious = Identity == prev(Identity) and Deprecation == prev(Deprecation) and IsListed == prev(IsListed) and Vulnerabilities == prev(Vulnerabilities)
| project-reorder Identity, CommitTimestamp, SameAsPrevious, Deprecation, LowerId;
let LeafCount = toscalar(NiCatalogLeafItems | count);
SameAsPreviousLeaves
| summarize LeavesWithDeprecation = count(), DuplicateLeaves = countif(SameAsPrevious)
| extend TotalLeaves = LeafCount
| extend ['Duplicate % of total'] = round(100.0 * DuplicateLeaves / TotalLeaves, 2)
| extend ['Duplicate % of deprecated'] = round(100.0 * DuplicateLeaves / LeavesWithDeprecation, 2)
LeavesWithDeprecation
DuplicateLeaves
TotalLeaves
Duplicate % of total
Duplicate % of deprecated
684887
103619
12725635
0.81
15.13
I believe this is made even worse for users that have access to the deprecation API private preview. It's easy to re-request deprecation of a growing set of versions with the API and cause a growing wave of duplicate deprecations. azure-sdk is at the top of the deprecation list because it is enabled for the deprecation API.
let SameAsPreviousLeaves = NiCatalogLeafItems
| where isnotempty(Deprecation)
| order by Identity asc, CommitTimestamp asc
| project LowerId, Identity, CommitTimestamp, IsListed, Deprecation = tostring(Deprecation), Vulnerabilities = tostring(Vulnerabilities)
| extend SameAsPrevious = Identity == prev(Identity) and Deprecation == prev(Deprecation) and IsListed == prev(IsListed) and Vulnerabilities == prev(Vulnerabilities)
| project-reorder Identity, CommitTimestamp, SameAsPrevious, Deprecation, LowerId;
SameAsPreviousLeaves
| summarize DeprecationCount = count(), DuplicateCount = countif(SameAsPrevious) by Identity, LowerId
| join kind=inner NiPackageOwners on LowerId
| project LowerId, Identity, Owners, DeprecationCount, DuplicateCount
| mv-expand Owner = Owners to typeof(string)
| summarize DeprecationCount = sum(DeprecationCount), DuplicateCount = sum(DuplicateCount) by Owner
| order by DuplicateCount desc
| extend ['Duplicate %'] = round(100.0 * DuplicateCount / DeprecationCount, 2)
| take 10
Impact
I'm unable to use NuGet.org
Describe the bug
As part of https://github.com/NuGet/NuGetGallery/issues/8873, I was interested in implementing proper no-oping for deprecation. This is to ensure that redundant API calls do not easily cause many unnecessary V3 catalog leaves and therefore load on the DB.
I did a data investigation with NuGet.Insights to see how often duplicate catalog leaves occurred. It turns out... a lot!
Of catalog leaves with deprecation information, over 15% are duplicate, i.e. their mutable metadata (listed, vulnerabilities, and deprecation) are the same as the previous leaf for that same package.
I believe this is made even worse for users that have access to the deprecation API private preview. It's easy to re-request deprecation of a growing set of versions with the API and cause a growing wave of duplicate deprecations.
azure-sdk
is at the top of the deprecation list because it is enabled for the deprecation API.We should implement proper no-oping in
PackageDeprecationService
. Currently the code updates theLastEdited
field on ALL packages in the deprecation batch if there is ANY change (rather than no-oping on a package-by-package basis). https://github.com/NuGet/NuGetGallery/blob/f58a360a3e081f633eb7b0791b0de625d2bb9399/src/NuGetGallery/Services/PackageDeprecationService.cs#L103-L110Repro Steps
Expected Behavior
Version Z should only have it's
LastEdited
value updated.Actual: X, Y, and Z have their
LastEdited
value updated.Screenshots
No response
Additional Context and logs
No response