NuGet / NuGetGallery

NuGet Gallery is a package repository that powers https://www.nuget.org. Use this repo for reporting NuGet.org issues.
https://www.nuget.org/
Apache License 2.0

[NuGet.org Bug]: download counts are inconsistent between Gallery and Search service #9791

Open drewgillies opened 5 months ago

drewgillies commented 5 months ago

Impact

Other

Describe the bug

The download count shown on the search results page doesn't match the download count shown on the package details page.

Repro Steps

1. Search for Fabulous Scheduler on NuGet.org and view the results: https://www.nuget.org/packages?q=fabulous+scheduler
2. Click the FabulousScheduler link on the results page and view the stats on the package details page: https://www.nuget.org/packages/FabulousScheduler

The download counts on the two pages don't match.

Expected Behavior

Both screens show the same download count.

Screenshots

Search results: image

Package details: image

Additional Context and logs

No response

swharden commented 4 months ago

Possibly related: the primary query API has been serving stale download counts for a few weeks, while the secondary endpoint serves higher download counts.

e.g., Compare totalDownloads of ussc vs usnc. According to my records, the last time the primary endpoint updated download counts was Jan 19, 2024 (22 days ago), but the secondary one seems to be working as expected.
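
For anyone who wants to repeat this comparison, here is a minimal sketch of diffing `totalDownloads` between two search responses. It assumes each response is already parsed JSON with a `data` array whose items carry `id` and `totalDownloads`; the function names are hypothetical, and fetching the two payloads from the endpoints is left out.

```python
from typing import Any


def total_downloads_by_id(search_response: dict[str, Any]) -> dict[str, int]:
    """Map lower-cased package ID to totalDownloads for one search response."""
    return {
        item["id"].lower(): item["totalDownloads"]
        for item in search_response.get("data", [])
        if "totalDownloads" in item
    }


def diff_download_counts(
    primary: dict[str, Any], secondary: dict[str, Any]
) -> dict[str, tuple[int, int]]:
    """Return {package_id: (primary_count, secondary_count)} where the two differ.

    Only packages present in both responses are compared.
    """
    a = total_downloads_by_id(primary)
    b = total_downloads_by_id(secondary)
    return {
        pid: (a[pid], b[pid])
        for pid in sorted(a.keys() & b.keys())
        if a[pid] != b[pid]
    }
```

An empty result means the two endpoints agree for every package they both returned; a non-empty result lists the packages whose counts have drifted apart.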

A side effect of this issue is that NuGet Trends graphs have leveled out for the last few weeks. E.g., https://nugettrends.com/packages?months=6&ids=NUnit

image

I hope this information is helpful. Thanks NuGet team for all you do! 🚀

joelverhagen commented 2 months ago

If you independently check download counts on the search API vs. the gallery, you will often see a different number. This is by design because there is no shared, live cache that the gallery and search services both depend on (and we don't want the headache of a single point of failure).

But the rendering problem inside the gallery, for example search results shown in the gallery vs. the package details page, could be resolved by replacing the search API download count with what the gallery knows via its own cache. This would at least make the gallery self-consistent.
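
A rough sketch of that fix-up, not the gallery's actual code: `SearchHit`, `apply_gallery_counts`, and the shape of the cache (a plain dict keyed by lower-cased package ID) are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SearchHit:
    package_id: str
    total_downloads: int  # count as reported by the search service


def apply_gallery_counts(
    hits: list[SearchHit], gallery_counts: dict[str, int]
) -> list[SearchHit]:
    """Overwrite each hit's count with the gallery's cached count when one exists.

    Hits the gallery cache does not know about keep the search service's value.
    """
    fixed = []
    for hit in hits:
        cached = gallery_counts.get(hit.package_id.lower())
        fixed.append(hit if cached is None else replace(hit, total_downloads=cached))
    return fixed
```

With this in place the search results page and the package details page would read the same number from the same source, even if that number still differs from what the search API reports on its own.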

A related issue, and more of a bug, is https://github.com/NuGet/NuGetGallery/issues/9928, which concerns gallery self-consistency.

jodydonetti commented 2 months ago

> But the rendering problem inside the gallery, for example search results shown in the gallery vs. the package details page, could be resolved by replacing the search API download count with what the gallery knows via its own cache. This would at least make the gallery self-consistent.

True but, and I haven't looked at the code yet so I may be wrong about this, I smell at least a potential "SELECT N+1" problem here...

Will look into that as soon as possible.

joelverhagen commented 2 months ago

> True but, and I haven't looked at the code yet so I may be wrong about this, I smell at least a potential "SELECT N+1" problem here...

Yes, depending on the implementation of the cache, you're totally right. Today the cache is a giant in-memory dictionary where point reads are "free", so doing 20-30 download count reads as a fix-up when rendering the search page could work. But a more traditional read-through cache would have that problem.
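
To make the difference concrete, here is a toy comparison under stated assumptions: `CountingStore` is a hypothetical stand-in for whatever backs the cache, and neither cache class reflects the gallery's real implementation. It only counts how many round trips a page of lookups costs in each design.

```python
class CountingStore:
    """Hypothetical backing store that counts how many round trips it serves."""

    def __init__(self, counts: dict[str, int]) -> None:
        self._counts = counts
        self.round_trips = 0

    def get(self, package_id: str) -> int:
        self.round_trips += 1  # one round trip per point read
        return self._counts.get(package_id, 0)

    def get_all(self) -> dict[str, int]:
        self.round_trips += 1  # one round trip for the whole data set
        return dict(self._counts)


class ReadThroughCache:
    """Loads each key from the store on first access: N cold keys cost N trips."""

    def __init__(self, store: CountingStore) -> None:
        self._store = store
        self._cache: dict[str, int] = {}

    def get(self, package_id: str) -> int:
        if package_id not in self._cache:
            self._cache[package_id] = self._store.get(package_id)
        return self._cache[package_id]


class PreloadedCache:
    """Loads everything up front, so later point reads never touch the store."""

    def __init__(self, store: CountingStore) -> None:
        self._cache = store.get_all()

    def get(self, package_id: str) -> int:
        return self._cache.get(package_id, 0)
```

Rendering a 25-result page against a cold `ReadThroughCache` costs 25 store round trips, which is the "SELECT N+1" shape, while `PreloadedCache` pays one bulk load and then serves every lookup from memory.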

jodydonetti commented 2 months ago

Ah, good call! I'm thinking about something right now, will update.