Open · yishaigalatzer opened this issue 4 years ago
Thanks for the detailed write-up @yishaigalatzer! I agree with the sentiment here.
Trying to think of a possible quick win here...
We already have a `Dimension_PackageSet` construct in the stats database. We mainly use it to apply the reverse filter: filtering out non-community packages.
I'm wondering if we could use the same construct to group other packages into 'known sets' and enhance the `DownloadReportRecentCommunityPopularity` sproc that way...
The downside is that this grouping would also be manually curated (though it would only apply to the most popular packages), just like the curation we already do for the non-community packages that would otherwise appear in the list.
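To make the 'known sets' idea concrete, here is a minimal sketch of the aggregation such a sproc might perform, written in Python rather than SQL. The download counts and the package-to-set mapping are hypothetical (they stand in for a curated table in the spirit of `Dimension_PackageSet`, whose real schema isn't shown here), and `grouped_ranking` is an illustrative helper, not an existing API. It scores each set by its most-downloaded member, on the assumption that members mostly arrive together, so summing would count the same install several times.

```python
from collections import defaultdict

# Hypothetical per-package download counts (not real NuGet numbers).
downloads = {
    "xunit.abstractions": 900,
    "xunit.core": 850,
    "xunit": 800,
    "automapper": 700,
    "swashbuckle.aspnetcore.swagger": 650,
    "swashbuckle.aspnetcore": 600,
    "newtonsoft.json": 1000,
}

# Hypothetical curated mapping from package id to a "known set".
# Packages that are not mapped stand alone as their own set.
package_sets = {
    "xunit.abstractions": "xunit",
    "xunit.core": "xunit",
    "xunit": "xunit",
    "swashbuckle.aspnetcore.swagger": "swashbuckle.aspnetcore",
    "swashbuckle.aspnetcore": "swashbuckle.aspnetcore",
}


def grouped_ranking(downloads, package_sets, top_n=100):
    """Rank sets instead of packages so that each set appears once.

    A set's score is the largest download count among its members
    (assumption: summing would double-count installs that pull in
    the whole set at once).
    """
    scores = defaultdict(int)
    members = defaultdict(list)
    for package_id, count in downloads.items():
        key = package_sets.get(package_id, package_id)
        scores[key] = max(scores[key], count)
        members[key].append(package_id)
    # Sort by descending score, then by name for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"set": key, "downloads": score, "packages": sorted(members[key])}
        for key, score in ranked[:top_n]
    ]


for row in grouped_ranking(downloads, package_sets):
    print(row["set"], row["downloads"], len(row["packages"]))
```

Taking the maximum is only one choice; a curated set could instead be scored by its designated root package, which would avoid picking up a member that other, unrelated packages also depend on.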
If we can build a query that generates the desired result set, we'd then need to update the report's JSON format and the gallery view that consumes this JSON.
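The current report JSON isn't shown in this thread, so the following is only a hypothetical shape for a grouped report: every field name (`Rows`, `PackageSet`, `Downloads`, `Packages`) is illustrative and does not describe the existing schema. The idea is that each row carries its member packages, so the gallery view could render one entry per set and let users expand it.

```python
import json


def to_report_json(rows):
    """Serialize grouped rows into a hypothetical report format.

    Field names are illustrative only; they are not the existing
    DownloadReportRecentCommunityPopularity schema.
    """
    report = {
        "Rows": [
            {
                "PackageSet": row["set"],
                "Downloads": row["downloads"],
                # Member packages let the gallery view expand the entry.
                "Packages": row["packages"],
            }
            for row in rows
        ]
    }
    return json.dumps(report, indent=2)


rows = [
    {"set": "xunit", "downloads": 900,
     "packages": ["xunit", "xunit.abstractions", "xunit.core"]},
    {"set": "automapper", "downloads": 700, "packages": ["automapper"]},
]
print(to_report_json(rows))
```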
Longer term, I think we should definitely take this feedback into account in a future redesign of the stats pipeline. The big question there would be: what data points are we missing (if any) to automate the grouping of these package sets? (I'm thinking about the telemetry differences between downloads and installs, direct versus transitive installs, and the like.)
Issue
The package stats page https://www.nuget.org/stats/packages attempts to show the top 100 community packages. However, many packages are distributed in groups: installing one of the popular root packages brings the rest of its group along with it. This biases the list toward popular packages that are broken down into many individual components.
See this thread for more details: https://twitter.com/socketnorm/status/1203480289076375552
To Reproduce
Browse to: https://www.nuget.org/stats/packages
Expected behavior
I expect packages like xunit, serilog, swashbuckle, mongodb, and google.api to show up once per root package. As Brad Wilson points out, xunit likely has more than one root package. I would also like to see a few more stats about the other packages in each group, and to be able to expand an entry to see all of its dependencies from the same group.
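One way to find "root packages" within a group automatically is to treat a package as a root when no other member of the same group depends on it, and then expand each root to its in-group transitive dependencies. The sketch below assumes that rule and uses an illustrative dependency map for the xunit family. The edges are assumed for this example and were not read from nuget.org. Under these assumptions, xunit ends up with two roots, which matches the point that it likely needs more than one entry.

```python
# Illustrative dependency edges within one package family (assumed for
# this sketch, not read from nuget.org): package -> packages it depends on.
dependencies = {
    "xunit": ["xunit.core", "xunit.assert", "xunit.analyzers"],
    "xunit.core": ["xunit.extensibility.core", "xunit.extensibility.execution"],
    "xunit.extensibility.core": ["xunit.abstractions"],
    "xunit.extensibility.execution": ["xunit.extensibility.core"],
    "xunit.assert": [],
    "xunit.analyzers": [],
    "xunit.abstractions": [],
    "xunit.runner.visualstudio": [],
}


def find_roots(group, dependencies):
    """Return the packages in `group` that no other member depends on."""
    depended_on = {
        dep
        for package in group
        for dep in dependencies.get(package, [])
        if dep in group
    }
    return sorted(p for p in group if p not in depended_on)


def expand(root, group, dependencies):
    """Collect the root plus its transitive dependencies inside `group`."""
    seen, stack = set(), [root]
    while stack:
        package = stack.pop()
        if package in seen or package not in group:
            continue
        seen.add(package)
        stack.extend(dependencies.get(package, []))
    return sorted(seen)


group = set(dependencies)
roots = find_roots(group, dependencies)
print(roots)  # ['xunit', 'xunit.runner.visualstudio']
for root in roots:
    print(root, expand(root, group, dependencies))
```

Deciding which packages belong to the same group (by owner, ID prefix, or curation) is left open here. The sketch only covers what happens once a group is known.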
Why?
Because the current statistic is unfair: a single popular package takes up a share of the top 100 list that depends on how granularly it was broken down. This pushes other, similarly popular packages far down the list and stunts their exposure. I'm aware of my own bias as an AWS person, but note that that specific package is already in the top 10. Meanwhile, packages like automapper and swashbuckle are pushed down, and so are a few others that would otherwise move above the fold.
Screenshots
Current
What it should look like
Please excuse any mistakes in how I grouped some of the packages. I'm making assumptions, but the idea should come through regardless.
- xunit.extensibility.core
- xunit.abstractions
- xunit.extensibility.execution
- xunit.core
- xunit.assert
- xunit
- xunit.runner.visualstudio
- xunit.analyzers
- swashbuckle.aspnetcore.swagger
- swashbuckle.aspnetcore.swaggergen
- swashbuckle.aspnetcore.swaggerui
- swashbuckle.aspnetcore
- serilog.sinks.file

How can it be done
I have some ideas about how to curate it, but I think the NuGet team is best equipped to come up with the best strategy in this case. You guys rock!