dotnet / nuget-trends

Check out NuGet packages adoption and what's trending on NuGet.
https://nugettrends.com
MIT License

chart by package version #132

Open MarkPflug opened 3 years ago

MarkPflug commented 3 years ago

It would be nice if the graph could be narrowed down to a specific version of the package, or even shown as a stacked bar chart broken down by version. If this were paired with an x-axis overlay of version releases (a colored vertical line per release), it would help visualize how quickly new versions are adopted and how much old versions are still in use.

bruno-garcia commented 3 years ago

That would be nice. We would need to collect download numbers per version though, which we don't.

MarkPflug commented 3 years ago

Oh, I see. I was assuming that since nuget.org shows per-version download numbers, you'd have access to that data as well.

bruno-garcia commented 3 years ago

@MarkPflug it might be available through one of nuget.org's APIs, but right now we hit each package (not each version of it) once per day and get the total number across all versions. So we would need to change that. That said, unless we can fetch the whole breakdown with a single hit to their API, the job will likely need some redesign: it currently takes 1 or 2 hours to go through the 220,000+ packages.

Probably a good chance to simplify the backend.

loic-sharma commented 3 years ago

You should be able to get the downloads by version using your current approach. The search response contains a breakdown of downloads by version. For example: https://azuresearch-usnc.nuget.org/query?q=packageid:Newtonsoft.Json&take=1

{
    ...
    "totalHits": 1,
    "data": [
        {
            ...
            "id": "Newtonsoft.Json",
            "version": "12.0.3",
            "totalDownloads": 824781418,
            ...
            "versions": [
                {
                    "version": "3.5.8",
                    "downloads": 586170,
                    "@id": "https://api.nuget.org/v3/registration5-semver1/newtonsoft.json/3.5.8.json"
                },
                ...
                {
                    "version": "12.0.3",
                    "downloads": 83014646,
                    "@id": "https://api.nuget.org/v3/registration5-semver1/newtonsoft.json/12.0.3.json"
                }
            ]
        }
    ]
}
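
As a rough illustration of reading that breakdown, here is a minimal Python sketch. It assumes the response keeps the shape shown above; `SAMPLE_RESPONSE` is a trimmed copy of that example with the elided fields left out, and `downloads_by_version` is a hypothetical helper name, not something from the nuget-trends codebase.

```python
import json

# Trimmed sample shaped like the search response above (assumption: only the
# fields used here are kept; the elided fields are simply omitted).
SAMPLE_RESPONSE = """
{
  "totalHits": 1,
  "data": [
    {
      "id": "Newtonsoft.Json",
      "version": "12.0.3",
      "totalDownloads": 824781418,
      "versions": [
        {"version": "3.5.8", "downloads": 586170},
        {"version": "12.0.3", "downloads": 83014646}
      ]
    }
  ]
}
"""


def downloads_by_version(response_text):
    """Return {package_id: {version: downloads}} for each search hit."""
    payload = json.loads(response_text)
    result = {}
    for package in payload.get("data", []):
        # Each entry in "versions" carries its own download counter.
        result[package["id"]] = {
            entry["version"]: entry["downloads"]
            for entry in package.get("versions", [])
        }
    return result


if __name__ == "__main__":
    for package_id, versions in downloads_by_version(SAMPLE_RESPONSE).items():
        for version, downloads in versions.items():
            print(package_id, version, downloads)
```

Because the per-version counters arrive in the same response as the package total, this would not add requests to the daily run, only rows to store.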

loic-sharma commented 3 years ago

You should be able to get the downloads by version by calling packageMetadata.GetVersionsAsync() here:

https://github.com/dotnet/nuget-trends/blob/d114dc6bb76d729411b6cce45e135ed809257aed/src/NuGetTrends.Scheduler/DailyDownloadWorker.cs#L189-L194

FYI, the method is async but it doesn't do anything expensive like additional web requests when using the V3 protocol (see this).

P.S. Nice CSV library @MarkPflug :)

bruno-garcia commented 3 years ago

Thanks for the pointers @loic-sharma. The only question left is whether we want to do that in the current architecture. I wonder how much more data per day we'd be dumping into PostgreSQL. @clairernovotny mentioned the foundation can host the site on Azure, so maybe we can use blob storage to dump these numbers, given they are immutable, or some other strategy. We can probably get rid of RabbitMQ too, which is used only to queue the batches of ids to hit nuget.org. We would need some other way to make the job reentrant, so it can be restarted without having to start from the beginning.
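
One possible way to restart the job without a queue is a checkpoint of the ids already processed. The Python sketch below is only an illustration under that assumption: `run_job`, `load_done`, `save_done`, and the JSON checkpoint file are hypothetical names and choices, not how the existing scheduler works.

```python
import json
import os
import tempfile


def load_done(checkpoint_path):
    """Read the set of package ids already processed, if a checkpoint exists."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as f:
        return set(json.load(f))


def save_done(checkpoint_path, done):
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, checkpoint_path)


def run_job(package_ids, fetch, checkpoint_path, batch_size=100):
    """Process ids in batches, skipping ids recorded by an earlier run.

    `fetch` is a caller-supplied function (hypothetical here) that downloads
    and stores the numbers for one package id. Returns how many ids were
    pending when this run started.
    """
    done = load_done(checkpoint_path)
    pending = [p for p in package_ids if p not in done]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for package_id in batch:
            fetch(package_id)
        # Record progress only after the whole batch has succeeded.
        done.update(batch)
        save_done(checkpoint_path, done)
    return len(pending)
```

If a run dies partway through, calling `run_job` again with the same checkpoint path repeats at most the one unfinished batch. For a daily job, the checkpoint would need to be keyed by date or cleared once a run completes.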