bazelbuild / bazel_metrics

Apache License 2.0

Add traffic data #4

Open alexeagle opened 2 years ago

alexeagle commented 2 years ago

Not all rulesets publish a release artifact, so we're currently missing data here.

For example, https://hanadigital.github.io/grev/?user=bazelbuild&repo=rules_python shows how we lost data starting at release 0.6.0, when we stopped uploading artifacts to GitHub. (Discussion: https://github.com/bazel-contrib/SIG-rules-authors/issues/11#issuecomment-1142387724)

However, there are other signals that would be just as good as downloads. None of these numbers can be trusted as absolute metrics, since CI systems do much of the downloading, so we just need something comparable across rulesets. For example, https://github.com/bazelbuild/rules_python/graphs/traffic shows the number of clones and the traffic to the GitHub website, which is a great proxy for usage.

So the proposal is to scrape the API like so:

% curl -H "Accept: application/vnd.github.v3+json" -H "Authorization: token $TOKEN" https://api.github.com/repos/bazelbuild/rules_python/traffic/views
{
  "count": 10476,
  "uniques": 1405,
  "views": [
    {
      "timestamp": "2022-06-03T00:00:00Z",
      "count": 218,
      "uniques": 54
    },
...

and publish that data here. @meteorcloudy says he'd be okay with a shell script running the curl commands for now, so we could get an initial data dump from someone who has the required API token (GitHub only serves this data to identities that have write access).
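A minimal sketch of what such a shell script might look like, assuming bash and the same `$TOKEN` variable as the curl command above. The function names (`traffic_url`, `output_path`, `fetch_traffic`, `dump_traffic`), the `traffic/` output directory, and the file naming scheme are all hypothetical, chosen only for illustration. The `clones` endpoint sits next to `views` in the GitHub REST API. Since that API only reports roughly the last 14 days, a script like this would need to run periodically to build up a history.

```shell
#!/usr/bin/env bash
# Sketch only: dump GitHub traffic data for one repo into dated JSON files.
# Function names, OUT_DIR, and the file layout are assumptions, not part
# of any existing bazel_metrics tooling.
set -euo pipefail

API="https://api.github.com"
OUT_DIR="traffic"   # hypothetical output directory

# Build the traffic endpoint URL for a repo ("owner/name") and a kind
# ("views" or "clones").
traffic_url() {
  printf '%s/repos/%s/traffic/%s\n' "$API" "$1" "$2"
}

# Build the output file path for a repo, kind, and date (YYYY-MM-DD).
# The slash in "owner/name" is replaced so the result is a single file name.
output_path() {
  printf '%s/%s_%s_%s.json\n' "$OUT_DIR" "${1//\//_}" "$2" "$3"
}

# Fetch one traffic endpoint; requires TOKEN to belong to an identity
# with write access to the repo.
fetch_traffic() {
  : "${TOKEN:?set TOKEN to a GitHub API token with write access}"
  curl --fail --silent --show-error \
    -H "Accept: application/vnd.github.v3+json" \
    -H "Authorization: token $TOKEN" \
    "$(traffic_url "$1" "$2")"
}

# Save both views and clones for one repo under today's (UTC) date.
dump_traffic() {
  local repo="$1" today kind
  today="$(date -u +%F)"
  mkdir -p "$OUT_DIR"
  for kind in views clones; do
    fetch_traffic "$repo" "$kind" > "$(output_path "$repo" "$kind" "$today")"
  done
}
```

With the functions defined, an initial dump for one ruleset would be `TOKEN=... dump_traffic bazelbuild/rules_python`, and covering more rulesets is a loop over repo names.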

alexeagle commented 1 year ago

Following the GitHub stable SHA fiasco last week, bazel-contrib/rules-template has changed to use release artifacts that are uploaded to GitHub. As this rolls out to rulesets, it will reduce the need for traffic data, though that data is still useful to have independent of download counts. https://github.com/bazel-contrib/SIG-rules-authors/issues/53 suggests that we'll gather it by creating a GitHub App that can read the data. The SIG will then ask a bazelbuild org admin to install that app so the data can be gathered by the community.