bazel-contrib / SIG-rules-authors

Governance and admin for the rules authors Special Interest Group
https://bazel-contrib.github.io/SIG-rules-authors/
Apache License 2.0

[catalog] Write a script that scrapes the GH traffic API #53

Status: Open. alexeagle opened this issue 1 year ago

alexeagle commented 1 year ago

A Googler with a GitHub auth token could run this script on a regular cadence and hand the data dump to the SIG, so that we get relative numbers.

I emailed the team: Maybe obvious, but the Bazel team doesn't actually have to do any work here if you're willing to share a GitHub access token that has the needed permission across the bazelbuild org. This is what blocks an outside party from gathering numbers:

% curl -H "Accept: application/vnd.github.v3+json" -H "Authorization: token $TOKEN" https://api.github.com/repos/bazelbuild/rules_python/traffic/views
{
  "count": 10476,
  "uniques": 1405,
  "views": [
    {
      "timestamp": "2022-06-03T00:00:00Z",
      "count": 218,
      "uniques": 54
    },
...
% curl -H "Accept: application/vnd.github.v3+json" -H "Authorization: token $TOKEN" https://api.github.com/repos/bazelbuild/rules_apple/traffic/views
{
  "message": "Must have push access to repository",
  "documentation_url": "https://docs.github.com/rest/reference/repos#get-page-views"
}
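A minimal sketch of the requested script, assuming Python and only its standard library. For each repo it sends the same request as the curl commands above and records either the `count`/`uniques` totals or the HTTP status of a failure (as with rules_apple). The repo list, the `GITHUB_TOKEN` environment variable and the output shape are hypothetical choices for illustration, not anything agreed in this issue.

```python
# Sketch only: REPOS, the GITHUB_TOKEN env var and the output shape are
# hypothetical choices, not something agreed on in this issue.
import json
import os
import sys
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com"
REPOS = ["bazelbuild/rules_python", "bazelbuild/rules_apple"]  # hypothetical


def fetch_views(full_name, token):
    """GET /repos/{owner}/{repo}/traffic/views, same as the curl call."""
    req = urllib.request.Request(
        f"{API_ROOT}/repos/{full_name}/traffic/views",
        headers={
            "Accept": "application/vnd.github.v3+json",
            "Authorization": f"token {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def scrape(repos, token, fetch=fetch_views):
    """Map each repo to its view totals, or to the HTTP status on failure."""
    results = {}
    for repo in repos:
        try:
            body = fetch(repo, token)
        except urllib.error.HTTPError as err:
            # e.g. "Must have push access to repository" when the token
            # lacks permission on that repo.
            results[repo] = {"error": err.code}
            continue
        results[repo] = {"count": body["count"], "uniques": body["uniques"]}
    return results


def main():
    """Print the scraped totals for REPOS as JSON on stdout."""
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        sys.exit("GITHUB_TOKEN is not set")
    json.dump(scrape(REPOS, token), sys.stdout, indent=2)
    sys.stdout.write("\n")
```

Calling `main()` (for example from an `if __name__ == "__main__":` guard or a scheduled job) prints one JSON object keyed by repo. Taking `fetch` as a parameter keeps the HTTP call swappable, so the aggregation can be checked offline with canned responses.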

@meteorcloudy indicated by email that they would accept a PR to the bazelbuild/bazel_metrics repo:

Sounds good, maybe you can send a PR to bazel_metrics to add the script? We can then decide to either run it manually or set up a pipeline to do it.

So this ticket is to create such a script, along with a process (which can just be a scheduled reminder email) for someone at Google to run it. The script should publish the data somewhere we can ingest it (maybe a GitHub Gist, or something similarly simple).
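If a Gist is the drop-off point, publishing comes down to one `POST /gists` call whose `files` map holds the dump as a string. A minimal sketch, assuming the runner's token is allowed to create gists (for a classic personal access token that means the `gist` scope); the file name and description are hypothetical.

```python
# Sketch: publish a data dump as a GitHub Gist.
# The file name, description and token handling are hypothetical.
import json
import urllib.request

GISTS_URL = "https://api.github.com/gists"


def build_gist_request(data, token, filename="traffic.json"):
    """Build (but don't send) a POST /gists request holding `data` as JSON."""
    payload = {
        "description": "Bazel ruleset traffic dump",
        "public": False,
        "files": {filename: {"content": json.dumps(data, indent=2, sort_keys=True)}},
    }
    return urllib.request.Request(
        GISTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github.v3+json",
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def publish(data, token):
    """Send the request and return the new Gist's html_url."""
    with urllib.request.urlopen(build_gist_request(data, token)) as resp:
        return json.load(resp)["html_url"]
```

`"public": False` creates a secret gist, which is unlisted but still readable by anyone holding its URL; that is probably enough for handing numbers to the SIG.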

aherrmann commented 1 year ago

IIUC, GitHub Apps can access the traffic/views endpoint if they have read permission on the repo:

GET /repos/:owner/:repo/traffic/views (:read)

Perhaps a GitHub App could be a good way to set this up. Each repo to be listed in the catalog could install the app, and the app could periodically query the traffic endpoint and send the data wherever it's needed.
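With the GitHub App approach, no personal token is involved: the app authenticates as itself with a JWT, exchanges it for a short-lived installation access token, and uses that token on the traffic endpoint. A minimal sketch of those two requests, assuming `app_jwt` is an already-signed RS256 JWT for the app (signing needs the app's private key and a JWT library, omitted here) and that `installation_id` is known for the repo's installation; both are hypothetical inputs.

```python
# Sketch: how an installed GitHub App could call the traffic endpoint.
# Assumes `app_jwt` is an already-signed app JWT and `installation_id`
# identifies the app's installation on the repo (both hypothetical inputs).
import json
import urllib.request

API_ROOT = "https://api.github.com"
ACCEPT = "application/vnd.github.v3+json"


def installation_token_request(app_jwt, installation_id):
    """POST /app/installations/{id}/access_tokens, authenticated as the app."""
    return urllib.request.Request(
        f"{API_ROOT}/app/installations/{installation_id}/access_tokens",
        headers={"Accept": ACCEPT, "Authorization": f"Bearer {app_jwt}"},
        method="POST",
    )


def traffic_request(installation_token, full_name):
    """GET /repos/{owner}/{repo}/traffic/views as the app installation."""
    return urllib.request.Request(
        f"{API_ROOT}/repos/{full_name}/traffic/views",
        headers={"Accept": ACCEPT, "Authorization": f"token {installation_token}"},
    )


def fetch_views_as_app(app_jwt, installation_id, full_name):
    """Exchange the app JWT for an installation token, then query traffic."""
    with urllib.request.urlopen(installation_token_request(app_jwt, installation_id)) as resp:
        token = json.load(resp)["token"]
    with urllib.request.urlopen(traffic_request(token, full_name)) as resp:
        return json.load(resp)
```

Installation tokens expire after a short time, so a periodic job would mint a fresh one on each run rather than storing it.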

As a simpler alternative, I tried running the query in a GitHub Action, but it looks like the automatic GITHUB_TOKEN doesn't have sufficient permission for that API endpoint.

alexeagle commented 1 year ago

That's a good idea. We've been working with the Google team on permissions for another GitHub App (publishing new ruleset releases to the Bazel Central Registry, BCR), so I think this can reuse a lot of work from @kormide.

alexeagle commented 1 year ago

@ashi009, this might be a good place to start.

ashi009 commented 1 year ago

I believe the best approach is to build a GitHub App to do this, so that we no longer need a personal access token to access the endpoint. Instead, we can grant permission to the app, which will definitely make secops happy.

alexeagle commented 1 year ago

Yes, and it might also let us handle "registration": installing the app would be enough to get your ruleset added to our catalog, instead of needing to send a separate PR.

ashi009 commented 1 year ago

Correct, and that sounds like a more user-friendly approach. With the app installed, we would also be able to automate a lot of things, e.g. what Dependabot is doing today.

On the other hand, creating a GitHub App requires more up-front design than a single-purpose script.

On Sun, Feb 5, 2023 at 23:16, Alex Eagle wrote:

Yes and it might also let us handle "registration" - installing that app is enough to get your ruleset added to our catalog instead of needing to send a separate PR


ashi009 commented 1 year ago

I just finished a POC GitHub App in Go. The traffic API only requires read-only access to the Administration and Metadata permissions. We can talk about this more after I send the PR.

alexeagle commented 1 year ago

I think @kormide would be a good code reviewer for that.