Open halcyondude opened 2 years ago
Is this data available in JSON?
Caveat: I started peeling back the layers of devstats only minutes ago and am still assessing.
High level: https://github.com/cncf/devstats#architecture
Detailed, specific overview: https://github.com/cncf/devstats-helm#architecture
This is the best overview I've found so far of how it works and what went into the design:
https://github.com/cncf/devstats/blob/master/ARCHITECTURE.md
In a nutshell, devstats parses the GH Archive files to avoid pulling the ocean through a straw (rate limiting) when accessing the full event stream, and it also keeps local git clones (in individual PVs, as part of devstats) for file info. This is somewhat similar to gitbase in its design.
I think bulk loading of git commits / history into a graph will be more readily accomplished w/ gitbase's MySQL endpoint as an ETL source. However, devstats already does an amazing amount of aggregation and summarization today.
The data are available in a few different ways/layers. There's the raw data from https://www.gharchive.org, a REST API, database dumps, and grafana dashboards, and I'm not yet sure what else :)
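On the raw layer: GH Archive publishes hourly gzip files of newline-delimited JSON, so that part is JSON already. Here is a minimal sketch of reading one hour and counting events by type. The helper names (`archive_url`, `fetch_hour`, `count_event_types`) are mine, not anything from devstats, and I'm assuming each event line carries a top-level `type` field.

```python
import gzip
import io
import json
from collections import Counter
from urllib.request import urlopen


def archive_url(date: str, hour: int) -> str:
    """Build the URL of one hourly GH Archive file (hour is 0-23, not zero-padded)."""
    return f"https://data.gharchive.org/{date}-{hour}.json.gz"


def fetch_hour(date: str, hour: int) -> bytes:
    """Download one hourly archive as raw gzip bytes (network call)."""
    with urlopen(archive_url(date, hour), timeout=60) as resp:
        return resp.read()


def count_event_types(gz_bytes: bytes) -> Counter:
    """Count events per type in a gzip'd newline-delimited JSON archive."""
    counts = Counter()
    with gzip.open(io.BytesIO(gz_bytes), "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            event = json.loads(line)
            # Assumption: the event type lives in a top-level "type" field.
            counts[event.get("type", "unknown")] += 1
    return counts


# Usage (hits the network, so left commented out):
# print(count_event_types(fetch_hour("2015-01-01", 15)).most_common(5))
```

This only covers the raw event stream; none of the devstats aggregation is reproduced here.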
| REST API | Link |
| --- | --- |
| API docs | https://github.com/cncf/devstatscode/blob/master/API.md |
| Endpoint | https://devstats.cncf.io/api/v1 |
| Implementation (in Go) | https://github.com/cncf/devstatscode/blob/master/cmd/api/api.go |
It's REST documented in a Markdown file rather than an OpenAPI spec, so something like https://github.com/ibm/openapi-to-graphql can't be pointed at it directly.
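Here is a rough sketch of calling the endpoint from Python. Big caveat: the request shape (a POST with a JSON body containing `api` and `payload` keys), the `Health` API name, and the `project` parameter are my reading of API.md and should be checked against it. `build_request` and `call_api` are hypothetical helper names of my own.

```python
import json
from urllib.request import Request, urlopen

API_URL = "https://devstats.cncf.io/api/v1"


def build_request(api: str, payload: dict, url: str = API_URL) -> Request:
    """Build a JSON POST request for the devstats API.

    Assumption: the body looks like {"api": <name>, "payload": {...}};
    verify this against API.md before relying on it.
    """
    body = json.dumps({"api": api, "payload": payload}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(api: str, payload: dict) -> dict:
    """Send the request and decode the JSON response (network call)."""
    with urlopen(build_request(api, payload), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (hits the network; API name and parameter are assumptions):
# print(call_api("Health", {"project": "kubernetes"}))
```

If that shape holds, the responses are JSON already, which would answer the original question for the aggregated data at least.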
Database dumps: https://devstats.cncf.io/backups
The Grafana dashboards use the PG data source and have queries, but reusing them is brittle: it would require running devstats, getting access to the underlying DB, or standing up a new DB w/ this data. Example dashboard:
https://all.devstats.cncf.io/d/53/projects-health-table?orgId=1