Open caniszczyk opened 3 years ago
I'll research this on Friday, is that OK? We don't use the GitHub API in DevStats - we use GitHub Archive data.
works for me, no rush
On Tue, Mar 9, 2021 at 12:33 AM Łukasz Gryglicki notifications@github.com wrote:
-- Cheers,
Chris Aniszczyk http://aniszczyk.org
Doing some research, but I'm quite sure we don't have that data in GitHub archives (which is DevStats' data source). I created this issue/question/feature request in the meantime to confirm (I'm now digging through several hundred megabytes of GHA JSONs to see whether any data format update included this info).
I've checked a few huge JSONs with a few grep-like approaches (they're over 2.5G in size when converted from ndjson to valid JSON) and I don't see any data that makes this feature request possible. I'll also wait for any feedback on my feature request/issue from the previous post.
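For reference, a grep-like check of this kind can be sketched roughly as below. This is only an illustration, not the exact commands used here: the function name `scan_events`, the search terms, and the file name in the usage comment are all assumptions. It reads the ndjson line by line (so no conversion to one big JSON is needed), counts event types, and reports any key whose name mentions clones or traffic.

```python
import json
from collections import Counter


def scan_events(lines, needles=("clones", "traffic")):
    """Count event types and collect dotted key paths whose key name
    contains any of the search terms (hypothetical helper)."""
    types = Counter()
    hits = set()

    def walk(node, path=""):
        # Recurse through nested dicts and lists, tracking the key path.
        if isinstance(node, dict):
            for key, value in node.items():
                full = f"{path}.{key}" if path else key
                if any(n in key.lower() for n in needles):
                    hits.add(full)
                walk(value, full)
        elif isinstance(node, list):
            for item in node:
                walk(item, path)

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the ndjson stream
        event = json.loads(line)
        types[event.get("type", "unknown")] += 1
        walk(event)
    return types, hits


# Usage sketch (the file name is an assumption):
# import gzip
# with gzip.open("2021-03-09-12.json.gz", "rt", encoding="utf-8") as fh:
#     types, hits = scan_events(fh)
#     print(types.most_common(), sorted(hits))
```

The search term is "clones" rather than "clone" on purpose: repository objects in the events carry a `clone_url` field, which is just a URL and not clone statistics.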
All I can consider here is a hybrid approach: make DevStats also call the GitHub API to get this data. But even if I do so, I can only get clones for the last 14 days (see API docs), so I won't be able to get any historical data.
Should I proceed with that hybrid approach @caniszczyk? If so, it will take a rather long time - it's something totally new to be implemented.
Will hold until I get feedback on what to do.
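To make the hybrid approach more concrete, here is a minimal sketch of what one such call could look like. It is not DevStats code: the function names and the `history` dict are hypothetical, and it assumes a token with push access to the repository, which the traffic endpoint requires. Since the API only returns the last 14 days, history can only be built up going forward, by fetching regularly and merging each response into stored per-day values.

```python
import json
import urllib.request

API = "https://api.github.com"


def clones_request(owner, repo, token):
    """Build the request for GET /repos/{owner}/{repo}/traffic/clones."""
    url = f"{API}/repos/{owner}/{repo}/traffic/clones"
    return urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })


def merge_clones(history, payload):
    """Merge one API response into a dict keyed by date (YYYY-MM-DD).

    Days already present are overwritten, because a later fetch has the
    more complete numbers for a day that was still in progress earlier.
    """
    for day in payload.get("clones", []):
        date = day["timestamp"][:10]
        history[date] = {"count": day["count"], "uniques": day["uniques"]}
    return history


def fetch_clones(owner, repo, token, history):
    """Call the API once and merge the result (needs network access)."""
    with urllib.request.urlopen(clones_request(owner, repo, token)) as resp:
        return merge_clones(history, json.load(resp))


# Usage sketch (the token value is a placeholder):
# history = fetch_clones("cncf", "devstats", "<token>", {})
```

This would have to run at least once every 14 days per repository to avoid gaps, and every repository needs a token with sufficient permissions.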
So @caniszczyk the GHA maintainer confirmed that GHA doesn't have that data, so the only possibility is the hybrid approach described here - please let me know if we want to proceed that way. (But I think this is not a really good approach: we cannot get the historical data, we're limited to 14 days, and we need to call the GitHub API and maintain tokens for a few thousand GitHub repos. This will be slow and actually goes against the typical DevStats approach.)
let's hold off on this feature for now, leave the issue open though and put it on the backlog
On Thu, Mar 11, 2021 at 1:31 AM Łukasz Gryglicki @.***> wrote:
So @caniszczyk https://github.com/caniszczyk the GHA maintainer https://github.com/igrigorik confirmed https://github.com/igrigorik/gharchive.org/issues/248#issuecomment-796505931 that GHA doesn't have that data, so the only possibility is the hybrid approach described here https://github.com/cncf/devstats/issues/288#issuecomment-795282524 - please let me know if we want to proceed that way. (But I think this is not a really good approach: we cannot get the historical data, we're limited to 14 days, and we need to call the GitHub API and maintain tokens for a few thousand GitHub repos. This will be slow and actually goes against the typical DevStats approach.)
-- Cheers,
Chris Aniszczyk http://aniszczyk.org
OK.
GitHub has this info available via their builtin dashboards, e.g., https://github.com/cncf/devstats/graphs/traffic
I don't know what the API looks like to pull this, but since we have data for stars and forks, maybe we add that to the dashboard: https://kubevirt.devstats.cncf.io/d/3/stars-and-forks-by-repository?orgId=1
Maybe we call it 'stars-forks-and-clones' ;)? Or a separate one for just clones.