cncf / devstats.archive

📈CNCF-created tool for analyzing and graphing developer contributions
https://devstats.cncf.io/
Apache License 2.0
444 stars 147 forks

[feature request] create a dashboard for clone data #288

Open caniszczyk opened 3 years ago

caniszczyk commented 3 years ago

GitHub has this info available via their built-in dashboards, e.g., https://github.com/cncf/devstats/graphs/traffic

I don't know what the API looks like to pull this, but since we already have data for stars and forks, maybe we could add clones to that dashboard: https://kubevirt.devstats.cncf.io/d/3/stars-and-forks-by-repository?orgId=1

Maybe we call it 'stars-forks-and-clones' ;)? Or make a separate one just for clones.

lukaszgryglicki commented 3 years ago

I'll research this on Friday, is this OK? We don't use the GitHub API in DevStats - we use GitHub Archive (GHA) data.

caniszczyk commented 3 years ago

works for me, no rush

On Tue, Mar 9, 2021 at 12:33 AM Łukasz Gryglicki wrote:

I'll research this on Friday, is this OK? We don't use GitHub API in DevStats - we use GitHub archives data.


lukaszgryglicki commented 3 years ago

I'm doing some research, but I'm quite sure we don't have that data in GitHub Archive (which is DevStats' data source). I created this issue/question/feature request in the meantime to confirm. For now I'm digging through several hundred megabytes of GHA JSONs to see whether any data format update added this info.

lukaszgryglicki commented 3 years ago

I've checked a few huge JSONs with a few grep-like approaches (they're over 2.5G in size when converted from ndjson to valid JSON). I don't see any data that makes this feature request possible. I will also wait for feedback on my feature request/issue from the previous post.
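A minimal sketch of such a grep-like check, assuming the archive is a gzipped ndjson file with one event per line. The function names and the `needle` parameter are hypothetical, not DevStats code. Reading line by line also avoids converting the file into one multi-gigabyte JSON document.

```python
import gzip
import json
from collections import Counter


def walk_keys(value, prefix=""):
    """Yield the dotted path of every key in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from walk_keys(child, path)
    elif isinstance(value, list):
        for child in value:
            yield from walk_keys(child, prefix)


def scan_events(lines, needle="clone"):
    """Count event types and key paths whose last segment contains `needle`."""
    event_types = Counter()
    matching_keys = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        event_types[event.get("type", "unknown")] += 1
        for path in walk_keys(event):
            if needle in path.rsplit(".", 1)[-1].lower():
                matching_keys[path] += 1
    return event_types, matching_keys


def scan_archive(path, needle="clone"):
    """Scan one gzipped ndjson archive file (hypothetical local path)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return scan_events(handle, needle)
```

Note that hits such as `clone_url` are repository URLs embedded in event payloads, not clone counts, so a match on the key name alone does not mean clone statistics are present.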

All I can consider here is a hybrid approach: make DevStats also call the GitHub API to get this data. But even if I do so, I can only get clones for the last 14 days (see the API docs), so I won't be able to get any historical data.
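For illustration, a minimal sketch of what one such call could look like, using GitHub's REST endpoint `GET /repos/{owner}/{repo}/traffic/clones`. The helper names are hypothetical and this is not DevStats code. It assumes a token with push access to the repository, which the traffic endpoints require.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_clones_request(owner, repo, token, per="day"):
    """Build the request for a repository's clone traffic (last 14 days only)."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/traffic/clones?per={per}"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def parse_clones(payload):
    """Turn the API response into {timestamp: (count, uniques)}."""
    return {
        entry["timestamp"]: (entry["count"], entry["uniques"])
        for entry in payload.get("clones", [])
    }


def fetch_clones(owner, repo, token):
    """Call the API and return per-day clone counts (performs a network request)."""
    request = build_clones_request(owner, repo, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_clones(json.load(response))
```

Something like `fetch_clones("cncf", "devstats", token)` would have to run for every tracked repository, each with a token that has sufficient access.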

Should I proceed with that hybrid approach @caniszczyk? If so, it will take a rather long time, because it's something totally new to implement.

I will hold until I get feedback on what to do.

lukaszgryglicki commented 3 years ago

So @caniszczyk, the GHA maintainer confirmed that GHA doesn't have that data, so the only possibility is the hybrid approach described above. Please let me know if we want to proceed that way. I don't think it is a good approach, though: we cannot get historical data, we're limited to 14 days, and we would need to call the GitHub API and maintain tokens for a few thousand GitHub repos. This would be slow and goes against the typical DevStats approach.
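To make the 14-day limitation concrete, here is a small sketch of which days a series of polls could ever cover. It assumes each poll reports the poll day plus the 13 days before it; the exact window boundary and all names here are assumptions for illustration.

```python
from datetime import date, timedelta

WINDOW_DAYS = 14  # retention window mentioned in this thread


def window_for(poll_day):
    """Days that a poll made on `poll_day` could report, oldest first."""
    return [
        poll_day - timedelta(days=offset)
        for offset in range(WINDOW_DAYS - 1, -1, -1)
    ]


def covered_days(poll_days):
    """Union of all days visible to at least one poll."""
    covered = set()
    for poll_day in poll_days:
        covered.update(window_for(poll_day))
    return covered


def gaps(poll_days):
    """Days between the first and last covered day that no poll could see."""
    covered = covered_days(poll_days)
    if not covered:
        return []
    day, last = min(covered), max(covered)
    missing = []
    while day <= last:
        if day not in covered:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

Under these assumptions, nothing earlier than 13 days before the first poll is ever available, and any pause in polling longer than the window leaves a gap that cannot be backfilled.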

caniszczyk commented 3 years ago

let's hold off on this feature for now, leave the issue open though and put it on the backlog

On Thu, Mar 11, 2021 at 1:31 AM Łukasz Gryglicki wrote:

So @caniszczyk (https://github.com/caniszczyk), the GHA maintainer (https://github.com/igrigorik) confirmed (https://github.com/igrigorik/gharchive.org/issues/248#issuecomment-796505931) that GHA doesn't have that data, so the only possibility is the hybrid approach described here (https://github.com/cncf/devstats/issues/288#issuecomment-795282524). Please let me know if we want to proceed that way. I don't think it is a good approach, though: we cannot get historical data, we're limited to 14 days, and we would need to call the GitHub API and maintain tokens for a few thousand GitHub repos. This would be slow and goes against the typical DevStats approach.


lukaszgryglicki commented 3 years ago

OK.