IPMITMO / github-archive-miner

:octocat: GitHub Archive Miner
https://ipmitmo.github.io/github-archive-miner/
MIT License

Try Google BigQuery to get GitHub data #24

Closed by annkupriyanova 7 years ago

annkupriyanova commented 7 years ago

Get data from the start of GitHub to 2010

annkupriyanova commented 7 years ago

@iradche @beltasha I found an interesting web service about GitHub. It calculates your GitHub distance from the most famous people there. I tried to calculate my distance from Linus Torvalds, and it is still calculating (perhaps because I have almost no connections): http://graphub.yodas.com

iradche commented 7 years ago

You could use their list. (Screenshot attached: 2016-11-19 19 07 13)

annkupriyanova commented 7 years ago

Yes, I saw this

annkupriyanova commented 7 years ago

GitHub Archive schema: https://github.com/igrigorik/githubarchive.org/blob/master/bigquery/schema.js

annkupriyanova commented 7 years ago

@iradche @beltasha As far as I understand, GitHub Archive organises its data around GitHub activities (events). To get users, we would have to read the events of a certain type and pull the users out of them, which is neither convenient nor sensible. Or is my guess wrong?
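To illustrate the event-centred layout described in this comment, here is a minimal sketch of pulling user logins out of event records. It assumes each record is one JSON object per line with a `type` field and an `actor` field that is either an object with a `login` key or a plain string; the function name `actors_for_event_type` and the sample records are hypothetical, not taken from the project.

```python
import json
from typing import Iterable, Set


def actors_for_event_type(lines: Iterable[str], event_type: str) -> Set[str]:
    """Collect the logins of users who triggered events of one type."""
    logins: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") != event_type:
            continue  # only one event type is of interest
        actor = event.get("actor")
        # Assumption: the actor is either an object with a "login" key
        # or a plain string holding the login.
        login = actor.get("login") if isinstance(actor, dict) else actor
        if login:
            logins.add(login)
    return logins


# Hypothetical sample records, for illustration only.
sample = [
    '{"type": "WatchEvent", "actor": {"login": "alice"}}',
    '{"type": "PushEvent", "actor": {"login": "bob"}}',
    '{"type": "WatchEvent", "actor": "carol"}',
]
print(sorted(actors_for_event_type(sample, "WatchEvent")))  # ['alice', 'carol']
```

Every user has to be recovered indirectly from the events they appear in, which is the inconvenience raised in this comment.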

beltasha commented 7 years ago

https://bigquery.cloud.google.com/dataset/ghtorrent-bq:ght let's use this! :)

annkupriyanova commented 7 years ago

Everything is all right. We managed to get the data from the GHTorrent dataset through Google BigQuery.
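For reference, a rough sketch of the kind of query this could involve, not the exact one we ran. It only builds the SQL text. The table name `ghtorrent-bq.ght.users`, the `login` and `created_at` columns, and the cutoff of 2011-01-01 (to include all of 2010) are assumptions; the helper `build_users_query` is hypothetical.

```python
from datetime import date


def build_users_query(cutoff: date, table: str = "ghtorrent-bq.ght.users") -> str:
    """Build a BigQuery standard SQL query for users created before `cutoff`."""
    # Assumption: the table has `login` and `created_at` (timestamp) columns.
    return (
        "SELECT login, created_at\n"
        f"FROM `{table}`\n"
        f"WHERE created_at < TIMESTAMP('{cutoff.isoformat()}')\n"
        "ORDER BY created_at"
    )


query = build_users_query(date(2011, 1, 1))
print(query)
```

The resulting string can be pasted into the BigQuery console or passed to a BigQuery client; running it requires a Google Cloud project with access to the public dataset.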

iradche commented 7 years ago

Show your results, and good luck.