commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Add a GitHub document source #55

Open sylvinus opened 8 years ago

sylvinus commented 8 years ago

GitHub mentions opening their data: https://github.com/blog/2201-making-open-source-data-more-available

I'm not sure whether the dumps are publicly accessible outside of BigQuery. If not, is using the API the only solution?

chaconnewu commented 8 years ago

GitHub data is available from GitHub Archive: https://www.githubarchive.org/ I believe it is the source for the GitHub data in BigQuery as well.

sylvinus commented 8 years ago

@chaconnewu it seems that GitHub Archive has the events data but not the repository or file data, which is unfortunately what we're mostly interested in at this point :-(
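
For anyone who wants to check what the archive contains, here is a minimal sketch that counts event types in one hourly dump. It assumes the dumps are gzipped, newline-delimited JSON with a top-level `type` field per event, and that they are served under a URL pattern like the one in `archive_url`. Both function names are hypothetical, and the URL pattern should be verified against the GitHub Archive site before relying on it.

```python
import gzip
import io
import json
from collections import Counter


def archive_url(date, hour):
    """Build the URL of one hourly dump.

    ASSUMPTION: the host and file naming below are not taken from this
    thread; verify them on the GitHub Archive site.
    """
    return "https://data.gharchive.org/%s-%d.json.gz" % (date, hour)


def count_event_types(gz_bytes):
    """Count the "type" field of every event in a gzipped JSON-lines dump."""
    counts = Counter()
    # gzip.open accepts a file object, so the downloaded bytes can be
    # decompressed in memory without writing a temporary file.
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            counts[event.get("type", "unknown")] += 1
    return counts
```

Feeding it the bytes of a downloaded hourly file (for example via `urllib.request.urlopen(archive_url("2015-01-01", 15)).read()`) should give a tally of event types only, which is consistent with the archive carrying events rather than repository or file contents.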