FarazzShaikh opened 3 years ago
Just a side note - this is not a replacement for the official API. It's simply a buffer between the API and the user so you don't run into the rate limit.
Generally, when you display real-time GitHub stats using the official API, you need to make more than one request to get all the information you'd need for an appealing UI. For example, to display the latest repository, you first query the search API, then take a URL from the response and query it to get the languages used.
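To make the two-request flow concrete, here is a minimal sketch (untested against the live API, and not part of the project); `latestRepoLanguages` and `latestRepoSearchUrl` are names I made up for illustration:

```javascript
// Pure helper: build the search URL for a user's most recently updated repo.
function latestRepoSearchUrl(user) {
  const q = encodeURIComponent(`user:${user}`);
  return `https://api.github.com/search/repositories?q=${q}&sort=updated&per_page=1`;
}

// Request 1: hit the search API. Request 2: the response only contains a
// languages_url, so a second request is needed for the language breakdown.
async function latestRepoLanguages(user) {
  const search = await (await fetch(latestRepoSearchUrl(user))).json();
  const repo = search.items[0];
  const languages = await (await fetch(repo.languages_url)).json();
  return { name: repo.full_name, languages };
}
```

Two round trips for one UI card is exactly how the rate limit gets eaten so quickly.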
You'd also typically want to query information about more than one repo, so you can see how quickly the rate limit will be reached, especially if you refresh a couple of times or during development of your site. Once it is reached, a 401 will crash your app or make your UI look ugly unless you provide fallback data.
edit: Yeah, you can increase your limit by providing a key but you can’t really hide your key on static sites.
I guess for the initial implementation we just want to extract raw data as a module. As part of the Tech Club, we would be very interested in linking this module to our static website generator.
I don't think we need the scraper to run every few hours. This functionality is only needed if we are doing complex stuff such as tracking commits. Since we are also planning to host it, we should make sure the only things running on our side are the web scraper and the custom website generator module.
Yep, you can extend this to whatever you need. We can make the interval and everything else fully configurable with something like environment variables. Since it's self-hosted, the Tech Club can run an instance of this and give it whatever config suits its needs.
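The environment-variable idea could be as small as this sketch; the variable names and defaults here are assumptions, not a decided scheme:

```javascript
// Read scraper settings from environment variables, with defaults.
// GH_USER and SCRAPE_INTERVAL_HOURS are hypothetical names for this sketch.
function loadConfig(env = process.env) {
  return {
    user: env.GH_USER || "FarazzShaikh",
    intervalHours: Number(env.SCRAPE_INTERVAL_HOURS) || 2,
  };
}
```

A self-hosted instance would then only differ from another by its environment, which is what makes the "Tech Club runs its own copy" idea cheap.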
Agreed we can start this project soon waiting for opinions from @benjaminjacobreji
In fact I think it would be very cool if the Tech Club site shows GitHub stats for all its members (with consent, duh). It would incentivise open source development all the while providing some publicity to their projects.
Agreed
I think this is a great idea! We should do this
Sounds good. @Akilan1999 invite me to the organization, the 2FA thing kicked me out last year automatically.
I can set up the repo and the to-do lists; this should be very simple.
Sent!
Cool, will create the repo and everything this evening. I will keep this issue open till the project is complete.
First and Last Name
Faraz Shaikh
Email
farazzshaikh@gmail.com frzskh@hw.ac.uk
Company/Organization (Ex: Heriot-Watt)
Heriot-Watt
Job Title (Ex: Student)
Student
Project Title
GitHub profile scraper (will think of something more creative later)
Briefly describe the project
See below
What kind of machines and how many do you expect to use?
None
What operating system and networking are you planning to use?
None?
Any other relevant details we should know about?
See below
Additional context
GitHub profile scraper
A self-hosted GitHub profile scraper. This can be used as a middle-man between your site and GitHub's API.
The problem
The official GitHub API rate limits you to about 60 requests an hour for `core` and 20 for `search`. Furthermore, some data simply requires some API gymnastics to retrieve. Yes, the GraphQL API does exist and is better, but do you really want to set up GraphQL for static sites? I don't. Besides, it's a cool little side project to spend a week on.
The Solution
This will use Firebase Cloud Functions to run a function every couple of hours (or whatever interval) and scrape the contents of a GitHub profile via either good ol' web scraping or the GitHub API itself. After that, it will store all the data as one or two documents in Firebase Realtime Database.
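The "one or two documents" part could look something like this sketch: a pure function that trims the scraped payload down to the fields a UI needs, so the scheduled function stores a single small document. The field choices here are assumptions, not a fixed schema:

```javascript
// Trim raw repo data (from scraping or the REST API) into one small
// document per profile. Field names are illustrative only.
function toProfileDoc(user, repos) {
  return {
    user,
    updatedAt: Date.now(),
    repos: repos.map((r) => ({
      name: r.name,
      description: r.description,
      stars: r.stargazers_count,
      languages: r.languages, // assumed pre-fetched via languages_url
    })),
  };
}

// Inside the scheduled Cloud Function this would end with something like:
//   await admin.database().ref(`profiles/${user}`).set(toProfileDoc(user, repos));
```

Storing one pre-digested document is what turns the client side into a single cheap read instead of several rate-limited API calls.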
The user can then run another Cloud Function to fetch the data from the database. Something like this:
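A minimal sketch of what that fetch function could look like. Firebase HTTPS Cloud Functions take Express-style `(req, res)` handlers, so this is written as a plain handler; `readProfile` is a stand-in for the actual database read, and all names are placeholders rather than project code:

```javascript
// Client-facing handler: look up a user's pre-scraped document.
// `readProfile` stands in for e.g.
//   admin.database().ref(`profiles/${user}`).once("value")
async function handleGetProfile(req, res, readProfile) {
  const user = (req.url.split("/").pop() || "").trim();
  if (!user) {
    res.statusCode = 400;
    res.end(JSON.stringify({ error: "missing user" }));
    return;
  }
  const doc = await readProfile(user);
  res.statusCode = doc ? 200 : 404;
  res.end(JSON.stringify(doc || { error: "profile not scraped yet" }));
}
```

The static site then hits this endpoint instead of GitHub, so GitHub's rate limit only applies to the scraper's schedule, not to page views.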
The Use
You can use this to include "real-time" GitHub stats in whatever you like. Personally, I will use this to do the same in my portfolio site.
The data that would be useful includes things like