eellak / commit-timeline

Create a timeline of commits by Greek committers to public repositories

Scripts ignore GitHub API's pagination/rate limits, contain half of the work #1

Open paravoid opened 9 years ago

paravoid commented 9 years ago

The scripts are very naive: trying to consume a RESTful API with bash/curl isn't exactly a sound idea IMHO :)

For starters, greek-commiters.sh curls for followers/location but ignores the fact that the results are rate limited and paginated. For example, followers>5 + Athens alone has 187 results (and "Athens, Greece" has 144), but the first API call (without a ?page= argument) returns only 30, and the final users.txt has only 90 people. You're essentially missing half of your target set…
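A minimal sketch of what following the pagination could look like in Python, assuming the GitHub user search endpoint (`/search/users`) with its `page` and `per_page` parameters and a response carrying `total_count` and `items`. The names `fetch_page` and `collect_logins` and the example query string are hypothetical, not taken from the repository's scripts:

```python
import json
import urllib.parse
import urllib.request

API = "https://api.github.com/search/users"  # assumed endpoint


def fetch_page(query, page, per_page=100):
    """Fetch one page of user search results (performs a network request)."""
    params = urllib.parse.urlencode(
        {"q": query, "per_page": per_page, "page": page}
    )
    req = urllib.request.Request(
        f"{API}?{params}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def collect_logins(query, fetch=fetch_page, per_page=100):
    """Request page after page until every reported result is collected."""
    logins = []
    page = 1
    while True:
        data = fetch(query, page, per_page)
        items = data.get("items", [])
        logins.extend(item["login"] for item in items)
        # Stop on an empty page or once total_count results are in hand.
        if not items or len(logins) >= data.get("total_count", 0):
            break
        page += 1
    return logins
```

Something like `collect_logins("location:Athens followers:>5")` would then walk all pages instead of stopping after the first 30 results. The `fetch` parameter is there only so the loop can be exercised without touching the network.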

Moreover, just searching for "Athens" isn't enough -- there's also an Athens in the state of Georgia in the United States; your result set includes users that have e.g. "Athens, GA" in their profile (for example, user "yegle"). The complete result set will have other permutations of that ("Athens, Georgia") as well as inconclusive results ("Athens"). Maybe you should only search for "Athens, Greece".
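As an alternative to narrowing the query, the profile locations could be filtered after fetching. A rough sketch, assuming the location is available as a free-text string; the function name `classify_athens` and the keyword lists are illustrative guesses and certainly not exhaustive:

```python
import re


def classify_athens(location):
    """Return 'greece', 'us', 'ambiguous', or 'other' for a location string."""
    text = location.strip().lower()
    if "athens" not in text:
        return "other"
    # Explicit mention of Greece (keyword list is an assumption).
    if re.search(r"\b(greece|hellas|gr)\b", text):
        return "greece"
    # Explicit mention of the US state of Georgia (also an assumption).
    if re.search(r"\b(ga|georgia|usa)\b", text):
        return "us"
    # A bare "Athens" cannot be resolved from the string alone.
    return "ambiguous"
```

With this, "Athens, GA" and "Athens, Georgia" come out as `us`, "Athens, Greece" as `greece`, and a bare "Athens" as `ambiguous`, which would still need a manual look or a second signal.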

In general, I'd suggest something slightly more sophisticated, in a higher level language.

tgkarounos commented 9 years ago

You are welcome to improve this script, if you have the time.

On Sat, Dec 13, 2014 at 7:36 PM, Faidon Liambotis wrote:

[quoted copy of the comment above omitted]

dspinellis commented 9 years ago

Regarding rate limiting, the update scripts pause sufficiently between requests (an API key can also be included). The greek-committers.sh script is notional: it has already been executed, and the results are included in the repo. You're probably right about the pagination and Athens, GA, though. We'd welcome a pull request with corrected data/script.
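For reference, one way to pause based on what the API reports rather than a fixed delay. This is a sketch only, assuming GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers and token authentication via the `Authorization` header; `seconds_to_wait` and `auth_headers` are hypothetical helper names, not part of the existing scripts:

```python
import time


def auth_headers(token=None):
    """Build request headers, adding the API key only when one is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def seconds_to_wait(headers, now=None):
    """Return how long to sleep before the next request.

    `headers` is a plain dict here, so key lookup is case-sensitive.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    if remaining > 0:
        return 0.0
    # Quota exhausted: wait until the reset time (epoch seconds).
    return max(0.0, float(reset_at - now))
```

A caller would run `time.sleep(seconds_to_wait(response_headers))` between requests, so the script only stalls when the quota is actually used up.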