enjalot / blockbuilder-search-index

download and process d3.js blocks for further indexing and visualization
BSD 3-Clause "New" or "Revised" License

WIP - index newly discovered users and add index-new-users script #32

Closed micahstubbs closed 6 years ago

micahstubbs commented 7 years ago

screen shot 2017-08-19 at 9 23 45 pm

micahstubbs commented 7 years ago

screen shot 2017-08-19 at 9 25 08 pm

micahstubbs commented 7 years ago

ran coffee validate-users.coffee and got rate limited at user felixsch, index 5004 in users-combined.csv

screen shot 2017-08-20 at 12 21 36 am

micahstubbs commented 7 years ago

I can check my github api rate limit status with curl -XGET https://api.github.com/rate_limit

sure enough, it's out:

{
  "resources": {
    "core": {
      "limit": 60,
      "remaining": 0,
      "reset": 1503216965
    },
    "search": {
      "limit": 10,
      "remaining": 10,
      "reset": 1503214140
    },
    "graphql": {
      "limit": 0,
      "remaining": 0,
      "reset": 1503217680
    }
  },
  "rate": {
    "limit": 60,
    "remaining": 0,
    "reset": 1503216965
  }
}

https://developer.github.com/v3/rate_limit/
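
for reference, a minimal Python sketch of reading that response programmatically. this is not part of the repo's scripts; the payload is copied from the response above, and the helper names core_status and seconds_until_reset are made up for illustration.

```python
from datetime import datetime, timezone

# Payload copied from the rate_limit response above (core resource only).
RATE_LIMIT = {
    "resources": {
        "core": {"limit": 60, "remaining": 0, "reset": 1503216965},
    }
}


def core_status(payload):
    """Return (remaining, reset_time_utc) for the core API resource."""
    core = payload["resources"]["core"]
    # "reset" is a Unix timestamp in seconds.
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return core["remaining"], reset_at


def seconds_until_reset(payload, now):
    """Seconds left until the core limit resets; `now` is a Unix timestamp."""
    return max(0, payload["resources"]["core"]["reset"] - now)


remaining, reset_at = core_status(RATE_LIMIT)
print(remaining, reset_at.isoformat())
```

a script could sleep for seconds_until_reset(...) instead of failing once remaining hits 0.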

micahstubbs commented 7 years ago

ok, github rate-limit reset overnight. doing a new pull, with new and improved validate.coffee that lets us specify a startIndex and stops when we have reached our github rate-limit

coffee validate-users.coffee '' 5004

and again

coffee validate-users.coffee '' 10004

and once more

coffee validate-users.coffee '' 15008
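
the resume-from-an-index idea, sketched in Python rather than the actual CoffeeScript. everything here is an assumption for illustration: validate_users, fetch_user and fake_fetch are hypothetical names, and the real script's behavior may differ.

```python
def validate_users(users, fetch_user, start_index=0):
    """Check users from start_index on; stop once the rate limit runs out.

    fetch_user(name) is a hypothetical callable returning
    (gist_count, x_ratelimit_remaining). A gist_count of None means the
    request itself was rejected because of rate limiting.
    Returns (rows, next_index), where next_index is where to resume.
    """
    rows = []
    for index in range(start_index, len(users)):
        gist_count, remaining = fetch_user(users[index])
        if gist_count is None:
            # This user was never checked, so the next run retries it.
            return rows, index
        rows.append((users[index], gist_count, remaining))
        if remaining <= 0:
            # Budget spent on a successful call; resume with the next user.
            return rows, index + 1
    return rows, len(users)


def fake_fetch(budget):
    """Stand-in for the GitHub API that allows `budget` calls."""
    state = {"remaining": budget}

    def fetch(name):
        if state["remaining"] <= 0:
            return None, 0
        state["remaining"] -= 1
        return len(name), state["remaining"]

    return fetch


users = ["alice", "bob", "carol", "dave"]
rows, next_index = validate_users(users, fake_fetch(2), start_index=1)
print(rows, next_index)
```

the returned next_index plays the role of the number passed on the command line for the following run.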

micahstubbs commented 7 years ago

using these regexes to find spaces

\s(?=\d,) \s(?=\d\d,) \s(?=\d\d\d,) \s(?=\d\d\d\d,)

so we can find and replace them with commas

using this and another find and replace operation in sublime text to convert

validate-coffee-partial-results.csv to gist-counts-by-user.csv

this is a hack that I'll replace with a proper script soon 😅
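
a rough idea of what the scripted version of the space-to-comma step could look like in Python. the four lookaheads collapse into one pattern with \d{1,4}. the sample row is hypothetical, since the exact layout of validate-coffee-partial-results.csv isn't shown here, and spaces_to_commas is a made-up name.

```python
import re

# One pattern covering the four lookaheads above: a whitespace character
# followed by 1 to 4 digits and then a comma.
SPACE_BEFORE_COUNT = re.compile(r"\s(?=\d{1,4},)")


def spaces_to_commas(line):
    """Replace each matching whitespace character with a comma.

    Meant to be applied one line at a time, since \\s also matches newlines.
    """
    return SPACE_BEFORE_COUNT.sub(",", line)


sample = "felixsch 12,4987"  # hypothetical row, format assumed
print(spaces_to_commas(sample))
```

like the original four regexes, this leaves a space in front of a number with five or more digits untouched.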

I keep x-ratelimit-remaining around as a column since it might be interesting to look at later, even though it's an artifact of our github API calling user-validation process and not directly related to the source data.

micahstubbs commented 7 years ago

ok, progress! we have identified 7316 users

screen shot 2017-08-20 at 3 28 16 pm

micahstubbs commented 7 years ago

now we'll run

sh index-new-users.sh

to generate updated metadata that contains gists from these newly discovered users

micahstubbs commented 7 years ago

ok, that didn't work, because it depends on new-usables.csv, and we haven't properly updated that yet.

the next thing to do is:

micahstubbs commented 7 years ago

screen shot 2017-08-20 at 7 01 27 pm

done
skipped 0 missing files
wrote 10446 API blocks
wrote 11523 Color blocks
wrote 96749 Files blocks
wrote 24900 total blocks
➜  blockbuilder-search-index git:(index-new-users) ✗

micahstubbs commented 7 years ago

ok, so before we knew about 24122 blocks. now, after parsing through all the repo names on github for d3, we know about 24900 blocks. It looks like this research project netted us 778 new blocks.

micahstubbs commented 6 years ago

doing multiple things at once, will move into smaller PRs 😅