countering-bean-counting / bonnyci_shuffleboard

Truffle-shuffling data for the ci-plunder project

Make requests to github to get additional data given a list of repos #64

Closed missaugustina closed 7 years ago

missaugustina commented 7 years ago

Limit this to just one or two projects since the rate limit is 60 and handling for rate limit is another task.

UPDATE:

Adding a checklist

missaugustina commented 7 years ago

This use case may have changed now that I was able to get a pull of GHTorrent data. It might be fine to just do it a repo at a time and do a minimal rate limit check. For instance, just take in a list of URLs with args from a file and calculate how many API calls that's going to be. If it's over 60, group them and say what time each group can be run.
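A minimal sketch of that grouping idea, not code from this repo. It assumes the limit of 60 applies per one-hour window and that the number of API calls per URL is already known; `schedule_batches` and its parameters are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone


def schedule_batches(call_counts, limit=60, window=timedelta(hours=1), start=None):
    """Group (url, n_calls) pairs into batches that each stay within `limit`.

    Returns a list of (run_at, [urls]) tuples, one batch per window.
    The one-hour window is an assumption; adjust it to the real reset period.
    """
    start = start or datetime.now(timezone.utc)
    batches = []
    current, used = [], 0
    for url, n_calls in call_counts:
        if n_calls > limit:
            # A single URL that needs more calls than the limit cannot fit
            # in any batch; splitting it is out of scope for this sketch.
            raise ValueError(f"{url} needs {n_calls} calls, over the limit of {limit}")
        if current and used + n_calls > limit:
            # Adding this URL would exceed the limit, so close the batch.
            batches.append(current)
            current, used = [], 0
        current.append(url)
        used += n_calls
    if current:
        batches.append(current)
    # Batch i becomes runnable i windows after the start time.
    return [(start + i * window, batch) for i, batch in enumerate(batches)]
```

Called with the contents of the input file, this returns the earliest time each group can run, which could simply be printed for whoever is running the script.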

missaugustina commented 7 years ago

Looks like GHTorrent has a lot of data with stuff missing for one or two repos. Additionally, the data is a bit outdated so it might be worthwhile to get partial data for some repos.

missaugustina commented 7 years ago

Fixed rate limit issue by using OAuth2

missaugustina commented 7 years ago

Only issue now is combining the CSV files. I'm keeping them individualized for auditing purposes. When combining, the columns aren't matching up due to GH API inconsistencies.

missaugustina commented 7 years ago

Moving to stale for now since I haven't touched this in a couple of weeks, but I do plan to work on it today or tomorrow. I think I'm just going to read in all the header rows, compare the contents, and then add any additional columns that show up. The contents will just have to be adjusted, but there's code in the CSV Writer classes that already does that from when I was processing events data before.

The code is pretty quick and dirty but if anyone wants to take a stab at it feel free!
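For anyone picking this up, here is a rough sketch of the header-comparison approach described above. It is not the existing CSV Writer code; `merge_csvs` and its arguments are hypothetical, and it assumes every file has a header row and that columns with the same name mean the same thing.

```python
import csv


def merge_csvs(paths, out_path, missing=""):
    """Combine CSV files whose headers differ into one file.

    The output header is the union of all input headers, in first-seen
    order. Cells for columns a file does not have are filled with `missing`.
    """
    # First pass: read only the header row of each file.
    fieldnames = []
    for path in paths:
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])
        for name in header:
            if name not in fieldnames:
                fieldnames.append(name)

    # Second pass: rewrite every row against the combined header.
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval=missing)
        writer.writeheader()
        for path in paths:
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
    return fieldnames
```

The individual files are only read, so they stay intact for auditing. Rows that have more cells than their own header will make `DictWriter` raise, which is probably the safer failure mode here.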

missaugustina commented 7 years ago

Removed help wanted since I've asked for help a few times and had no takers. I'm working on this now cuz it's blocking my ability to analyze the repo samples.