fhoffa opened this issue 5 years ago.
Great idea; this has come up a few times already in questions on Twitter.
This will take some modification to the node script, since it keeps an in-memory DB of user → (current) company associations that is written out to a huge JSON file between program runs. It sounds like the `users_companies` table would diverge from this somewhat, so we need to consider whether and how the schemas for the MySQL DB, the JSON file and the `users_companies` table should change.
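A minimal sketch of one way the JSON file could keep history instead of a single current company per user, so that it lines up with a `users_companies` table holding one row per crawl. It is written in Python rather than the node script's JavaScript. The file layout and the names `load_store`, `record`, `save_store` and `to_rows` are hypothetical, since the actual format of the existing JSON file isn't described here.

```python
import json
from pathlib import Path

# Hypothetical layout: {"user": [{"company": ..., "crawled_at": ...}, ...]}
# instead of {"user": "company"}, so every crawl is kept between runs.


def load_store(path):
    """Load the store from disk, or start empty if the file doesn't exist yet."""
    p = Path(path)
    if not p.exists():
        return {}
    with p.open() as f:
        return json.load(f)


def record(store, user, company, crawled_at):
    """Append a crawl result for a user rather than overwriting the old company."""
    store.setdefault(user, []).append({"company": company, "crawled_at": crawled_at})


def save_store(store, path):
    """Write the whole store back out as JSON."""
    with Path(path).open("w") as f:
        json.dump(store, f)


def to_rows(store):
    """Flatten the store into (user, company, crawled_at) rows, one per crawl,
    which is the shape a users_companies table with crawled_at would have."""
    return [
        (user, entry["company"], entry["crawled_at"])
        for user, entries in store.items()
        for entry in entries
    ]
```

Keeping a list per user makes the JSON file a superset of the current format: the old "current company" is simply the entry with the latest `crawled_at`.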
This probably necessitates a larger discussion about the CLI. The node commands are essentially data-transfer scripts between BigQuery, MySQL and JSON, plus the github.com profile scraper. Transfers between these stores are not complete: there's `db-to-json` and `json-to-bigquery`, but not the reverse directions or any other combinations. Would you find those commands helpful?
Thanks for sharing this!
https://bigquery.cloud.google.com/table/public-github-adobe:github_archive_query_views.users_companies?pli=1&tab=details
Suggestions:
With `crawled_at` you'll have to allow multiple entries per user and adjust queries later. For example, the easiest queries would go through a view that returns just the latest company per user.
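A minimal sketch of the "latest company per user" logic such a view would encode, assuming rows shaped as `(user, company, crawled_at)` with `crawled_at` as a `datetime`. The function name is hypothetical. In BigQuery standard SQL the same idea is usually expressed with `ROW_NUMBER() OVER (PARTITION BY user ORDER BY crawled_at DESC)` and keeping only row 1; the column names there are likewise assumptions.

```python
from datetime import datetime


def latest_company_per_user(rows):
    """Return {user: company}, keeping only each user's most recent crawl.

    rows: iterable of (user, company, crawled_at) tuples, where crawled_at
    is a datetime. On a timestamp tie, the first row seen wins.
    """
    latest = {}  # user -> (crawled_at, company)
    for user, company, crawled_at in rows:
        if user not in latest or crawled_at > latest[user][0]:
            latest[user] = (crawled_at, company)
    return {user: company for user, (_, company) in latest.items()}
```

Usage: `latest_company_per_user([("alice", "Adobe", datetime(2019, 1, 1)), ("alice", "Google", datetime(2020, 1, 1))])` returns `{"alice": "Google"}`, which is the single-company shape existing queries expect.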