simonw closed this 4 years ago
Unfortunately the list of dependents isn't available through any GitHub API. I managed to figure out how to get dependencies, but I need dependents. https://github.com/simonw/til/blob/master/github/dependencies-graphql-api.md
It looks like the only option is to scrape them. I'll do that, and then replace the scraper with an API call as soon as one becomes available.
Proposed command:
github-to-sqlite scrape-dependents github.db simonw/datasette
I'll pull full details of the scraped repos from the regular API. I'll also record when they were "first seen" by the command.
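A minimal sketch of the "first seen" bookkeeping, assuming a plain `sqlite3` table. The `dependents` table, its columns, and the `record_dependents` helper are hypothetical names for illustration, not what the command necessarily ends up using:

```python
import sqlite3
from datetime import datetime, timezone


def record_dependents(conn, repo, dependents, now=None):
    """Store each dependent of `repo`, keeping the earliest first_seen value.

    Hypothetical schema: one row per (repo, dependent) pair.
    """
    now = now or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dependents ("
        "repo TEXT, dependent TEXT, first_seen TEXT, "
        "PRIMARY KEY (repo, dependent))"
    )
    # INSERT OR IGNORE skips pairs that already exist, so first_seen
    # is only written the first time a dependent shows up.
    conn.executemany(
        "INSERT OR IGNORE INTO dependents (repo, dependent, first_seen) "
        "VALUES (?, ?, ?)",
        [(repo, dependent, now) for dependent in dependents],
    )
    conn.commit()
```

Running this on every scrape means re-seen repos keep their original timestamp, and only newly discovered ones get the current time.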
I think this is the neatest pattern for scraping the dependent repo names out of the page:
[a["href"].lstrip("/") for a in soup.select("a[data-hovercard-type=repository]")]
And to find the "Next" pagination link:
soup.select(".paginate-container")[0].find("a", text="Next")
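Putting the two selectors together, a rough sketch of the pagination loop might look like this. The function names and the injected `fetch` callable are hypothetical, and it assumes the page markup stays as described above (repo links carry `data-hovercard-type=repository`, and the "Next" link lives inside `.paginate-container`):

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def parse_dependents_page(html, base_url="https://github.com/"):
    """Return (repos, next_url) for one page of dependents HTML."""
    soup = BeautifulSoup(html, "html.parser")
    repos = [
        a["href"].lstrip("/")
        for a in soup.select("a[data-hovercard-type=repository]")
    ]
    next_url = None
    # select_one returns None instead of raising if the container is missing
    container = soup.select_one(".paginate-container")
    if container is not None:
        for a in container.find_all("a"):
            if a.get_text(strip=True) == "Next" and a.get("href"):
                # urljoin copes with both relative and absolute hrefs
                next_url = urljoin(base_url, a["href"])
                break
    return repos, next_url


def scrape_dependents(repo, fetch):
    """Follow "Next" links until they run out.

    `fetch` is any callable that takes a URL and returns the page HTML.
    """
    url = "https://github.com/{}/network/dependents".format(repo)
    seen = []
    while url:
        repos, url = parse_dependents_page(fetch(url))
        for name in repos:
            if name not in seen:
                seen.append(name)
    return seen
```

Passing `fetch` in keeps the parsing testable without hitting GitHub. For a real run it could be something like `lambda url: requests.get(url).text`, ideally with a delay between pages.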
I really, really want to start grabbing this data: https://github.com/simonw/datasette/network/dependents