laurentS closed this issue 2 years ago.
@laurentS - I have no objection to adding the dependencies, but what do you think about creating this as a dedicated stream, as a child stream of repository?
Just to clarify, I'm talking about dependents, i.e. the packages/repos that depend on the currently fetched one. So going "up" the dependency tree, as opposed to "down" with dependencies (for which there seem to be API endpoints, at least in GraphQL).
Happy to do this as a child stream if it makes more sense. As far as I can see, it would be a single request/record per repo, with 2 data fields to start with (but potentially more in the future).
@laurentS - Thanks for clarifying dependents vs dependencies. I think the bigger clarification, though, is whether you want just the count of dependents or whether you'll also (now or in the future) want the listing. I first thought we wanted the list of them, which is why I suggested the child stream. If you do think you'll want the list of repos that depend on the active one, then I think this would be correctly modeled as a child stream of repository, since it neatly generates a one-to-many mapping of child records (even though you are correct to say they are technically 'upstream').
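To make the trade-off concrete, here is a minimal sketch of the two record shapes in plain Python, not the tap's actual stream classes. The field names dependents_count, repo_full_name and dependent_full_name are hypothetical; full_name is the repository field returned by GitHub's REST API.

```python
from typing import Iterator


def repository_with_dependents_count(repo: dict, dependents_count: int) -> dict:
    """Property option: one repository record, with the count added as an extra field."""
    # Returns a new dict so the original repository record is left untouched.
    return {**repo, "dependents_count": dependents_count}


def dependent_records(repo: dict, dependent_full_names: list[str]) -> Iterator[dict]:
    """Child-stream option: one record per dependent repo, each keyed back to its parent."""
    for dependent in dependent_full_names:
        yield {
            # Parent key, as a child stream would carry it from the repository context.
            "repo_full_name": repo["full_name"],
            "dependent_full_name": dependent,
        }
```

With the property option the repository stream still emits exactly one record per repo; with the child-stream option a repo with N dependents produces N records, which is the one-to-many shape described above.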
If you only want the count of dependents, I could see this being a property of repositories, as you suggest. Two considerations come to mind if adding it as a property:
dependents will bump the incremental key for repositories. I don't know how important this is, but I want to call it out as something to consider.

These are all great points!
… repositories. I've not considered getting the full list of them, as it would start becoming fairly painful (and look more like web scraping than API consumption). Also, we don't really use this info at the moment ;)

@laurentS I'd love to revive this now that we have the GraphQL endpoints. I think we should aim to grab both dependents and dependencies.
For the dependencies, you can use this - https://docs.github.com/en/graphql/overview/schema-previews#access-to-a-repositories-dependency-graph-preview, see how it can be used in https://github.com/simonw/til/blob/master/github/dependencies-graphql-api.md
Or by scraping: see https://github.com/dogsheep/github-to-sqlite/pull/70 and the associated functions.
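For the GraphQL route to dependencies, a minimal sketch of building the request and flattening the response might look like the following. It assumes the dependency graph preview described in the linked docs: the hawkgirl preview media type and the dependencyGraphManifests / dependencies fields used in the linked TIL. Preview schemas can change, so treat the field names as assumptions to re-check. The helper names and the output keys are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"
# Preview media type for the dependency graph (assumed from GitHub's schema-previews docs).
PREVIEW_ACCEPT = "application/vnd.github.hawkgirl-preview+json"

DEPENDENCIES_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    dependencyGraphManifests {
      nodes {
        filename
        dependencies {
          nodes { packageName packageManager requirements }
        }
      }
    }
  }
}
"""


def build_dependencies_request(owner: str, name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the GraphQL endpoint with the preview Accept header."""
    body = json.dumps({"query": DEPENDENCIES_QUERY, "variables": {"owner": owner, "name": name}})
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Accept": PREVIEW_ACCEPT,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_dependencies(response: dict) -> list[dict]:
    """Flatten manifests -> dependencies into one row per (manifest, package)."""
    rows = []
    manifests = response["data"]["repository"]["dependencyGraphManifests"]["nodes"]
    for manifest in manifests:
        for dep in manifest["dependencies"]["nodes"]:
            rows.append(
                {
                    "manifest": manifest["filename"],
                    "package_name": dep["packageName"],
                    "package_manager": dep["packageManager"],
                    "requirements": dep["requirements"],
                }
            )
    return rows
```

Sending the request would be a matter of passing it to urllib.request.urlopen and json-decoding the body before calling extract_dependencies. Note this only covers dependencies ("down" the tree); it does not give the dependents count.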
We use the dependents count for a repository, which is currently fetched by grabbing the HTML page for the project (e.g. https://github.com/facebook/react/network/dependents) and parsing the HTML. As I write this ticket, the link above returns 7,878,702 Repositories (and likewise for packages), and we grab these numbers. Unfortunately, this info does not seem to exist anywhere in either the REST or GraphQL APIs.

@aaronsteers Would you have any objection to me adding a request for that page to the repositories stream, resulting in an extra field? Possibly behind some config option, as it is fairly download-heavy (the page above weighs 187kB). Maybe in the post_process method? Ideally, the data will eventually be available in one of the APIs, and this can then be dropped.
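A minimal sketch of the proposal, under several assumptions: the dependents page shows each count as a number followed by "Repositories" or "Packages" (so the regexes are tied to the current markup and will break if GitHub changes it), the repository record carries the REST full_name field, and the opt-in config key include_dependents_count plus the helper and output field names are hypothetical. The fetch function is injected so the parsing can be tested without network access; in the tap, something like add_dependents_counts could be called from the stream's post_process hook.

```python
import re
from typing import Callable

# Hypothetical output field -> pattern matching e.g. "7,878,702\n  Repositories".
_COUNT_PATTERNS = {
    "dependents_repositories_count": re.compile(r"(\d[\d,]*)\s+Repositories"),
    "dependents_packages_count": re.compile(r"(\d[\d,]*)\s+Packages"),
}


def parse_dependents_counts(html: str) -> dict:
    """Extract the dependents counts from the network/dependents page HTML (None if not found)."""
    counts = {}
    for field, pattern in _COUNT_PATTERNS.items():
        match = pattern.search(html)
        # Strip thousands separators before converting, e.g. "7,878,702" -> 7878702.
        counts[field] = int(match.group(1).replace(",", "")) if match else None
    return counts


def add_dependents_counts(row: dict, config: dict, fetch_html: Callable[[str], str]) -> dict:
    """Add the counts to a repository record, only when the (hypothetical) config flag is on."""
    if not config.get("include_dependents_count", False):
        # Skip the extra, fairly heavy page download entirely by default.
        return row
    url = f"https://github.com/{row['full_name']}/network/dependents"
    return {**row, **parse_dependents_counts(fetch_html(url))}
```

Keeping the flag off by default means existing users pay nothing for the extra request, and dropping the helper later, once an API exposes the count, only touches this one function.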