Motivation
Currently the download counter performs one GraphQL query per mod. In 184 cases, these queries duplicate queries that have already been made, because some mods share repositories. This costs extra time and network capacity, and since GitHub's API is rate limited, it's a good idea to minimize our risk of being throttled.
Changes
Now GraphQLQuery.add populates two dictionaries instead of one; self.repos contains a mapping from identifiers to user and repo for all mods that have been added, and self.requests contains a mapping from user and repo to the first requested identifier for repos that haven't been queried yet. If we have already retrieved a value for a repo or have already enqueued a request for it, self.requests won't be modified.
Now GraphQLQuery.get_query uses self.requests to generate its query rather than self.repos.
Now GraphQLQuery.get_result processes the data from the API in two phases: first the data received is populated into self.cache, and then the values requested in self.repos are copied from the cache to counts. This allows mods that share repos to also share counts and queries.
Now, instead of making a new GraphQLQuery after each batch, we only clear the repos and requests dicts, so the cache dict can continue to serve requests for the same repos later in the pass.
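The interplay of the three dictionaries can be sketched roughly as follows. This is a simplified illustration, not the code in this pull request: the class name GraphQLQuerySketch, the clear method, and the shape of the data passed to get_result (a plain dict of counts keyed by user and repo) are assumptions made for the example, and get_query returns the list of repos to fetch rather than real GraphQL text.

```python
from typing import Dict, List, Tuple

Repo = Tuple[str, str]  # (user, repo)


class GraphQLQuerySketch:
    """Hypothetical stand-in for GraphQLQuery, for illustration only."""

    def __init__(self) -> None:
        # identifier -> (user, repo) for every mod added in the current batch
        self.repos: Dict[str, Repo] = {}
        # (user, repo) -> first identifier that asked, only for repos not fetched yet
        self.requests: Dict[Repo, str] = {}
        # (user, repo) -> download count, kept across batches for the whole pass
        self.cache: Dict[Repo, int] = {}

    def add(self, identifier: str, user: str, repo: str) -> None:
        key = (user, repo)
        self.repos[identifier] = key
        # Enqueue a request only if the repo is neither cached nor already enqueued
        if key not in self.cache and key not in self.requests:
            self.requests[key] = identifier

    def get_query(self) -> List[Repo]:
        # Assumption: the real method builds GraphQL text; here we list the repos to fetch
        return list(self.requests)

    def get_result(self, fetched: Dict[Repo, int]) -> Dict[str, int]:
        # Phase 1: populate the cache with whatever the API returned
        self.cache.update(fetched)
        # Phase 2: copy cached values to every identifier that asked for them
        return {
            identifier: self.cache[key]
            for identifier, key in self.repos.items()
            if key in self.cache
        }

    def clear(self) -> None:
        # Reset per-batch state but keep the cache for later batches
        self.repos.clear()
        self.requests.clear()


query = GraphQLQuerySketch()
query.add("ModA", "alice", "shared")
query.add("ModB", "alice", "shared")  # same repo, so no second request
query.add("ModC", "bob", "solo")
to_fetch = query.get_query()
counts = query.get_result({("alice", "shared"): 100, ("bob", "solo"): 7})

query.clear()
query.add("ModD", "alice", "shared")  # served from the cache in a later batch
later_fetch = query.get_query()
later_counts = query.get_result({})
```

In this sketch, ModA and ModB share one request and one count, and ModD in the next batch needs no request at all because the cache survives the clear.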
A similar change is made for SpaceDock by ensuring that the IDs we pass to the SpaceDock API are unique. This may mildly reduce SQL activity in SpaceDock's database, but the impact is probably negligible, and SpaceDock doesn't have rate limiting.
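One common way to make the IDs unique while keeping their original order is shown below. The function name unique_spacedock_ids is hypothetical, and this is only a sketch of the general technique, not necessarily how this pull request does it.

```python
from typing import Iterable, List


def unique_spacedock_ids(ids: Iterable[int]) -> List[int]:
    """Drop repeated IDs, keeping the first occurrence of each in order."""
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(ids))


deduplicated = unique_spacedock_ids([3, 1, 3, 2, 1])
```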
Since each GraphQL API call contains 40 mods, eliminating 184 queries represents about 4.6 fewer network round trips (184 / 40).