Open jamietanna opened 8 months ago
A bulk lookup endpoint is definitely a feature I'd like to implement soon. I'm on the fence about a GraphQL API, mostly because it's potentially a lot more maintenance and load on an already stretched database.
There aren't many performance gains to be had from removing certain fields from the response, since they all come from the same table. Any kind of joins or loops that a user could do in GraphQL could easily cause major performance issues if they miss an index.
Related to https://github.com/ecosyste-ms/packages/issues/650, I'm looking at improving the performance of lookups against Ecosystems.
For two use-cases right now I'm calling Ecosystems' Packages API:
- lookupPackage (via pURL), then calling out to Security Scorecards' API
- lookupPackage (via pURL), then calling getRegistryPackageVersion
As noted in https://gitlab.com/tanna.dev/dependency-management-data/-/issues/459 there's a fair bit of a performance hit when running this.
I'm not currently caching anything, and haven't done anything on my side other than send more concurrent requests, so I know there's definitely some stuff I can be doing to improve things!
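As a rough illustration of the client-side improvements mentioned here, the sketch below deduplicates pURLs, skips anything already cached, and caps the number of requests in flight. It is only a sketch: `lookup_all`, `fetch`, and the `cache` dictionary are hypothetical names, and `fetch` stands in for whatever function actually calls lookupPackage.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def lookup_all(
    purls: Iterable[str],
    fetch: Callable[[str], dict],
    cache: Optional[Dict[str, dict]] = None,
    max_workers: int = 8,
) -> Dict[str, dict]:
    """Look up each distinct pURL at most once, reusing cached results.

    `fetch` is a hypothetical stand-in for the real lookupPackage call.
    """
    cache = {} if cache is None else cache
    # dict.fromkeys deduplicates while keeping first-seen order.
    unique = list(dict.fromkeys(purls))
    missing = [p for p in unique if p not in cache]
    # The pool size bounds how many requests are in flight at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        for purl, result in zip(missing, pool.map(fetch, missing)):
            cache[purl] = result
    return {p: cache[p] for p in unique}
```

Passing the same `cache` dictionary to later calls means a pURL shared by many projects costs a single request; persisting that cache between runs is left out of the sketch.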
But I was also wondering if there was any appetite for requesting a subset of data (i.e. not fetching repo metadata if it's going to be ignored), or for allowing a "bulk" lookup request so we can get back multiple packages in a single request.
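To make the "bulk" idea concrete from the consumer's side, here is a minimal sketch of splitting a list of pURLs into fixed-size batches, each of which would become one request. No bulk endpoint is described in this thread, so the batch size and how a batch would be sent are assumptions, and `batched` is a hypothetical helper name.

```python
from typing import Iterator, List, Sequence


def batched(purls: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` pURLs.

    Each batch would map to one request against a (hypothetical) bulk
    lookup endpoint, instead of one request per package.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(purls), size):
        yield list(purls[start:start + size])
```

With a batch size of 100, for example, 1,000 packages would take 10 requests rather than 1,000.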
I've recently got into the GraphQL hype for some data pieces, and feel that it could help simplify the amount of data that's required to fetch, especially if the consumer doesn't need it all.
I envision the GraphQL API starting out by returning exactly the data that we can get right now via the lookupPackage API, but allowing us to deselect certain fields, i.e. not looking up advisories, registry, repo, etc. unless requested.
Also interested to hear your thoughts, especially as it could very much be a case of "I'm holding it wrong".