Consider exposing a GraphQL API / bulk lookup API?

Related to https://github.com/ecosyste-ms/packages/issues/650, I'm looking at improving the performance of lookups against Ecosystems.

For two use-cases right now I'm calling Ecosystems' Packages API:

Dependency Health lookups: hitting the lookupPackage endpoint (via pURL), then calling out to the Security Scorecards API
Libyear lookups: hitting the lookupPackage endpoint (via pURL), then calling getRegistryPackageVersion
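To make the cost concrete, here is a minimal sketch of the libyear flow. Each dependency needs two sequential round trips, so N dependencies means 2N requests. The base URL and path shapes are assumptions based on the public Packages API and should be checked against its docs; the `Dependency` record and any registry names passed to it are illustrative, and the sketch only builds the URLs rather than sending them.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import quote

# Assumed base URL and path shapes for the Packages API; verify against the API docs.
BASE = "https://packages.ecosyste.ms/api/v1"


@dataclass(frozen=True)
class Dependency:
    purl: str      # e.g. "pkg:npm/left-pad@1.3.0"
    registry: str  # hypothetical: registry name as known to ecosyste.ms
    name: str
    version: str


def libyear_requests(dep: Dependency) -> List[str]:
    """The two requests made per dependency for a libyear lookup, in order."""
    # 1. lookupPackage, keyed by the (URL-encoded) pURL
    lookup = f"{BASE}/packages/lookup?purl={quote(dep.purl, safe='')}"
    # 2. getRegistryPackageVersion for the version actually in use
    version = (
        f"{BASE}/registries/{quote(dep.registry, safe='')}"
        f"/packages/{quote(dep.name, safe='')}"
        f"/versions/{quote(dep.version, safe='')}"
    )
    return [lookup, version]


def plan(deps: List[Dependency]) -> List[str]:
    # Every dependency costs two round trips, so N dependencies means 2N requests.
    return [url for dep in deps for url in libyear_requests(dep)]
```

The second request can't start until the first has returned, so per-dependency latency is roughly the sum of both calls, which is why extra concurrency only helps so much.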

As noted in https://gitlab.com/tanna.dev/dependency-management-data/-/issues/459, there's a fair performance hit when running these lookups.

I'm not currently caching anything, and haven't done anything on my side other than send more concurrent requests, so I know there's definitely some stuff I can do to improve things!
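On the client side, a minimal sketch of the caching plus concurrency idea, assuming a hypothetical `fetch` callable that performs a single lookupPackage request for one pURL. Duplicate pURLs are collapsed, results are memoised with `functools.lru_cache`, and the remaining lookups are fanned out over a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Callable, Dict, Iterable

Lookup = Callable[[str], dict]


def make_cached_lookup(fetch: Lookup, maxsize: int = 4096) -> Lookup:
    """Wrap a single-pURL fetch so repeated pURLs are only requested once."""

    @lru_cache(maxsize=maxsize)
    def lookup(purl: str) -> dict:
        # Callers share the cached dict, so treat results as read-only.
        return fetch(purl)

    return lookup


def lookup_all(purls: Iterable[str], lookup: Lookup, max_workers: int = 8) -> Dict[str, dict]:
    """Dedupe pURLs, then run the remaining lookups concurrently."""
    unique = list(dict.fromkeys(purls))  # preserves first-seen order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs them correctly
        return dict(zip(unique, pool.map(lookup, unique)))
```

The cache is per-process and never expires, which is probably fine for a single run over a set of repos but would want a TTL for anything long-lived.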

But I was also wondering whether there'd be any appetite for requesting a subset of the data (e.g. don't fetch repo metadata if it's going to be ignored), or for sending a "bulk" lookup request so we can get back multiple packages in a single request.
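For the bulk option, a client-side sketch of what batching could look like. The `purls` body field and the optional `fields` filter are hypothetical (nothing like this exists in the API today), as are any field names passed in; the point is that 250 pURLs would become three requests rather than 250.

```python
import json
from typing import Iterable, List, Optional, Sequence

# HYPOTHETICAL: the Packages API has no bulk endpoint today; the body shape is invented.


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def bulk_lookup_bodies(
    purls: Iterable[str],
    batch_size: int = 100,
    fields: Optional[List[str]] = None,
) -> List[str]:
    """Build one JSON body per batch for a hypothetical bulk lookup endpoint."""
    unique = list(dict.fromkeys(purls))  # no point asking for the same pURL twice
    bodies = []
    for batch in chunked(unique, batch_size):
        payload = {"purls": batch}
        if fields is not None:
            # hypothetical: ask the server to return only these fields
            payload["fields"] = fields
        bodies.append(json.dumps(payload))
    return bodies
```

A server-side cap on `batch_size` would keep any single request bounded, while still cutting the number of round trips by up to that factor.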

I've recently got into the GraphQL hype for some data pieces, and feel that it could help reduce the amount of data that needs to be fetched, especially if the consumer doesn't need all of it.

I envision the GraphQL API starting out by returning exactly the data that the lookupPackage API returns right now, but allowing us to deselect certain fields so that, for example, advisories, registry, repo, etc. aren't looked up unless requested.
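As a rough illustration of the field-selection idea, a sketch that builds a GraphQL document where the expensive nested lookups (advisories, registry, repo metadata) are only included when asked for. The schema (`packages(purls:)`, `repoMetadata`, and every field name) is hypothetical; only the `{"query": ..., "variables": ...}` JSON body follows the usual GraphQL-over-HTTP convention.

```python
import json
from typing import List, Sequence

# HYPOTHETICAL schema: there is no GraphQL API today; every type and field name is illustrative.
NESTED_SELECTIONS = {
    "advisories": "advisories { uuid title severity }",
    "registry": "registry { name url }",
    "repo_metadata": "repoMetadata { fullName stargazersCount }",
}


def build_lookup_query(scalars: Sequence[str], include: Sequence[str] = ()) -> str:
    """Always select the given scalar fields; add nested lookups only on request."""
    unknown = [name for name in include if name not in NESTED_SELECTIONS]
    if unknown:
        raise ValueError(f"unknown nested selections: {unknown}")
    selection = " ".join([*scalars, *(NESTED_SELECTIONS[name] for name in include)])
    # Taking a list of pURLs means one request can cover many packages.
    return (
        "query Lookup($purls: [String!]!) { packages(purls: $purls) { "
        + selection
        + " } }"
    )


def request_body(query: str, purls: List[str]) -> str:
    """JSON body in the common GraphQL-over-HTTP shape."""
    return json.dumps({"query": query, "variables": {"purls": purls}})
```

With a resolver-per-field server, a query that omits `advisories` would let it skip that lookup entirely, and accepting a list of pURLs covers the bulk case in the same request.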

I'm also interested to hear your thoughts, especially as this could very much be a case of "I'm holding it wrong".


ecosyste-ms / packages

Consider exposing a GraphQL API / bulk lookup API? #651
