Open briandoconnor opened 4 years ago
Late to the party, but to fill in some Galaxy-centric details: there are AnVIL workspaces that contain tables with thousands of DRS URIs. Sometimes these tables carry associated metadata that Galaxy can use to pre-populate fields and delay resolution until the point of actual file acquisition; sometimes they do not. In the latter case, showing the user the nature of the data they are browsing (and not just a big GUID) requires thousands of individual HTTP connections to the resolver. Batch resolution seems like a win for everyone, for both infrastructure strain and user experience, in any circumstance where there is more than one DRS URI to resolve.
Bulk resolution could be as simple as a request body containing a list of DRS URIs. If there is some concern about this being non-conformant to the DRS spec (I vaguely recall this being mentioned when I brought it up in a call), perhaps implementing something like Google Cloud Storage batching via multipart message types would allow the underlying system to operate on an individual basis while still providing the time and bandwidth savings. This might also be useful in situations where a system wants to aggregate requests from multiple users, with each part having its own auth token. Example: AnVIL users can click a button to resolve DRS URIs and see their metadata; if there were tens of thousands of concurrent users making these requests, the underlying systems could bundle all of their requests into multipart packets with individual auth tokens, sort the results, and shave off precious seconds of UI lag.
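To make the "body containing a list of DRS URIs" idea concrete, here is a rough sketch of how a client might turn a table of DRS URIs into a handful of bulk request bodies instead of one request per URI. Everything named here is an assumption for illustration only: the `object_ids` field, the `build_bulk_requests` helper, and the `max_per_request` limit are hypothetical and are not taken from the DRS spec or the PR. It also only handles hostname-based URIs (`drs://host/id`), not compact identifiers.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse


def group_by_host(drs_uris):
    """Group hostname-based DRS URIs (drs://host/object_id) by resolver host."""
    groups = defaultdict(list)
    for uri in drs_uris:
        parsed = urlparse(uri)
        object_id = parsed.path.lstrip("/")
        # Compact-identifier URIs have no path component and are rejected here.
        if parsed.scheme != "drs" or not parsed.netloc or not object_id:
            raise ValueError(f"not a hostname-based DRS URI: {uri!r}")
        groups[parsed.netloc].append(object_id)
    return dict(groups)


def build_bulk_requests(drs_uris, max_per_request=1000):
    """Return a list of (host, json_body) pairs, one per bulk request.

    The "object_ids" field name is hypothetical; a real server defines its own
    payload shape. max_per_request is an assumed client-side cap.
    """
    requests = []
    for host, ids in group_by_host(drs_uris).items():
        for start in range(0, len(ids), max_per_request):
            chunk = ids[start:start + max_per_request]
            requests.append((host, json.dumps({"object_ids": chunk})))
    return requests


if __name__ == "__main__":
    uris = [
        "drs://a.example.org/obj-1",
        "drs://b.example.org/obj-2",
        "drs://a.example.org/obj-3",
    ]
    for host, body in build_bulk_requests(uris):
        print(host, body)
```

With a cap of 1000 IDs per request, a table of a few thousand URIs pointing at one resolver becomes a few POST bodies rather than thousands of separate connections. Per-user auth tokens (the multipart idea above) are left out of this sketch.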
These are just some thoughts; of course the best folks to formulate the correct approach are you all! Thanks for keeping this in your dev thought processes, I'm keen to see how this progresses and happy to answer any questions re: the Galaxy on AnVIL side of things.
This was discussed at the June 2021 FASP hackathon; see Section 2a of the hackathon notes. The main outcome was a strawperson gist developed by @mbarkley, @ianfore, and others outlining the API endpoint and payload format to facilitate batch requests.
See notes from the June 2021 FASP hackathon.
Goal
We want to have this merged into DRS 1.4 ahead of the fall 2023 Plenary.
Background
Some DRS implementers/users have requested the ability to make DRS requests in bulk for multiple DRS URIs. Examples include NHGRI AnVIL (Terra) and the use of Galaxy in that project, Gen3 (for multiple projects), and Velsera.
As of 5/22/23 we have the complete set of bulk endpoints for authorization information, DRS IDs, and DRS access methods. This PR includes neither pagination nor explicit pairing of passports to the output of a bulk response. See the PR for more info.
Feature Branch/PR
We opened PR #365.