API endpoint URL design

nsheff commented 3 years ago

Spinoff from #21.

How should the API user interface be designed?

1. Compare endpoint

We decided the GET and POST should both use /compare somehow. Should it be:

/compare/:digestA/:digestB, or
/:digestA/compare/:digestB

The first seems to envision compare as a standalone function which takes 2 values. The second seems to envision compare as an operation that can be done on a particular seqcol, with a second one as the function parameter.

2. Sequence collection retrieval endpoints

Along these lines, how should the retrieval endpoints work? What is the name of the endpoint?

/:digest/:recursionLevel (e.g. /a6748aa0f6a1e165f871dbed5e54ba62/1 for level 1).
/seqcol/:digest/:recursionLevel
/retrieve/:digest/:recursionLevel,
or should recursionlevel go before digest, like /retrieve/:recursionLevel/:digest,

Are there any other endpoints that should be provided?

sveinugu commented 2 years ago

So my preference would be the "object-oriented" approach, i.e. option 2 for comparison. I believe this suggestion follows the intent of the original REST paper most closely. However, for that to make sense I believe the retrieve endpoint should start in the same manner (i.e. option 1 under retrieval). However, adding an endpoint 'seqcol' or similar in front of both might be clearer, i.e. a new suggestion /seqcol/digestA/compare/digestB for comparison and option 2 for retrieval. I also believe recursionLevel should come at the end of the URL path, the lack of which should default to (redirect to?) level 0. However, I'm fine with whatever the plurality wants.

sveinugu commented 2 years ago

Would it be an idea to add another optional endpoint for lookup of digests at a lower level, e.g. the digest of the lengths array only? This could support at least two potential operations:

a) Retrieval of the array contents.

Example:

/seqcol/lengths/fa47a1:
[26382827, ..., 4247]

Here: seqcol/sequences/fafa88 could work as/be redirected to a refget endpoint.

b) lookup to retrieve all registered seqcols that contain the array.

Example:

/seqcol/contains/lengths/fa47a1/:
['ac46bb', '74b33d']

This would then also work as a reverse lookup from refget sequence digest to seqcol digest.

nsheff commented 2 years ago

So my preference would be the "object-oriented" approach, i.e. option 2 for comparison. I believe this suggestion follows the intent of the original REST paper most closely. However, for that to make sense I believe the retrieve endpoint should start in the same manner (i.e. option 1 under retrieval). However, adding an endpoint 'seqcol' or similar in front of both might be clearer, i.e. a new suggestion /seqcol/digestA/compare/digestB for comparison and option 2 for retrieval. I also believe recursionLevel should come at the end of the URL path, the lack of which should default to (redirect to?) level 0. However, I'm fine with whatever the plurality wants.

I think if we want to follow REST strictly (and I'm not necessarily saying we should), wouldn't it be /comparison/:digestA/:digestB?

for a GET request, you're hitting an endpoint that names the noun of the thing you are retrieving. So I would think it would be:

GET /seqcol/:digest
GET /comparison/:digestA/:digestB

That said I don't at the moment have strong feelings that we necessarily need to stick to the REST "best practices" for naming endpoints (or really if such things are universally accepted anyway). To me, endpoints that are verbs (like compare) doesn't feel problematic, but maybe I just don't have enough experience with API design.

nsheff commented 2 years ago

Added proposal as an ADR here: https://github.com/ga4gh/seqcol-spec/pull/24

Edit: what this says is we use:

GET /collection/:digest
GET /comparison/:digestA/:digestB

nsheff commented 2 years ago

One question I have is: Should the /collection endpoint only be allowed to take the primary digest? Or should the 'level 1 digests' (the array-level digests) also be serviced by that endpoint?

nsheff commented 2 years ago

One question I have is: Should the /collection endpoint only be allowed to take the primary digest? Or should the 'level 1 digests' (the array-level digests) also be serviced by that endpoint?

Any thoughts here, @tcezard @andrewyatz , others?

andrewyatz commented 2 years ago

Apologies didn't see this at first. So /collection meaning the top level get me this collection information? My gut says this should only take the top-level digest in since the array level digests don't represent collections. Would supporting arrays make the payloads coming back from the endpoint difficult to predict?

tcezard commented 2 years ago

I don't think /collection should accept level 1 digest since these digests do not represent collections most of the time. A separate endpoint could provide this but I would argue that we should wait and see if the demand exists first.

sveinugu commented 2 years ago

A possible syntax for level 1 digest resolution endpoints could be e.g. /collections/lengths/<digest>. This could be added as optional endpoints in the spec. I have a bit of a concern out of the FAIR principles (more specifically principle A1: https://www.go-fair.org/fair-principles/metadata-retrievable-identifier-standardised-communication-protocol/) with the idea of us defining identifiers (digests) for the arrays at level 1 without providing a standardised way to resolve such identifiers.

nsheff commented 2 years ago

Would supporting arrays make the payloads coming back from the endpoint difficult to predict?

Yes, this is a good point. it means the server would return things of different structure, depending on what the input was.

A possible syntax for level 1 digest resolution endpoints could be e.g. /collections/lengths/

Rather than /collection/lengths/<digest>, maybe it would be /array/<digest>, which is then universal for all level 1 digests? This seems to fit better with the current naming scheme

nsheff commented 2 years ago

The decision on endpoint structure was decided and documented in an ADR in PR #24

ga4gh / refget

API endpoint URL design #23

1. Compare endpoint

2. Sequence collection retrieval endpoints