Endpoint patterns - Githubissues

mbaudis commented 4 years ago

Patterns

Shouldn't endpoints be specified by a pattern?

/individuals/{id}/biosamples
- "pattern": "^\/?individuals\/([^\/]+?)\/biosamples\/?$
/individuals/{id}
- "pattern": "^\/?individuals\/([^\/]+?)\/?$
/individuals
- "pattern": "^\/?individuals)\/?$

...could have quite different methods of disambiguation; without a pattern on the path such could be interpreted differently.

This comment does not concern the limitation of the allowed id values (though excluding path separators would be required as above).

Dataset in Endpoints

The use of id values in some of the paths basically requires the specification of a dataset, which then has to happen in the request.

Proposal:

expand endpoints by dataset value for such dataset-bound cases; this could then be used for parameter-free default calls to retrieve e.g. single biosamples
- /biosamples/{datasetId}/{id}/g_variants

OR

drop the endpoints containing the id parameter since they need additional specifications anyway

jrambla commented 4 years ago

I don't get the point, @mbaudis, the xyz/{id}/abc is typical in REST (and in OpenAPI?). Hence these is what we represent in the endpoints. Are you suggesting to change that approach?

mbaudis commented 4 years ago

@jrambla I just may not understand how the response should be structured. The current form requires that the response is in the form

{ 
  "datasetResponses": [
    { "dataset_1": [
      { "id": "variant_id_1234", "reference_bases": "A", ...},
      { "id": "variant_id_1235", "reference_bases": "CG", ...},
      ...
      ]
    },
    { "dataset_2": [ ...

... i.e. a multi-dataset response. If this is intentional it is o.k. (but needs documantation - it is not yet clear IMO that a g_variants endpoint delivers a multi-dataset response). AND this would be in an IMO logical conflict with an {id} query, which assumes a specific dataset (which isn't provided here - unless you fudge it in as query parameter).

Is this clearer?

(I have no "blocking" problem w/ any implementation - multi-dataset default, datasetId in query etc. - but this needs to be sorted out & documented, one way or another. My preferred solution would be a required single datasetId in the path, for these types of data retrieval calls).

jrambla commented 4 years ago

Oh, I understand. The construct /biosamples/{datasetId}/{id}/g_variants is not correct in REST terms. If you want to retrieve the variants of a biosample in a given dataset, you should do somethinglike that: /datasets/{id1}/biosamples/{id2}/g_variants

The Ids refer to the preceding entity, hence id1 is a dataset Id, while id2 is a biosample id.

Having said so, the only paths available are the ones in the specification. Therefore, we will have to eventually decide which are the logical ones. Also, it is not usual to have paths so long, and it is usually restricted to 1 level of details as: /datasets/{id1}/biosamples/{id2}/g_variants should actually be equivalent of /biosamples/{id2}/g_variants In case the biosample is in more than one dataset that assertion will not be true.

mbaudis commented 4 years ago

@jrambla Sorry; sure, you’re correct that I didn’t really REST pattern it; point remains, though - without datasetId it is not really clear what response to be expected. And also, for e.g. /biosamples/{id}/... you may have ambiguity not only in the response but in the query, too (since local ids are assumed & may clash between datasets). So:

require to specify a dataset for data delivery responses (biosamples, individuals, variants); most sensible option, IMO OR
define a response pattern where you always have the requested data as values of the datasets (as in my not-so-correct-but-maybe-explanatory code above (would be more in line with the standard Beacon datasets response but more messy, IMO)

sdelatorrep commented 3 years ago

What is the status of this discussion, @jrambla, @mbaudis?

mbaudis commented 3 years ago

@sdelatorrep @jrambla Well, your design decision if having the dataset in the path or the query. I just see that the specification of an id w/o the dataset requires parsing over all datasets. And in scenarios where "another dataset" may mean "another database / file" this may be ... cumbersome.

I understand that this isn't an issue for the base paths w/o id where you actually may specify multiple datasets (though this is another problematic option for other reasons).

(FYI, we have the paths now as in the current version, e.g. https://progenetix.org/beacon/biosamples/pgxbs-kftva5c8/g_variants/?datasetIds=progenetix but don't feel so "great" about this); I'll leave this to your judgement ...)

jrambla commented 3 years ago

@mbaudis the path /resource/{id} is a URI, therefore, the backend must ensure that collissions doesn't happen. If there is a chance of collision, there are two solutions:

To create a unique id for every resource. It could be what we call an "intelligent id" that could be parsed, like dataset1-resource1. This is not recommended in most scenarios, but I believe the one you describe shouldn't be an issue.
To light a different beacon for every dataset. Hence the path would be dataset1/beacon/biosamples/{id}

Hope this helps

ga4gh-beacon / specification-v2

Endpoint patterns #24

Patterns

Dataset in Endpoints