Closed mbaudis closed 7 months ago
I am in doubts...
We are removing the "CollectionsResponse"
and modifying "ResultsOKResponse"
to be the collection.
What will happen with endpoint like?
/cohorts/{id}/individuals
Before changes, only the "root" endpoint was returning the collection an the rest "usual" ResultsOKResponse (like any other entryType e.g. "biosamples").
Are we changing the contract?
D.
@redmitry Oh well, you are right w/ this breaking the resultsetResponses if allowing those endpoints (which I guess we should). I guess the naming of ResultsOKResponse
makes this look special but is extremely confusing. I'll iterate another version for discussion...
Have a look at https://github.com/ga4gh-beacon/beacon-v2/pull/121/commits/ea4324b420bc3995be2d07dad980f124092492e4 Only for cohorts so far.
My reasoning there is that ResultsOKResponse
does not make sense as a name, especially if there are several possible responses.
But generally this all is also a big argument for ditching the concept of having separate Boolean and Count response types and just having a single one for each data response flavour (collections, resultsets, filtering terms ...) where we the data payload (and count) are optional... This is probably against some "how it should be done" rule and I remember that the different response schemas looked nice when drafting them; but now we see a confusion of alternative response types and alternative payload types.
Alas, for now how about this?
IMO "ResultsetsResponse" is more appropriate name. So we'll have "ResultsetsResponse" which is one of three responses (boolean, count, resultsets) and the "CollectionsResponse".
But resultSets response gives results splitted by dataset and if I'm not wrong, boolean and count responses don't do that, do they? On the other hand, I see with your last commits @mbaudis that cohorts have the resultsets response and datasets still have the resultsOkResponse.
Yes and I really would like to see this (since the previous generic ResultsOKResponse
is kind of confusing - at least it confused me; I thought it was a required term ...).
Now, in principle we should then propagate this everywhere we have such a generic to make it clearer what is being executed.
Pro: clarity, won't break anything since references inside of schemas. Con: change a parameter name just for these reasons, w/o any functional gain
I'm clearly pro here, but for all entry types.
@costero-e I left the datasets just since I didn't want to do changes I have to reverse ...
ResultSets are actually split by any type of collection:
setType:
description: Entry type of resultSet. It SHOULD MATCH an entry type declared
as collection in the Beacon configuration.
type: string
default: dataset
... so a datasets response for individuals could be for the dataset or split for its cohorts, theoretically.
I think you tagged another Oriol @mbaudis, jeje. Ok, thank you, it's good you are just giving an example using cohorts. On the other hand, what I meant is that we don't have resultSets in a typical Count or Boolean response:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"description": "Payload definition for the \"count\" response.",
"properties": {
"exists": {
"$ref": "../../common/beaconCommonComponents.json#/definitions/Exists"
},
"numTotalResults": {
"$ref": "../../common/beaconCommonComponents.json#/definitions/NumTotalResults",
"description": "Total number of results."
}
},
"required": [
"exists",
"numTotalResults"
],
"type": "object"
}
For me, putting the "countResponse" and "booleanResponse" under a "resultSetsresponse" makes it a bit confusing. That's why I'm more in favor of keeping "resultsOkResponse".
@costero-e It is confusing in any case. Therefore I would just prefer that we have only a ResultSetsResponse with a Boolean granularity instead of switching responses.
But overall the new way would be slightly less confusing... You need to name the response anyway and ResultsOKResponse
+ CollectionsResponse
is worse compared to ResultsetsResponse
+ CollectionsResponse
.
Anyway, naming is schema internal. And IMO it is correct to have an endpoint with a ResultsetsResponse
, which can take the format of one of the options. If not doing it for ResultsetsResponse
then you have additional inconsistencies.
But again, this is a reason for having a single response for and the granularity handled inside it.
... and another inconsistency is the "cohorts have individuals for records retrieval while datasets have all entry types". I understand the argument ("cohorts as a group of individuals") but still you may want to get the samples etc. Not wanting to change right now but good to keep in mind.
@redmitry @costero-e So WDYT - moving ahead in the current version w/o going for the other "ResultsOKResponse" instances, but doing it for the collections which have these 2 types of responses? I'd like to cloes https://github.com/ga4gh-beacon/beacon-v2/issues/116
I guess we should have the "response schemas for entry types should be defined somewhere" in a general "remove dependencies on OpenAPI" discussion/issue.
Again we are changing the contract:
was:
"/cohorts/{id}": {
"get": {
"responses": {
"200": {
"$ref": "#/components/responses/ResultsOKResponse"
},
now:
"/cohorts/{id}": {
"get": {
"responses": {
"200": {
"$ref": "#/components/responses/CollectionsResponse"
},
Most probably that was the error in the spec coz I remember reporting this to Oriol #98, but since I implemented my java implementation from spec.
Should we fix both the spec. and the implementation?
Cheers,
Dmitry
@redmitry This just changes the name ResultsOKResponse
=> CollectionsResponse
; the operation is getting a single collection (as it should).
I've fixed now the messy partial changes in the last commit (all the separate ones should be squashed...).
This fixes the wrong response for some of teh collections endpoints and changes the (definitely now) ambiguous ResultsOKResponse to the correct instances of
- CollectionsResponse
- ResultsetsResponse
This just changes the name ResultsOKResponse => CollectionsResponse; the operation is getting a single collection (as it should).
I do not doubt that it probably (😏) should, but the current spec's "ResultsOKResponse" is "oneOf": [ beaconBooleanResponse.json, beaconCountResponse.json, beaconResultsetsResponse.json ]
So that my comment: if we want the "/cohorts/{id}"
to return "/a single collection/" we have to change implementations also...
Dmitry
@redmitry That is actually in line with all entry types: biosamples etc. also only know the ResultsOKResponse
which is defined as
ResultsOKResponse:
description: Successful operation.
content:
application/json:
schema:
oneOf:
- $ref: https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/responses/beaconBooleanResponse.json
- $ref: https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/responses/beaconCountResponse.json
- $ref: https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/responses/beaconResultsetsResponse.json
... so yes, a biosamples/{id}
response will return a beaconResultsetsResponse
where the single biosample is wrapped in results
(list) in resultSets
(list) in the response
(object).
One can argue that we should have a 3rd type of response which is an beaconIdrequestResponse
where the result
could be the single document (and the responseSummary
would be as of now).
How do you respond to {id}
requests?
How do you respond to {id} requests? Currently all /{entryType}/{} endpoints return ResultsetsResponse(s) even for "collection" entryTypes.
https://beacons.bsc.es/beacon/v2.0.0/cohorts/CINECA_synthetic_cohort_UK1 https://beacon-apis-demo.ega-archive.org/api/cohorts/CINECA_synthetic_cohort_EUROPE_UK1
@redmitry Ah; that's what you mean; i.e. the wrong response type (BeaconResultsetsResponse instead of BeaconCollectionsResponse). Yes, that should just be a single collection info. Alt least according to the description in the endpoints
files:
/cohorts/{id}:
parameters:
- $ref: '#/components/parameters/entryId'
get:
parameters:
- $ref: '#/components/parameters/requestedSchema'
description: Get details about one cohort, identified by its (unique) 'id'
... and
/datasets/{id}:
parameters:
- $ref: '#/components/parameters/entryId'
get:
parameters:
- $ref: '#/components/parameters/requestedSchema'
description: Get details about one dataset, identified by its (unique) 'id'
operationId: getOneDataset
tags:
... which is IMO the clear intention; in BeaconCollectionsResponse this would result in a response.results
list of one collection. (I didn't write this ¯\_(ツ)_/¯
).
My point above was already a bit further - a separate BeaconIdResponse
... or such which would move the list -> object, i.e. a single cohort or biosample ... in response.result
.
Agree @redmitry that this will change the implementations but anyway, I'm changing it almost every day for adding new features that are on demand so no problem. Plus, it makes more sense to have a collectionResponse rather than resultSetsResponse as @mbaudis says. For me, the PR is good to be merged and we can proceed.
So that. Once merged, I update java implementation in accordance.
Great! Note: I removed some schema .md files from the documentation since they always break merges on Mac OS (same name, different case). This doesn't affect anything since the documentation branch is kept separately, and the website is being built from it. IMO the schema -> .md -> web scripts should be redone (and rerun!) but that's for a separate discussion.
Without having read the whole thread in detail yet, as I'm trying to understand which issue we are trying to solve here... The original issue was described by @redmitry as
cohorts/endpoints.json and datasets/endpoints.json have CollectionsResponse response object derfined as a choice of beaconBooleanResponse, beaconCountResponse and beaconCollectionsResponse which is wrong.
Should be defined as beaconCollectionsResponse precising the type according the endpoint cohorts/defaultSchema.json or cohorts/defaultSchema.json.
The assertion "is wrong" needs to be clarified. What is exactly wrong in that? A collection response is different from a resultsets response.
@jrambla Yeah, well, this filtered down to the collections response in cohorts mapping to a resultsets response as the data option in the beaconOKresponse (in contrast to what was written everywhere else). But we need a resultset response, too, for the cohorts/{id}/individuals (and more in datasets) items. Unless we say they are also just a list since the cohort / dataset is already the wrapper ... But this just isn't clear; it was wrong in any case and the current change basically allows to have now resultsets for the records payloads (e.g. a /datasets/{id}/
endpoint could have individuals from mutiple cohorts resultsets).
Great! Note: I removed some schema .md files from the documentation since they always break merges on Mac OS (same name, different case). This doesn't affect anything since the documentation branch is kept separately, and the website is being built from it. IMO the schema -> .md -> web scripts should be redone (and rerun!) but that's for a separate discussion.
@mbaudis I'll have a look and rename the clashing .md files. My idea was to create a Gihtub action to run the scripts...but that was before we had the re-branching...
I fear this is incorrect. Cohorts/{id} and datasets/{id} should return details of THAT collection, NOT the contents of the collection. In contrast, cohorts/{id}/individuals will return a resultset with the individuals (simplyfying at this level of discussion)
The assertion "is wrong" needs to be clarified. What is exactly wrong in that? A collection response is different from a resultsets response.
Well.. I may probably messed it up... Thing you are right is that we have 2 thing here... my initial issue and the response from /cohorts/{id} I would pause it as Jordi suggest.
Yes, @jrambla is right that this is more a change than a fix and can wait. Maybe it needs more discussion on whether the solution proposed here is better than the one we had so I would also stop the PR until it is properly discussed.
I'm closing this - see fresh start in #123
This addresses the easy part of https://github.com/ga4gh-beacon/beacon-v2/issues/116 (collections responses w/ strange endpoints etc.)