ga4gh-beacon / beacon-v2

Unified repository for the GA4GH Beacon v2 API standard
Creative Commons Zero v1.0 Universal
27 stars 22 forks source link

Collections response fix endpoints #121

Closed mbaudis closed 7 months ago

mbaudis commented 8 months ago

This addresses the easy part of https://github.com/ga4gh-beacon/beacon-v2/issues/116 (collections responses w/ strange endpoints etc.)

redmitry commented 8 months ago

I am in doubts... We are removing the "CollectionsResponse" and modifying "ResultsOKResponse" to be the collection. What will happen with endpoint like? /cohorts/{id}/individuals

Before changes, only the "root" endpoint was returning the collection an the rest "usual" ResultsOKResponse (like any other entryType e.g. "biosamples").

Are we changing the contract?

D.

mbaudis commented 8 months ago

@redmitry Oh well, you are right w/ this breaking the resultsetResponses if allowing those endpoints (which I guess we should). I guess the naming of ResultsOKResponse makes this look special but is extremely confusing. I'll iterate another version for discussion...

mbaudis commented 8 months ago

Have a look at https://github.com/ga4gh-beacon/beacon-v2/pull/121/commits/ea4324b420bc3995be2d07dad980f124092492e4 Only for cohorts so far.

My reasoning there is that ResultsOKResponse does not make sense as a name, especially if there are several possible responses.

But generally this all is also a big argument for ditching the concept of having separate Boolean and Count response types and just having a single one for each data response flavour (collections, resultsets, filtering terms ...) where we the data payload (and count) are optional... This is probably against some "how it should be done" rule and I remember that the different response schemas looked nice when drafting them; but now we see a confusion of alternative response types and alternative payload types.

Alas, for now how about this?

redmitry commented 8 months ago

IMO "ResultsetsResponse" is more appropriate name. So we'll have "ResultsetsResponse" which is one of three responses (boolean, count, resultsets) and the "CollectionsResponse".

costero-e commented 8 months ago

But resultSets response gives results splitted by dataset and if I'm not wrong, boolean and count responses don't do that, do they? On the other hand, I see with your last commits @mbaudis that cohorts have the resultsets response and datasets still have the resultsOkResponse.

mbaudis commented 8 months ago

Yes and I really would like to see this (since the previous generic ResultsOKResponse is kind of confusing - at least it confused me; I thought it was a required term ...).

Now, in principle we should then propagate this everywhere we have such a generic to make it clearer what is being executed.

Pro: clarity, won't break anything since references inside of schemas. Con: change a parameter name just for these reasons, w/o any functional gain

I'm clearly pro here, but for all entry types.

mbaudis commented 8 months ago

@costero-e I left the datasets just since I didn't want to do changes I have to reverse ...

ResultSets are actually split by any type of collection:

      setType:
        description: Entry type of resultSet. It SHOULD MATCH an entry type declared
          as collection in the Beacon configuration.
        type: string
        default: dataset

... so a datasets response for individuals could be for the dataset or split for its cohorts, theoretically.

costero-e commented 8 months ago

I think you tagged another Oriol @mbaudis, jeje. Ok, thank you, it's good you are just giving an example using cohorts. On the other hand, what I meant is that we don't have resultSets in a typical Count or Boolean response:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "description": "Payload definition for the \"count\" response.",
    "properties": {
        "exists": {
            "$ref": "../../common/beaconCommonComponents.json#/definitions/Exists"
        },
        "numTotalResults": {
            "$ref": "../../common/beaconCommonComponents.json#/definitions/NumTotalResults",
            "description": "Total number of results."
        }
    },
    "required": [
        "exists",
        "numTotalResults"
    ],
    "type": "object"
}

For me, putting the "countResponse" and "booleanResponse" under a "resultSetsresponse" makes it a bit confusing. That's why I'm more in favor of keeping "resultsOkResponse".

mbaudis commented 8 months ago

@costero-e It is confusing in any case. Therefore I would just prefer that we have only a ResultSetsResponse with a Boolean granularity instead of switching responses.

But overall the new way would be slightly less confusing... You need to name the response anyway and ResultsOKResponse + CollectionsResponse is worse compared to ResultsetsResponse + CollectionsResponse.

Anyway, naming is schema internal. And IMO it is correct to have an endpoint with a ResultsetsResponse, which can take the format of one of the options. If not doing it for ResultsetsResponse then you have additional inconsistencies.

But again, this is a reason for having a single response for and the granularity handled inside it.

mbaudis commented 8 months ago

... and another inconsistency is the "cohorts have individuals for records retrieval while datasets have all entry types". I understand the argument ("cohorts as a group of individuals") but still you may want to get the samples etc. Not wanting to change right now but good to keep in mind.

mbaudis commented 7 months ago

@redmitry @costero-e So WDYT - moving ahead in the current version w/o going for the other "ResultsOKResponse" instances, but doing it for the collections which have these 2 types of responses? I'd like to cloes https://github.com/ga4gh-beacon/beacon-v2/issues/116

I guess we should have the "response schemas for entry types should be defined somewhere" in a general "remove dependencies on OpenAPI" discussion/issue.

redmitry commented 7 months ago

Again we are changing the contract:

was:

"/cohorts/{id}": {
            "get": {
                "responses": {
                    "200": {
                        "$ref": "#/components/responses/ResultsOKResponse"
                    },

now:

        "/cohorts/{id}": {
            "get": {
                "responses": {
                    "200": {
                        "$ref": "#/components/responses/CollectionsResponse"
                    },

Most probably that was the error in the spec coz I remember reporting this to Oriol #98, but since I implemented my java implementation from spec.

Should we fix both the spec. and the implementation?

Cheers,

Dmitry

mbaudis commented 7 months ago

@redmitry This just changes the name ResultsOKResponse => CollectionsResponse; the operation is getting a single collection (as it should).

I've fixed now the messy partial changes in the last commit (all the separate ones should be squashed...).

This fixes the wrong response for some of teh collections endpoints and changes the (definitely now) ambiguous ResultsOKResponse to the correct instances of

  • CollectionsResponse
  • ResultsetsResponse
redmitry commented 7 months ago

This just changes the name ResultsOKResponse => CollectionsResponse; the operation is getting a single collection (as it should).

I do not doubt that it probably (😏) should, but the current spec's "ResultsOKResponse" is "oneOf": [ beaconBooleanResponse.json, beaconCountResponse.json, beaconResultsetsResponse.json ]

So that my comment: if we want the "/cohorts/{id}" to return "/a single collection/" we have to change implementations also...

Dmitry

mbaudis commented 7 months ago

@redmitry That is actually in line with all entry types: biosamples etc. also only know the ResultsOKResponse which is defined as

    ResultsOKResponse:
      description: Successful operation.
      content:
        application/json:
          schema:
            oneOf:
              - $ref: https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/responses/beaconBooleanResponse.json
              - $ref: https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/responses/beaconCountResponse.json
              - $ref: https://raw.githubusercontent.com/ga4gh-beacon/beacon-v2/main/framework/json/responses/beaconResultsetsResponse.json

... so yes, a biosamples/{id} response will return a beaconResultsetsResponse where the single biosample is wrapped in results (list) in resultSets (list) in the response (object).

One can argue that we should have a 3rd type of response which is an beaconIdrequestResponse where the result could be the single document (and the responseSummary would be as of now).

How do you respond to {id} requests?

redmitry commented 7 months ago

How do you respond to {id} requests? Currently all /{entryType}/{} endpoints return ResultsetsResponse(s) even for "collection" entryTypes.

https://beacons.bsc.es/beacon/v2.0.0/cohorts/CINECA_synthetic_cohort_UK1 https://beacon-apis-demo.ega-archive.org/api/cohorts/CINECA_synthetic_cohort_EUROPE_UK1

mbaudis commented 7 months ago

@redmitry Ah; that's what you mean; i.e. the wrong response type (BeaconResultsetsResponse instead of BeaconCollectionsResponse). Yes, that should just be a single collection info. Alt least according to the description in the endpoints files:

  /cohorts/{id}:
    parameters:
      - $ref: '#/components/parameters/entryId'
    get:
      parameters:
        - $ref: '#/components/parameters/requestedSchema'
      description: Get details about one cohort, identified by its (unique) 'id'

... and

  /datasets/{id}:
    parameters:
      - $ref: '#/components/parameters/entryId'
    get:
      parameters:
        - $ref: '#/components/parameters/requestedSchema'
      description: Get details about one dataset, identified by its (unique) 'id'
      operationId: getOneDataset
      tags:

... which is IMO the clear intention; in BeaconCollectionsResponse this would result in a response.results list of one collection. (I didn't write this ¯\_(ツ)_/¯).

My point above was already a bit further - a separate BeaconIdResponse ... or such which would move the list -> object, i.e. a single cohort or biosample ... in response.result.

costero-e commented 7 months ago

Agree @redmitry that this will change the implementations but anyway, I'm changing it almost every day for adding new features that are on demand so no problem. Plus, it makes more sense to have a collectionResponse rather than resultSetsResponse as @mbaudis says. For me, the PR is good to be merged and we can proceed.

redmitry commented 7 months ago

So that. Once merged, I update java implementation in accordance.

mbaudis commented 7 months ago

Great! Note: I removed some schema .md files from the documentation since they always break merges on Mac OS (same name, different case). This doesn't affect anything since the documentation branch is kept separately, and the website is being built from it. IMO the schema -> .md -> web scripts should be redone (and rerun!) but that's for a separate discussion.

jrambla commented 7 months ago

Without having read the whole thread in detail yet, as I'm trying to understand which issue we are trying to solve here... The original issue was described by @redmitry as

cohorts/endpoints.json and datasets/endpoints.json have CollectionsResponse response object derfined as a choice of beaconBooleanResponse, beaconCountResponse and beaconCollectionsResponse which is wrong.

Should be defined as beaconCollectionsResponse precising the type according the endpoint cohorts/defaultSchema.json or cohorts/defaultSchema.json.

The assertion "is wrong" needs to be clarified. What is exactly wrong in that? A collection response is different from a resultsets response.

mbaudis commented 7 months ago

@jrambla Yeah, well, this filtered down to the collections response in cohorts mapping to a resultsets response as the data option in the beaconOKresponse (in contrast to what was written everywhere else). But we need a resultset response, too, for the cohorts/{id}/individuals (and more in datasets) items. Unless we say they are also just a list since the cohort / dataset is already the wrapper ... But this just isn't clear; it was wrong in any case and the current change basically allows to have now resultsets for the records payloads (e.g. a /datasets/{id}/ endpoint could have individuals from mutiple cohorts resultsets).

mrueda commented 7 months ago

Great! Note: I removed some schema .md files from the documentation since they always break merges on Mac OS (same name, different case). This doesn't affect anything since the documentation branch is kept separately, and the website is being built from it. IMO the schema -> .md -> web scripts should be redone (and rerun!) but that's for a separate discussion.

@mbaudis I'll have a look and rename the clashing .md files. My idea was to create a Gihtub action to run the scripts...but that was before we had the re-branching...

jrambla commented 7 months ago

I fear this is incorrect. Cohorts/{id} and datasets/{id} should return details of THAT collection, NOT the contents of the collection. In contrast, cohorts/{id}/individuals will return a resultset with the individuals (simplyfying at this level of discussion)

redmitry commented 7 months ago

The assertion "is wrong" needs to be clarified. What is exactly wrong in that? A collection response is different from a resultsets response.

Well.. I may probably messed it up... Thing you are right is that we have 2 thing here... my initial issue and the response from /cohorts/{id} I would pause it as Jordi suggest.

costero-e commented 7 months ago

Yes, @jrambla is right that this is more a change than a fix and can wait. Maybe it needs more discussion on whether the solution proposed here is better than the one we had so I would also stop the PR until it is properly discussed.

mbaudis commented 7 months ago

I'm closing this - see fresh start in #123