ga4gh-beacon / specification

GA4GH Beacon specification.
Apache License 2.0
32 stars 25 forks source link

Add ERROR to the includeDatasetResponses enum #254

Open jrambla opened 5 years ago

jrambla commented 5 years ago

Currently, if a dataset if returning an error it is only included in the ALL option. A user would probably expect that ALL is equal to HIT+MISS, but it is actually ALL=HIT+MISS+ERROR. I suggest to add such filtering option to improve comprehensiveness and clarity.

juhtornr commented 5 years ago

@jrambla I support the idea but because 1.1.1 milestone is already past due by 7 days, should we add this to 1.1.2?

teemukataja commented 5 years ago

Currently, if a dataset if returning an error it is only included in the ALL option.

In what case would a dataset return an error? Currently in Beacon we have errors in response for the service.

sdelatorrep commented 5 years ago

You're right, but maybe this could be improved. I mean, one of the errors that could happen is that the user sends an assemblyId and a datasetId which do not match (e.g. assemblyId=GRCh37 but the dataset uses GRCh38), the Beacon would answer with an error in BeaconAlleleResponse. But, if the user is requesting several datasets and some of them do use GRCh37 I think the Beacon could reply back with the information for these datasets and leave out the one in error (it could be retrieved using the ERROR filter, in this case the error would appear in BeaconDatasetAlleleResponse and not in BeaconAlleleResponse). Regarding to this, we have an error field inside BeaconDatasetAlleleResponse which will never be used if we follow the explanation here so I think we should reconsider this. @jrambla , WDYT?

sdelatorrep commented 5 years ago

Example of the response when there are 2 datasets and one of them has detected an error:

[
  {
    "beaconId": "some-beacon",
    "apiVersion": "v1.0.1",
    "exists": true,
    "alleleRequest": {
      "referenceName": "1",
      "start": 1111,
      "end": null,
      "startMin": null,
      "startMax": null,
      "endMin": null,
      "endMax": null,
      "referenceBases": "A",
      "alternateBases": "C",
      "variantType": null,
      "assemblyId": "GRCh38",
      "datasetIds": [
        "1","2"
      ],
      "includeDatasetResponses": "ALL"
    },
    "datasetAlleleResponses": [
      {
        "datasetId": "1",
        "exists": true,
        "error": null,
        "frequency": 0.002222,
        "variantCount": 1,
        "callCount": 4,
        "sampleCount": 1,
        "note": null,
        "externalUrl": null,
        "info": null
      },
      {
        "datasetId": "2",
        "exists": null,
        "error": {
          "errorCode": "400",
          "errorMessage": "User provided assemblyId (GRCh38) does not match with dataset assembly (GRCh37)"
        },
        "frequency": null,
        "variantCount": null,
        "callCount": null,
        "sampleCount": null,
        "note": null,
        "externalUrl": null,
        "info": null
      }
    ],
    "error": null
  }
]
teemukataja commented 5 years ago

I can see your point, but if the user is looking for start: 1111 on coordinate system of GRCh38 then simply changing assembly to GRCh37 won't give the user correct results as it is using a different coordinate system. This should simply render as a MISS response (exists: false), because the Beacon didn't have what the user requested.

[
  {
    "beaconId": "some-beacon",
    "apiVersion": "v1.0.1",
    "exists": true,
    "alleleRequest": {
      "referenceName": "1",
      "start": 1111,
      "end": null,
      "startMin": null,
      "startMax": null,
      "endMin": null,
      "endMax": null,
      "referenceBases": "A",
      "alternateBases": "C",
      "variantType": null,
      "assemblyId": "GRCh38",
      "datasetIds": [
        "1","2"
      ],
      "includeDatasetResponses": "ALL"
    },
    "datasetAlleleResponses": [
      {
        "datasetId": "1",
        "exists": true,
        "error": null,
        "frequency": 0.002222,
        "variantCount": 1,
        "callCount": 4,
        "sampleCount": 1,
        "note": null,
        "externalUrl": null,
        "info": null
      },
      {
        "datasetId": "2",
        "exists": false,
        "error": null,
        "frequency": null,
        "variantCount": null,
        "callCount": null,
        "sampleCount": null,
        "note": null,
        "externalUrl": null,
        "info": null
      }
    ],
    "error": null
  }
]
sdelatorrep commented 5 years ago

mmm I don't agree :P If the user knows the assemblies do not match, they can choose to convert the start position to the other coordinate system. If they just get exists:false they will think this variant does not exist but this may not be true.

blankdots commented 5 years ago

Trying to see if I understood it correctly in order to properly implement it:

Questions:

  1. There is a request for two datasets ['1', '2'], but datasetId does not exist what should it look like, at which level the error key is displayed? assuming the includeDatasetResponses is ALL
  2. Should it there be a 404 instead of 400 bad request, as the request is correct and validated by the required regex?
sdelatorrep commented 5 years ago

I think that if you request a dataset which does not exist, you should fill the top level error in BeaconAlleleResponse and nothing else.

[
  {
    "beaconId": "some-beacon",
    "apiVersion": "v1.0.1",
    "exists": null,
    "alleleRequest": {
      "referenceName": "1",
      "start": 1111,
      "end": null,
      "startMin": null,
      "startMax": null,
      "endMin": null,
      "endMax": null,
      "referenceBases": "A",
      "alternateBases": "C",
      "variantType": null,
      "assemblyId": "GRCh38",
      "datasetIds": [
        "1","2","invented_dataset_001"
      ],
      "includeDatasetResponses": "ALL"
    },
    "datasetAlleleResponses": null,
    "error": {
          "errorCode": "404",
          "errorMessage": "Dataset invented_dataset_001 does not exist"
        }
  }
]

And if the parameters are valid but there is an incompatibility among them (in my example, between the assembly of one of the datasets and the requested assembly) then you should fill the error in BeaconDatasetAlleleResponse and return other information you can provide (my example above).

Also, I think you're right and if the assemblyId does not exist in this Beacon, you should answer with a 404 error.

blankdots commented 5 years ago

I see, then if the responses you illustrated above are to implemented, maybe some suggestions to consider:

sdelatorrep commented 5 years ago

Regarding the 1st bullet point: I'm not sure I understood you. The HTTP status code will always be 200 unless there is an error while doing the call (3xx) or there is a server error (5xx). I mean, if you do a curl the HTTP status of the response will be 200 (unless a 3xx o 5xx error) and the Beacon specific errors will always be encoded in the error field (but the HTTP status will still be 200).

Regarding the 2nd bullet point: I agree with you but I'm not sure how we should proceed. I guess we should discuss this in the next technical call tomorrow.

jrambla commented 5 years ago

The principals about error returning in Beacon are as follows:

Thanks to recent conversations about the errors, I've realized that "BeaconError" is still the original one, and it is not well suited for the more granular response (i.e. BeaconDatasetAlleleResponse). Thus we need to "upgrade" it. Let's do it in another issue, however.

Hope this clarifies.

blankdots commented 5 years ago

@sdelatorrep you are right, I should have explained 1st bullet point better: In https://github.com/ga4gh-beacon/specification/blob/master/beacon.md#errors it is specified:

The error type SHOULD be chosen from this table and be accompanied by the specified HTTP status code.

How I interpret that considering the example above is that if there is an error in one of the datasets from BeaconDatasetAlleleResponse there should be an errorCode, but according to the specification that error code should be reflected in the HTTP status response (and following the example that means the HTTP status should be 400 instead of 200, but the other datasets answered without any error) - this makes it confusing what HTTP status should be if there is an error in the one of the datasets.

An simple way to eliminate this confusion would be to remove the required errorCode from the BeaconDatasetAlleleResponse and just keep it for BeaconAlleleResponse.

2nd bullet point: unfortunately business travel this week, but maybe there are other time slots for such a discussion.

blankdots commented 5 years ago

@jrambla I am sorry this is a bit confusing to me and how any services making use of the Beacon API should process these:

HTTP return code: 200 Beacon error level: 200 Dataset error for Open dataset: 200 (or null, but I rather prefer 200) Dataset error for Controlled dataset: 401 or 403 depending on the case

Confusing because the Beacon specification does not specify a 200 error: https://github.com/ga4gh-beacon/specification/blob/master/beacon.md#errors and should 200 be an error level ?

... and if something creates confusion maybe it should not be in any specification ;) - as I mentioned above, maybe it is is easier not to reflect an HTTP code in that required errorCode and simply remove it.

jrambla commented 5 years ago

In my understanding the text you refer (and that I always forget as I look directly into the OpenAPI spec) is misleading and probably should be considered obsolete. I've created another issue (#262) to do all this changes and clarifications. The BeaconError leverages the HTTP status codes instead of creating new ones, following the REST tradition, but the actual Beacon return code is independent of the HTTP conversation.

In the current Beacon spec of the art, my example above must be;

HTTP return code: 200 (in the HTTP response header) Beacon error level: null Dataset error for Open dataset: null Dataset error for Controlled dataset: 401 or 403 depending on the case

Hope this clarifies.

sdelatorrep commented 5 years ago

I created PR #267 which implements issue #262 and tries to clarify what we are discussing here.

blankdots commented 5 years ago

Sorry for the long message, but I wanted to detail my point of view.

@jrambla I was hoping that it was clear that using 200 as an error would be confusing as it conflicts with the HTTP 200 https://httpstatuses.com/200 - same code different semantics: Error and Success.

But maybe I missed something so I went over these past issues/decisions (I have not seen others):

In issue #170 it is mentioned:

IMHO we shouldn't tamper good answers because of tricky use cases. Lets respond with the positive/correct datasets and leave out the "not found" datasets.

and I like the proposed solution, without the error codes:

... my approach will be to add such warning as a descriptive message detailing "not found" dataset ids.

One of the use cases I could think of is having an aggregator for faulty/erroneous beacons, but in issue #207 it seems that ERROR was introduced to make distinction from errors related to dataset and MISS queries, so I would like to understand the comprehensive picture that is needed for such a feature to be added.

Actually I like the way it was designed in v0.3 where the errorCode was defined as a generic "numeric error code", and not as an error code that represents HTTP status. This means that there will be no confusion and there will be a need for custom errorCodes defined.

In v0.4 it seems that "numeric error code" was lost in translation, but not sure at which point and why the decision to use HTTP code in errorCode happened.


Probably it would be a better solution, in the long term, to use custom codes for dataset specific errors - meaning BeaconDatasetAlleleResponse, and beacon erros - meaning BeaconAlleleResponse - use what is in the current specification https://github.com/ga4gh-beacon/specification/blob/master/beacon.md#errors. e.g. for BeaconDatasetAlleleResponse use string error codes as they are more descriptive:

errorCode errorMessage
DATASET_NOT_FOUND Requested dataset <requested_dataset>, was not found in this beacon.
NO_PERMISSION_FOR_DATASET Accessing <requested_dataset> is forbidden/not permitted.
ASSEMBLY_NOT_SUPPORTED <requested_dataset> does not support <requested_assembly>.

snippet for BeaconDatasetAlleleResponse:

{
    "datasetAlleleResponses": [{
        "error": {
            "errorCode": "DATASET_NOT_FOUND",
            "errorMessage": "Requested dataset `<requested_dataset>`, was not found in this beacon."
        }
    }]
}

for BeaconAlleleResponse it could be like in the example https://github.com/ga4gh-beacon/specification/issues/254#issuecomment-455535940 above, but consider that the examples are not 100% following the specification (e.g. errorCode is integer not string).


EDIT: meant to write startMin instead of start in the examples below .. sry :disappointed: - this editor not meant for these kind of long messages.

@sdelatorrep I did not consider bringing this up, as this was not part of these issues, but with the PR you made https://github.com/ga4gh-beacon/specification/pull/267 I would like to point out that the examples you provided are not consistent with the OpenAPI type definitions. To understand the reasoning, we can focus on two properties startMin and error and using a schema to validate them in an online validator: https://www.jsonschemavalidator.net/ A schema to validate just those two properties (based on the OpenAPI beacon specification) would look like:

{
  "definitions": {},
  "type": "object",
  "properties": {
    "error": {
      "type": "object",
      "required": [
        "errorCode"
      ],
      "properties": {
        "errorCode": {
          "type": "integer"
        },
        "errorMessage": {
          "type": "string"
        }
      }
    },
      "startMin": {
          "type": "integer",
          "minimum": 0
        }
  }
}

Now if we use the examples you provided as an input JSON:

{
  "error": null
  "startMin": null
}

or

{
  "error": {}
  "startMin": null
}

you would see how those are not valid input, to better understand the why see issue: https://github.com/ga4gh-beacon/specification/issues/252

sdelatorrep commented 5 years ago

I totally missed this last comment. And I'm also confused, probably because we're having the very same discussion in different issues and PRs. I think we should continue in #267 (only!). BTW, my examples were manually built and I didn't expect them to be perfect. But this comment brought issue #252 to my attention which I didn't notice. It should be addressed soon ;)