Add ERROR to the includeDatasetResponses enum

jrambla commented 5 years ago

Currently, if a dataset if returning an error it is only included in the ALL option. A user would probably expect that ALL is equal to HIT+MISS, but it is actually ALL=HIT+MISS+ERROR. I suggest to add such filtering option to improve comprehensiveness and clarity.

juhtornr commented 5 years ago

@jrambla I support the idea but because 1.1.1 milestone is already past due by 7 days, should we add this to 1.1.2?

teemukataja commented 5 years ago

Currently, if a dataset if returning an error it is only included in the ALL option.

In what case would a dataset return an error? Currently in Beacon we have errors in response for the service.

sdelatorrep commented 5 years ago

You're right, but maybe this could be improved. I mean, one of the errors that could happen is that the user sends an assemblyId and a datasetId which do not match (e.g. assemblyId=GRCh37 but the dataset uses GRCh38), the Beacon would answer with an error in BeaconAlleleResponse. But, if the user is requesting several datasets and some of them do use GRCh37 I think the Beacon could reply back with the information for these datasets and leave out the one in error (it could be retrieved using the ERROR filter, in this case the error would appear in BeaconDatasetAlleleResponse and not in BeaconAlleleResponse). Regarding to this, we have an error field inside BeaconDatasetAlleleResponse which will never be used if we follow the explanation here so I think we should reconsider this. @jrambla , WDYT?

sdelatorrep commented 5 years ago

Example of the response when there are 2 datasets and one of them has detected an error:

[
  {
    "beaconId": "some-beacon",
    "apiVersion": "v1.0.1",
    "exists": true,
    "alleleRequest": {
      "referenceName": "1",
      "start": 1111,
      "end": null,
      "startMin": null,
      "startMax": null,
      "endMin": null,
      "endMax": null,
      "referenceBases": "A",
      "alternateBases": "C",
      "variantType": null,
      "assemblyId": "GRCh38",
      "datasetIds": [
        "1","2"
      ],
      "includeDatasetResponses": "ALL"
    },
    "datasetAlleleResponses": [
      {
        "datasetId": "1",
        "exists": true,
        "error": null,
        "frequency": 0.002222,
        "variantCount": 1,
        "callCount": 4,
        "sampleCount": 1,
        "note": null,
        "externalUrl": null,
        "info": null
      },
      {
        "datasetId": "2",
        "exists": null,
        "error": {
          "errorCode": "400",
          "errorMessage": "User provided assemblyId (GRCh38) does not match with dataset assembly (GRCh37)"
        },
        "frequency": null,
        "variantCount": null,
        "callCount": null,
        "sampleCount": null,
        "note": null,
        "externalUrl": null,
        "info": null
      }
    ],
    "error": null
  }
]

teemukataja commented 5 years ago

I can see your point, but if the user is looking for start: 1111 on coordinate system of GRCh38 then simply changing assembly to GRCh37 won't give the user correct results as it is using a different coordinate system. This should simply render as a MISS response (exists: false), because the Beacon didn't have what the user requested.

[
  {
    "beaconId": "some-beacon",
    "apiVersion": "v1.0.1",
    "exists": true,
    "alleleRequest": {
      "referenceName": "1",
      "start": 1111,
      "end": null,
      "startMin": null,
      "startMax": null,
      "endMin": null,
      "endMax": null,
      "referenceBases": "A",
      "alternateBases": "C",
      "variantType": null,
      "assemblyId": "GRCh38",
      "datasetIds": [
        "1","2"
      ],
      "includeDatasetResponses": "ALL"
    },
    "datasetAlleleResponses": [
      {
        "datasetId": "1",
        "exists": true,
        "error": null,
        "frequency": 0.002222,
        "variantCount": 1,
        "callCount": 4,
        "sampleCount": 1,
        "note": null,
        "externalUrl": null,
        "info": null
      },
      {
        "datasetId": "2",
        "exists": false,
        "error": null,
        "frequency": null,
        "variantCount": null,
        "callCount": null,
        "sampleCount": null,
        "note": null,
        "externalUrl": null,
        "info": null
      }
    ],
    "error": null
  }
]

sdelatorrep commented 5 years ago

mmm I don't agree :P If the user knows the assemblies do not match, they can choose to convert the start position to the other coordinate system. If they just get exists:false they will think this variant does not exist but this may not be true.

blankdots commented 5 years ago

Trying to see if I understood it correctly in order to properly implement it:

Assumption 1: it seems that this error will appear only if the datasetId or assemblyId are not found, in the queried beacon, in all other cases there is MISS e.g. referenceBase is wrong, alternateBase,variantType or start are wrong
Assumption 2: error in BeaconDatasetAlleleResponse is redundant if exists is true or false, and will never be used
Assumption 3: if one of two datasets errors, one will have e.g. errorCode: 400 and the other one retrieve useful information but the HTTP response code will the 200 OK

Questions:

There is a request for two datasets ['1', '2'], but datasetId does not exist what should it look like, at which level the error key is displayed? assuming the includeDatasetResponses is ALL
Should it there be a 404 instead of 400 bad request, as the request is correct and validated by the required regex?

sdelatorrep commented 5 years ago

I think that if you request a dataset which does not exist, you should fill the top level error in BeaconAlleleResponse and nothing else.

[
  {
    "beaconId": "some-beacon",
    "apiVersion": "v1.0.1",
    "exists": null,
    "alleleRequest": {
      "referenceName": "1",
      "start": 1111,
      "end": null,
      "startMin": null,
      "startMax": null,
      "endMin": null,
      "endMax": null,
      "referenceBases": "A",
      "alternateBases": "C",
      "variantType": null,
      "assemblyId": "GRCh38",
      "datasetIds": [
        "1","2","invented_dataset_001"
      ],
      "includeDatasetResponses": "ALL"
    },
    "datasetAlleleResponses": null,
    "error": {
          "errorCode": "404",
          "errorMessage": "Dataset invented_dataset_001 does not exist"
        }
  }
]

And if the parameters are valid but there is an incompatibility among them (in my example, between the assembly of one of the datasets and the requested assembly) then you should fill the error in BeaconDatasetAlleleResponse and return other information you can provide (my example above).

Also, I think you're right and if the assemblyId does not exist in this Beacon, you should answer with a 404 error.

blankdots commented 5 years ago

I see, then if the responses you illustrated above are to implemented, maybe some suggestions to consider:

error in BeaconDatasetAlleleResponse object should not contain an errorCode, because that error code will not reflect the actual HTTP status code of the whole response, and it will generate some confusion. In your example above only one of the datasets has 400 but the HTTP status code of the Response should be 200 because it is a correct response and no error occurred when providing the body of the response - I was thinking of 206 or 207, but not sure those can be utilized in that case
it seems that similar logic should be reflected and made clear in the specification with the CONTROLLED, REGISTERED and OPEN. As an example there is a request (for completion let us say it is with a Auth token provided) for datasetIds: ['1', '2', '3'], but dataset '3' is CONTROLLED and token does not provide access to it, then the answer should be as in the this example that you gave and of course with the 403 status code and errorCode, and no other dataset should be displayed - although this would be unintuitive because in spite of the CONTROLLED dataset, all should have access to OPEN and the beacon implementation should return the OPEN results in BeaconDatasetAlleleResponse

sdelatorrep commented 5 years ago

Regarding the 1st bullet point: I'm not sure I understood you. The HTTP status code will always be 200 unless there is an error while doing the call (3xx) or there is a server error (5xx). I mean, if you do a curl the HTTP status of the response will be 200 (unless a 3xx o 5xx error) and the Beacon specific errors will always be encoded in the error field (but the HTTP status will still be 200).

Regarding the 2nd bullet point: I agree with you but I'm not sure how we should proceed. I guess we should discuss this in the next technical call tomorrow.

jrambla commented 5 years ago

The principals about error returning in Beacon are as follows:

The errors at HTTP response are just for HTTP conversations between the client and the server. Inside the Beacon responses, HTTP errors are leveraged conceptually, not as HTTP operations by themselves (e.g. a 404 doesn't means "page not found" by "resource not found")
The error in the BeaconAlleleResponse is meant for errors related to a) a simple query (without dataset details) or b) a general error when dataset details are requested (e.g. a mal formed request or an overall non-Authorized operation)
the error in the BeaconDatasetAlleleResponse is for error affecting just that dataset (e.g. non-applicable query due to a different assembly or dataset under controlled access and user not being authenticated or not having permissions for it).
In order to make the dialog between the client and the server as efficient and smooth as possible: one error at dataset level SHOULD NOT invalidate the overall response, as this will result in plenty of "non-resolvable" queries. The example above from Stephan should be resolved like that: HTTP return code: 200 Beacon error level: 200 Dataset error for Open dataset: 200 (or null, but I rather prefer 200) Dataset error for Controlled dataset: 401 or 403 depending on the case

Thanks to recent conversations about the errors, I've realized that "BeaconError" is still the original one, and it is not well suited for the more granular response (i.e. BeaconDatasetAlleleResponse). Thus we need to "upgrade" it. Let's do it in another issue, however.

Hope this clarifies.

blankdots commented 5 years ago

@sdelatorrep you are right, I should have explained 1st bullet point better: In https://github.com/ga4gh-beacon/specification/blob/master/beacon.md#errors it is specified:

The error type SHOULD be chosen from this table and be accompanied by the specified HTTP status code.

How I interpret that considering the example above is that if there is an error in one of the datasets from BeaconDatasetAlleleResponse there should be an errorCode, but according to the specification that error code should be reflected in the HTTP status response (and following the example that means the HTTP status should be 400 instead of 200, but the other datasets answered without any error) - this makes it confusing what HTTP status should be if there is an error in the one of the datasets.

An simple way to eliminate this confusion would be to remove the required errorCode from the BeaconDatasetAlleleResponse and just keep it for BeaconAlleleResponse.

2nd bullet point: unfortunately business travel this week, but maybe there are other time slots for such a discussion.

blankdots commented 5 years ago

@jrambla I am sorry this is a bit confusing to me and how any services making use of the Beacon API should process these:

HTTP return code: 200 Beacon error level: 200 Dataset error for Open dataset: 200 (or null, but I rather prefer 200) Dataset error for Controlled dataset: 401 or 403 depending on the case

Confusing because the Beacon specification does not specify a 200 error: https://github.com/ga4gh-beacon/specification/blob/master/beacon.md#errors and should 200 be an error level ?

... and if something creates confusion maybe it should not be in any specification ;) - as I mentioned above, maybe it is is easier not to reflect an HTTP code in that required errorCode and simply remove it.

jrambla commented 5 years ago

In my understanding the text you refer (and that I always forget as I look directly into the OpenAPI spec) is misleading and probably should be considered obsolete. I've created another issue (#262) to do all this changes and clarifications. The BeaconError leverages the HTTP status codes instead of creating new ones, following the REST tradition, but the actual Beacon return code is independent of the HTTP conversation.

In the current Beacon spec of the art, my example above must be;

HTTP return code: 200 (in the HTTP response header) Beacon error level: null Dataset error for Open dataset: null Dataset error for Controlled dataset: 401 or 403 depending on the case

Hope this clarifies.

sdelatorrep commented 5 years ago

I created PR #267 which implements issue #262 and tries to clarify what we are discussing here.

blankdots commented 5 years ago

Sorry for the long message, but I wanted to detail my point of view.

@jrambla I was hoping that it was clear that using 200 as an error would be confusing as it conflicts with the HTTP 200 https://httpstatuses.com/200 - same code different semantics: Error and Success.

But maybe I missed something so I went over these past issues/decisions (I have not seen others):

In issue #170 it is mentioned:

IMHO we shouldn't tamper good answers because of tricky use cases. Lets respond with the positive/correct datasets and leave out the "not found" datasets.

and I like the proposed solution, without the error codes:

... my approach will be to add such warning as a descriptive message detailing "not found" dataset ids.

One of the use cases I could think of is having an aggregator for faulty/erroneous beacons, but in issue #207 it seems that ERROR was introduced to make distinction from errors related to dataset and MISS queries, so I would like to understand the comprehensive picture that is needed for such a feature to be added.

Actually I like the way it was designed in v0.3 where the errorCode was defined as a generic "numeric error code", and not as an error code that represents HTTP status. This means that there will be no confusion and there will be a need for custom errorCodes defined.

In v0.4 it seems that "numeric error code" was lost in translation, but not sure at which point and why the decision to use HTTP code in errorCode happened.

Probably it would be a better solution, in the long term, to use custom codes for dataset specific errors - meaning BeaconDatasetAlleleResponse, and beacon erros - meaning BeaconAlleleResponse - use what is in the current specification https://github.com/ga4gh-beacon/specification/blob/master/beacon.md#errors. e.g. for BeaconDatasetAlleleResponse use string error codes as they are more descriptive:

errorCode	errorMessage
DATASET_NOT_FOUND	Requested dataset `<requested_dataset>`, was not found in this beacon.
NO_PERMISSION_FOR_DATASET	Accessing `<requested_dataset>` is forbidden/not permitted.
ASSEMBLY_NOT_SUPPORTED	`<requested_dataset>` does not support `<requested_assembly>`.

snippet for BeaconDatasetAlleleResponse:

{
    "datasetAlleleResponses": [{
        "error": {
            "errorCode": "DATASET_NOT_FOUND",
            "errorMessage": "Requested dataset `<requested_dataset>`, was not found in this beacon."
        }
    }]
}

for BeaconAlleleResponse it could be like in the example https://github.com/ga4gh-beacon/specification/issues/254#issuecomment-455535940 above, but consider that the examples are not 100% following the specification (e.g. errorCode is integer not string).

EDIT: meant to write startMin instead of start in the examples below .. sry :disappointed: - this editor not meant for these kind of long messages.

@sdelatorrep I did not consider bringing this up, as this was not part of these issues, but with the PR you made https://github.com/ga4gh-beacon/specification/pull/267 I would like to point out that the examples you provided are not consistent with the OpenAPI type definitions. To understand the reasoning, we can focus on two properties startMin and error and using a schema to validate them in an online validator: https://www.jsonschemavalidator.net/ A schema to validate just those two properties (based on the OpenAPI beacon specification) would look like:

{
  "definitions": {},
  "type": "object",
  "properties": {
    "error": {
      "type": "object",
      "required": [
        "errorCode"
      ],
      "properties": {
        "errorCode": {
          "type": "integer"
        },
        "errorMessage": {
          "type": "string"
        }
      }
    },
      "startMin": {
          "type": "integer",
          "minimum": 0
        }
  }
}

Now if we use the examples you provided as an input JSON:

{
  "error": null
  "startMin": null
}

or

{
  "error": {}
  "startMin": null
}

you would see how those are not valid input, to better understand the why see issue: https://github.com/ga4gh-beacon/specification/issues/252

sdelatorrep commented 5 years ago

I totally missed this last comment. And I'm also confused, probably because we're having the very same discussion in different issues and PRs. I think we should continue in #267 (only!). BTW, my examples were manually built and I didn't expect them to be perfect. But this comment brought issue #252 to my attention which I didn't notice. It should be addressed soon ;)

ga4gh-beacon / specification

Add ERROR to the includeDatasetResponses enum #254