camaraproject / Commonalities

Repository to describe, develop, document and test the common guidelines and assets for CAMARA APIs
Apache License 2.0
10 stars 25 forks source link

Adopt RFC 7807 error responses to allow API consumers to be reminded of the API definition #31

Closed eric-murray closed 6 months ago

eric-murray commented 1 year ago

Problem description CAMARA error responses include a short message to give some indication of the specific error. For example:

{
  "status": 400,
  "code": "OUT_OF_RANGE",
  "message": "Invalid port ranges specified: devicePorts"
}

But this information may not be sufficient on its own to allow the client to resolve the error, particularly if the API behaviour has changed between different versions. Now that documentation is embedded into the API definition itself, these responses would be a good opportunity to remind the API consumer of the API definition that is currently being used by the API implementation they are interacting with (which may not be the most recent).

RFC 7807 allows URLs to be included in the type field. With a couple of additional property name changes (renaming code to title and message to detail) CAMARA error responses could be made RFC 7807 compliant.

For example:

{
   "type": "https://raw.githubusercontent.com/camaraproject/QualityOnDemand/release-0.8.1/code/API_definitions/qod-api.yaml"
   "status": 400,
   "title": "OUT_OF_RANGE",
   "detail": "Invalid port ranges specified: devicePorts"
}

Using this format, an explicit link to the API definition (in this case v0.8.1) can be provided to assist troubleshooting. This URL should remain stable for an API release.

Expected action An RFC 7807 compliant format as shown in the above example should be adopted for CAMARA error responses.

Additional context Many API client implementations will already be able to parse RFC 7807 compliant error responses.

shilpa-padgaonkar commented 1 year ago

Fine for me to add type. But some points to consider

eric-murray commented 1 year ago

Thanks for the comments @shilpa-padgaonkar

Additional views on this would be welcome.

eric-murray commented 1 year ago

So the solution for defining URI fragments in text/plain documents is defined in RFC 5147, and is to append #line to the URL, followed by the start and end line of the fragment, separated by commas. So, amending the above example:

{
   "type": "https://raw.githubusercontent.com/camaraproject/QualityOnDemand/release-0.8.1/code/API_definitions/qod-api.yaml#line312,342"
   "status": 400,
   "title": "OUT_OF_RANGE",
   "detail": "Invalid port ranges specified: devicePorts"
}

where now the type URL should now return only the PortsSpec schema.

Unfortunately, this notation does not appear to be supported by any web browsers :-(

RandyLevensalor commented 1 year ago

This looks good. As discussed on the call, allowing the flexibility for the API provider to provide the content which they see as best in the type field.

eric-murray commented 1 year ago

So one solution would be to hash-link to the github rather than raw view of the YAML. For example, we could have:

{
   "type": "https://github.com/camaraproject/QualityOnDemand/blob/release-0.8.1/code/API_definitions/qod-api.yaml#L312-L328"
   "status": 400,
   "title": "OUT_OF_RANGE",
   "detail": "Invalid port ranges specified: devicePorts"
}

So my proposal would be to add a suitable type field of this nature to each example error message that is included in an API specification (such as that above). API providers could then either just use the example error message verbatim for a given error, or provide a link to their own error documentation instead if they prefer.

As the hash-link still links to specific lines rather than sections, it will require some effort to keep these up to date as lines move around, but really this is something that only need be done as part of the final documentation update for a release.

Kevsy commented 1 year ago

I support this proposal: it also enables problem details that are common to multiple APIs to be shared by referencing the same type URI value.

patrice-conil commented 1 year ago

As a developer, I do not support this proposal because it complicates error handling too much. Maintaining the precision of the pointer to the documentation could be a real challenge and the risk is to always return the same link(global swagger) which increases the payload and does not provide any useful information. In addition, the detection of errors in format, type, etc. are generally delegated to the serialization/deserialization layer (GSON, Jackson, Moshi, etc.) and it is therefore difficult to customize errors, especially since microservices frameworks like spring (@ControllerAdvice) or quarkus (@ServerExceptionMapper) propose to centralize the error handling in a dedicated component which transforms the exceptions generated by the different layers of the software into well-formatted errors. IMHO I think it would be more useful to give consumers more usage examples of the APIs rather than pointers to the documentation.

eric-murray commented 1 year ago

Hi @patrice-conil

When you say using RFC 7807 "complicates error handling too much", do you mean for the API client, the API implementor, or both?

For the API implementor, my proposal was that error examples including a relevant type field be included in the API OAS file. So, yes, somebody will need to check each of these for each API release, but that only needs to be done once for each release, and I doubt we will have more than one or two releases per year per year, and hopefully less. So no challenge at all for implementors (i.e. developers) if they use the default error responses.

For the API client, I have to confess that I don't understand why you think using a standard error format (RFC 7807) is more challenging than using the proprietary CAMARA format.

Client developers who do not want to pay any attention to the type field can choose to ignore it. So, again, no challenge for developers who do not want a challenge. Whether error handling is centralised or handled by individual microservices appears to me to be irrelevant to whether the error format is standardised (RFC 7807) or proprietary (CAMARA). If anything, a centralised component should find it easier to handle a standardised error format.

Payload bloat is primarily an issue for happy path scenarios (2XX responses). Payload bloat for errors is much less of an issue, because fielding large numbers of 4XX error responses should always be a temporary issue. For 5XX errors where the client can do nothing about it, the type field can be omitted.

patrice-conil commented 1 year ago

Hi @eric-murray,

Hi @patrice-conil

When you say using RFC 7807 "complicates error handling too much", do you mean for the API client, the API implementor, or both?

Only for implementors. For the API implementor, my proposal was that error examples including a relevant type field be included in the API OAS file. So, yes, somebody will need to check each of these for each API release, but that only needs to be done once for each release, and I doubt we will have more than one or two releases per year per year, and hopefully less. So no challenge at all for implementors (i.e. developers) if they use the default error responses.

It's not a problem of format ... it's how to populate info when detection of error is delegated to the framework. If I always populate the "type" with same swagger spec ... what is the added value ?

For the API client, I have to confess that I don't understand why you think using a standard error format (RFC 7807) is more challenging than using the proprietary CAMARA format.

I agree with you, there is no challenge for API consumers

Client developers who do not want to pay any attention to the type field can choose to ignore it. So, again, no challenge for developers who do not want a challenge. Whether error handling is centralised or handled by individual microservices appears to me to be irrelevant to whether the error format is standardised (RFC 7807) or proprietary (CAMARA). If anything, a centralised component should find it easier to handle a standardised error format.

As said before ... challenge is not in the format of error.

Payload bloat is primarily an issue for happy path scenarios (2XX responses). Payload bloat for errors is much less of an issue, because fielding large numbers of 4XX error responses should always be a temporary issue. For 5XX errors where the client can do nothing about it, the type field can be omitted.

I agree with you on that point.

In fact, if you think we could always link back to global OAS, there's no challenge in using that format for errors, but I don't think it adds much value (other than adhering to a standard). If it's a question of being much more precise and pointing to a particular section of the specification as mentioned by @shilpa-padgaonkar , then there is a real challenge.

eric-murray commented 1 year ago

Hi @patrice-conil

OK, if I understand your concerns correctly, these are:

These are reasonable concern for an API implementor, but the reason we are doing all this is for the benefit of the API consumer. My view is that we should accept some implementation pain if this provides value to the API consumer, and I believe this does. When testing our implementations internally, a common complaint from API consumers is that they just do not understand what they have done wrong when constructing a request and getting a 4XX response. Some people learn by reading the documentation, whereas others try something to see what result they get.

So I am still in favour of this proposal as I think it does add value for API consumers (at least, for those "trying out" an API to see if it is of value to them), but I think a consensus needs to be reached within CAMARA on this, as it does imply some additional work for the CAMARA community.

patrice-conil commented 1 year ago

Hi @eric-murray ,

These are reasonable concern for an API implementor, but the reason we are doing all this is for the benefit of the API consumer. My view is that we should accept some implementation pain if this provides value to the API consumer, and I believe this does. When testing our implementations internally, a common complaint from API consumers is that they just do not understand what they have done wrong when constructing a request and getting a 4XX response. Some people learn by reading the documentation, whereas others try something to see what result they get.

IMHO, we need to invest more time on "giving the right error message" than "pointing on particular spec section". As an API consumer, I expect the API to tell me what my mistake is and not direct me to the document I've already read. This kind of Error, is what I expect ... so it's what I implemented in QoD.

{
    "status": 400,
    "code": "BAD_REQUEST",
    "message": "Cannot construct instance of `com.camara.model.RateUnitEnum`, problem: Unexpected value 'Kbps'"
}

But as said before, adhering to the RFC 7807 is not a problem for us. We will just always return pointer on global API definition.

eric-murray commented 1 year ago

Hi @patrice-conil

Sue, but the choice is not one or the other. A RFC 8707 error response can have both a message (or detail for RFC 8707) and type for a link to more detail if the error message itself does not clarify the problem. I am all in favour of providing a rich set of example error responses in the API definition to inspire API implementors, but this issue is primarily discussing the format through which such error messages can be provided to the API consumer.

So, to summarise the proposal:

So the only mandatory requirement for API implementors is that error responses have media type application/problem+json. Standardising and mandating error message content is not being proposed.

Kevsy commented 1 year ago

"message": "Cannot construct instance of com.camara.model.RateUnitEnum, problem: Unexpected value 'Kbps'"

Just to comment that the error detail I would expect as an API consumer would be:

"message": " Unexpected value 'Kbps'. Expected values are: 'bps' 'kbps' 'Mbps' 'Gbps' 'Tbps' "

...because that (1) tells me what to do next and (2) does not share internal implementation details (security principle of least privilege).

patrice-conil commented 1 year ago

@Kevsy, Yes your message would be better ... but mine is free using jackson as serializer/deserializer 😄

maxl2287 commented 11 months ago

@eric-murray: I am also okay with RFC 7807 and having the type as optional.

I am also coming from the developer POV and see troubles in having links to the schema in the error-response (but when type is optional, we do not have to take care about)

I am also interested that the message would communicate to the Provider, what the exact error is rather than linking to the schema (like @Kevsy reported as well).

example

{
   "type": "https://github.com/camaraproject/QualityOnDemand/blob/release-0.8.1/code/API_definitions/qod-api.yaml#L312-L328"
   "status": 400,
   "title": "INVALID_INPUT",
   "detail": "Invalid input provided: The msisdn should adhere to the pattern '^+?[0-9]{5,15}$'"
}

this would help a lot more than:

{
   "type": "https://github.com/camaraproject/QualityOnDemand/blob/release-0.8.1/code/API_definitions/qod-api.yaml#L312-L328"
   "status": 400,
   "title": "INVALID_INPUT",
   "detail": "Schema validation failed at msisdn"
}
RubenBG7 commented 11 months ago

From my prospective, errors raised by an API have not the goal of solve bugs with the integration. Errors goal are to give information of what happend with the request made to the consumer.

When a developer needs to follow what happend in the integration layers you should use correlators, surveillance methods, logs or whatever you need, but the errors shall not be used to give this kind of information.

eric-murray commented 11 months ago

The agreement made during a recent meeting was to refer this proposal to the End User Council, who should be able to come to a common view on whether this is useful to developers or not. Unfortunately I've been so busy with other things I've not been able to pursue this, but will try to do that in the near future.

patrice-conil commented 11 months ago

Hi @RubenBG7, @eric-murray

From my prospective, errors raised by an API have not the goal of solve bugs with the integration. Errors goal are to give information of what happend with the request made to the consumer.

The goal is not solve bugs with the integration but if it could help during integration why not?

When a developer needs to follow what happend in the integration layers you should use correlators, surveillance methods, logs or whatever you need, but the errors shall not be used to give this kind of information.

Actually, if we address few high consumption integrations, I agree with Ruben, but if we want to address a larger market with many developers... correlators and tracking would be inefficient. I think optional type should do the trick and let each provider to implement the way it wants.

rartych commented 8 months ago

It occurred that in July 2023 RFC 9457 was published and obsoleted RFC 7807.

There are pros & cons for this solution of Problem Details for HTTP APIs.

gmuratk commented 7 months ago

@eric-murray, @rartych I'm not sure where we are with the EUC feedback/input, but I added some suggestions in the discussion for #113 which mentions this issue. It was suggested to bring them over to this discussion. Instead of repeating the lengthy comment and an example implementation here is the link to the comment. https://github.com/camaraproject/Commonalities/issues/113#issuecomment-1865020506

akoshunyadi commented 7 months ago
gmuratk commented 7 months ago

Thanks @akoshunyadi

  • I don't see the necessity to introduce an additional attribute 'cause'. I thought the reason for the error contained in the 'detail' attribute already.

RFC 9457 has the following statement: "Consumers SHOULD NOT parse the "detail" member for information". Instead, having 'cause' attribute may benefit the consumer to trigger programmable action - "to resolve an error" which is the original goal outlined in the problem statement.

  • The question is how type should be handled. Since it is an URI it doesn't have to be an URL, just an unique identifier is also possible. For that I would prefer a common list for all APIs.

I think the recent changes for moving common structures to CAMARA_common.yaml aligns with this. However we need a way for API specific [error] causes to be supported as well. That's the reason for API Specific Cause value list and definitions in the example implementation of the qod-api I provided in my comment.

jlurien commented 7 months ago

The debate is split between this issue and #113, but one common point is whether API specs should normalize the main identification of an error and where, In my opinion, the equivalent for type of RFC 9457 in our current model is the code. title in the RFC is just an optional human-readable summary of the problem type that should not change between errors of same type. detail is for further details, that may change between instances (similar to our message)

Situation is that we are not using consistently the code across current CAMARA specs. Some APIs define specific codes for specific API errors, such as SIM_SWAP.UNKNOWN_PHONE_NUMBER while others rely entirely on generic errors and the same code is used always for same status, providing no extra value.

So IMO changing to RFC 7807 and renaming the object keys is not changing the situation. The real difference is to provide relevant values for the type/code, so developers can rely on known values to identify specific problems.

palmerabollo commented 7 months ago

The current format with code+message+status is inspired by some big industry players that have thousands of developers already using their APIs:

(I'm only aware of Twitter/X using a type field)

Since we are targeting the same kind of developers, I believe familiarity is a good argument in favor of keeping the current code+message fields.

I'm not against adding a field with a link to the error spec (it will be hard to maintain, though). I'd try to keep the interfaces as simple as possible until we have more input from devs using the APIs. Adding an new field in the future is easy because it's not breaking change, but removing or renaming fields will be painful.

patrice-conil commented 7 months ago

On Orange side we agree with TEF position and don't see enough value to change.

gmuratk commented 6 months ago

Thank you for the comments and provided references.

Based on the review, here are my thoughts:

References from the big industry players have 'error codes' that are unique allowing the Application Developers to be able to identify specific problems. Whereas in the current API definitions, 'code' values are overloaded and 'message', which is almost like free form text, is used as the discriminator. It will be challenging to the developers to keep a map of status+error+message especially if the messages have to be localized by API providers.

If there is a concern with addition of a new key 'cause' that will be uniquely identifying the specific issue, we can possibly rework the 'code' field. But then again, I don't think it'd be wise to have 'code' values that deviate from the corresponding HTTP Response integer/string combo. (e.g. 400 Bad Request -> Invalid Argument, 401 Unauthorized -> Unauthenticated, 403 Forbidden -> Permission Denied, 500 Internal Server Error -> Internal, 503 Service Unavailable -> Unavailable)

The example I provided by forking the CAMARA_common.yaml and qod-api.yaml offer a list of common and API specific cause values. Common ones can be listed in the CAMARA_common.yaml and can serve as a starting point for any new APIs. API Specific causes (i.e. ErrorCodes) can be listed as enumarations in each API definition.

For convenience: An example of the modified QoD API (forked from v0.10.0-rc) that is making use of an example CAMARA_common.yaml can be found in the following link: https://github.com/gmuratk/QualityOnDemand-httpresponses/blob/main/code/API_definitions/qod-api.yaml

gmuratk commented 6 months ago

128 seems to be relevant issue to the concern below

If there is a concern with addition of a new key 'cause' that will be uniquely identifying the specific issue, we can possibly rework the 'code' field. But then again, I don't think it'd be wise to have 'code' values that deviate from the corresponding HTTP Response integer/string combo. (e.g. 400 Bad Request -> Invalid Argument, 401 Unauthorized -> Unauthenticated, 403 Forbidden -> Permission Denied, 500 Internal Server Error -> Internal, 503 Service Unavailable -> Unavailable)

cc: @lbertz02

eric-murray commented 6 months ago

Replaced by #133

hdamker commented 6 months ago

"type": "https://raw.githubusercontent.com/camaraproject/QualityOnDemand/release-0.8.1/code/API_definitions/qod-api.yaml"

I know that this discussion is closed and replaced, but just the remark, that the proposed use of type URLs within the issue description wouldn't work. "type" is an identifier and using the version of the API within the URL would mean to define with each version a new problem type and therefore introducing a breaking change. To ensure stable problem types, RFC 9457 is going so far to introduce a IANA registry for problem type (with "Specification required"): https://iana.org/assignments/http-problem-types