belgif / rest-guide

REST Guidelines of Belgian government institutions
https://www.belgif.be/specification/rest/api-guide/
Apache License 2.0

consistent use of charset in all JSON media types #75

Closed pvdbosch closed 3 years ago

pvdbosch commented 3 years ago

We've got a rule to "Always stick to UTF-8 encoding and specify the charset in the Content-Type HTTP header" but the charset isn't consistently added throughout the guide for all JSON media (sub)types.

We've got two choices: omit the charset everywhere or add it everywhere.

In this rule it is omitted:

OpenAPI 2.0 specifications SHOULD specify following default media types:

consumes:
- application/json

produces:
- application/json
- application/problem+json

And some examples omit the charset as well.

The JSON standard mandates UTF-8 - https://tools.ietf.org/html/rfc8259#section-8

   JSON text exchanged between systems that are not part of a closed
   ecosystem MUST be encoded using UTF-8 [RFC3629].
...
   Note:  No "charset" parameter is defined for this registration.
      Adding one really has no effect on compliant recipients.
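To make the RFC note concrete, here is a minimal Java sketch of why the encoding the recipient assumes matters. The class, method names, and sample payload are hypothetical and only for illustration: a compliant recipient decodes the JSON body as UTF-8 no matter what the header says, while a recipient that wrongly falls back to ISO-8859-1 garbles every non-ASCII character.

```java
import java.nio.charset.StandardCharsets;

public class JsonDecodingDemo {

    // Hypothetical sample payload: {"city":"Liège"} encoded as UTF-8,
    // as RFC 8259 requires. The escape \u00e8 is the letter "è".
    static byte[] sampleBody() {
        return "{\"city\":\"Li\u00e8ge\"}".getBytes(StandardCharsets.UTF_8);
    }

    // What a compliant recipient does: always decode JSON as UTF-8,
    // whether or not a charset parameter is present.
    static String decodeCompliant(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    // What a non-compliant recipient might do (an assumption for
    // illustration): fall back to ISO-8859-1 when no charset is given.
    static String decodeLegacy(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] body = sampleBody();
        System.out.println("UTF-8:      " + decodeCompliant(body));
        System.out.println("ISO-8859-1: " + decodeLegacy(body));
    }
}
```

In the second case the two UTF-8 bytes of "è" (0xC3 0xA8) are read as two separate Latin-1 characters, which is the kind of corruption an explicit charset parameter is meant to prevent on non-compliant recipients.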

Other thoughts:

Has anyone had other experiences?

Further references:

jpraet commented 3 years ago

On JBoss EAP 7.3, we observe that the server returns Content-Type: application/problem+json by default for a method (in OpenAPI-generated code) annotated with

@Produces({ "application/json;charset=UTF-8", "application/problem+json" })

when we don't explicitly specify a media type for our JAX-RS response and the request has Accept: */*.

If instead we configure

@Produces({ "application/json;charset=UTF-8", "application/problem+json;charset=UTF-8" })

or

@Produces({ "application/json", "application/problem+json" })

then it DOES return application/json;charset=UTF-8 and application/json, respectively, by default.

So the presence of charset seems to unexpectedly influence the default media type returned by the JAX-RS service.

I am trying to understand whether that is a bug in RESTEasy or whether it conforms to the spec. But reading this section in the spec gives me a headache 🔥 🙉 🔥 https://jakarta.ee/specifications/restful-ws/3.0/jakarta-restful-ws-spec-3.0.html#determine_response_type

pvdbosch commented 3 years ago

I think that, according to the JEE spec, the choice is still up to the JAX-RS implementation in this case: both media types seem equally specific according to Step 7 (sort M in descending order, with a primary key of specificity (n/m > n/* > */*)), and q/qs-values aren't used, so they have equal priority. I read somewhere that the Jersey implementation would pick the first media type in the annotation in this case, but I don't know about RESTEasy. If the media type is explicitly set on the JAX-RS Response object, it would take priority over the one in the Produces annotation.
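The tie described above can be sketched in plain Java. This is not RESTEasy's or Jersey's code, only a hypothetical illustration of the specificity key from Step 7: concrete types rank before n/* and */*, and parameters such as charset play no part in the key. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MediaTypeSpecificity {

    // Specificity key: n/m -> 0, n/* -> 1, */* -> 2 (lower sorts first).
    // Parameters after ';' (for example charset) are ignored.
    static int rank(String mediaType) {
        String base = mediaType.split(";", 2)[0].trim();
        String[] parts = base.split("/", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Not a media type: " + mediaType);
        }
        if (parts[0].equals("*")) {
            return 2;
        }
        if (parts[1].equals("*")) {
            return 1;
        }
        return 0;
    }

    // Stable sort by specificity only; tied entries keep their input order.
    static List<String> sortBySpecificity(List<String> types) {
        List<String> sorted = new ArrayList<>(types);
        sorted.sort(Comparator.comparingInt(MediaTypeSpecificity::rank));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> produces = List.of(
                "application/json;charset=UTF-8",
                "application/problem+json");
        for (String type : sortBySpecificity(produces)) {
            System.out.println(rank(type) + "  " + type);
        }
    }
}
```

Both entries of the @Produces annotation get the same rank, so this key alone cannot decide between them. The sketch happens to keep annotation order because the sort is stable; a real implementation is free to break the tie differently.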

Anyway, the REST guide should still be more consistent independent of the JAX-RS issue.

From what I find in APIs I encountered:

pvdbosch commented 3 years ago

If no significant real problems are known w/o the charset, I'd propose to drop it from the guide because:

pvdbosch commented 3 years ago

@wsalembi encountered some issues with a client (an old WebLogic?) that used an ISO encoding if the returned Content-Type didn't have the charset. I'm not sure how much of an issue this still is with more recent software. Smals, on the other hand, doesn't specify the charset in the OpenAPI files, because this caused other problems for server software when matching against the media type requested in the "Accept" header. Instead, their server applications add the charset to the default "Content-Type: application/json" generated by the middleware. This is done using a JAX-RS response filter.
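The actual Smals filter is not shown in this thread. As a rough sketch of the header logic such a filter might apply, the hypothetical helper below appends charset=UTF-8 to JSON media types that lack one and leaves everything else untouched. In a real application it would presumably be called from a JAX-RS ContainerResponseFilter; that wiring is omitted here so the example runs with the JDK alone.

```java
import java.util.Locale;

public class JsonCharsetDefault {

    // Returns the Content-Type value with ";charset=UTF-8" appended when it
    // is application/json or an application/*+json subtype without a charset.
    // Hypothetical helper, not taken from any existing filter.
    static String withUtf8Charset(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] segments = contentType.split(";");
        String base = segments[0].trim().toLowerCase(Locale.ROOT);
        boolean isJson = base.equals("application/json")
                || (base.startsWith("application/") && base.endsWith("+json"));
        if (!isJson) {
            return contentType; // not a JSON media type: leave as-is
        }
        for (int i = 1; i < segments.length; i++) {
            String param = segments[i].trim().toLowerCase(Locale.ROOT);
            if (param.startsWith("charset=")) {
                return contentType; // charset already present: leave as-is
            }
        }
        return contentType + ";charset=UTF-8";
    }

    public static void main(String[] args) {
        System.out.println(withUtf8Charset("application/json"));
        System.out.println(withUtf8Charset("application/problem+json"));
        System.out.println(withUtf8Charset("application/json;charset=UTF-8"));
        System.out.println(withUtf8Charset("text/html"));
    }
}
```

Doing this on the response side keeps the OpenAPI file and the @Produces annotations free of charset parameters, so "Accept" matching is unaffected while older clients still see an explicit encoding.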

This workaround will, however, be difficult to fully explain and to impose as a requirement (MUST) in the REST guide. So we'll mention that:

pvdbosch commented 3 years ago

PR #77 ready

pvdbosch commented 3 years ago

PR merged; it will be published in the next REST guide update.