Justify use of charset parameter in JSON payloads

toolness commented 8 years ago

I noticed that the Use UTF-8 section claims that "An API that returns JSON should use":

Content-Type: application/json; charset=utf-8

It seems this came out of https://github.com/WhiteHouse/api-standards/issues/22, but there wasn't much discussion on the topic.

While I fully support the idea of using UTF-8, it turns out that using charset parameters on JSON payloads is actually potentially problematic. The best explanation I've seen about this is a blog post from Armin Ronacher, the creator of Flask, who asserts that the JSON mime type intentionally does not specify a charset parameter, and that adding one introduces even more complexity into an already-complex situation.

Interestingly, some REST API tools side with Ronacher's interpretation of the spec, such as Django REST Framework, which actively makes it very difficult to actually include this charset parameter in a mime type.

This situation is complex, and I don't know what the solution is, but I do think that it at least merits some discussion, and the ultimate decision should be justified in some way. My personal solution has been to take advantage of the JSON specification's \u escape sequence and simply deliver all JSON content as ASCII, which avoids the debate altogether while still allowing unicode to be transmitted. But this can also increase payload sizes if they contain lots of non-ASCII characters.

konklone commented 8 years ago

It seems this came out of WhiteHouse/api-standards#22, but there wasn't much discussion on the topic.

I was the author of that piece of our document, and it came from my experience building the Sunlight Congress API. Being able to easily and correctly view JSON in-browser was a priority for that API, and non-ASCII characters would render incorrectly in-browser without the charset parameter.

I respect Armin's opinion and keeping complexity low is always a good goal, but unless there are actual interoperability problems with a utf-8 charset, it doesn't outweigh the practical benefit to me.

toolness commented 8 years ago

Cool, that seems reasonable to me!

I believe that Armin's main criticism of the charset parameter is that it can be used to encode JSON into charsets that it wasn't originally intended to be encoded into, like latin1, which basically means that clients that do purely follow the spec (and therefore which don't look at the charset parameter) would get confused.

However, because your recommendation specifically recommends passing charset=UTF-8 and UTF-8 is an encoding that spec-purist clients expect, I don't believe it will result in any interoperability problems.

Anyhow, thanks for the explanation, it makes a lot of sense. I'm going to close the issue now!

18F / api-standards

Justify use of charset parameter in JSON payloads #68