adsabs / adsws

ADS web services

Character set not defined in some responses #96

Closed jonnybazookatone closed 8 years ago

jonnybazookatone commented 8 years ago

This is causing problems for users of the Python API client.

For the original ticket, see: andycasey/ads/pull/52

@takerukoushirou:

Sure. That particular case was the paper returned by the query identifier:2012Natur.486..502B, requesting the fields 'bibcode', 'title', 'author', 'bibstem', 'pubdate', 'property', 'pub'.

The server returns the content type application/json. The Python requests package only looks for a charset definition in the Content-Type header, falling back to ISO-8859-1 for text content, in its utility function get_encoding_from_headers; in our case this yields None. When the text property of the response object is accessed and no encoding has been set, the apparent_encoding property (guessed via chardet) is used to decode the content before any attempt is made to parse the JSON data.
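To make the header check described above concrete, here is a minimal sketch using only the standard library. It is a simplified stand-in for the behaviour attributed to get_encoding_from_headers, not the requests source, and the function name charset_from_content_type is hypothetical.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Simplified model of the header check described in this thread.

    Hypothetical helper, not the requests implementation: returns the
    declared charset, falls back to ISO-8859-1 for text/* types, and
    otherwise returns None.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins.
    charset = msg.get_param("charset")
    if charset:
        return charset

    # Text content without a charset falls back to ISO-8859-1.
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"

    # Anything else (including bare application/json) yields no encoding,
    # which is what leaves the client guessing.
    return None
```

Under this model, a bare application/json header gives None, while application/json; charset=utf-8 gives an explicit encoding and no guessing is needed.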

The simplest solution is to return a content-type of application/json; charset=utf-8.
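One way this could look on the server side is sketched below as a plain WSGI middleware. This is an assumption for illustration only: the names add_json_charset and demo_app are hypothetical, and nothing here reflects how adsws actually builds its responses.

```python
def add_json_charset(app):
    """Hypothetical WSGI middleware: declare UTF-8 on bare JSON responses."""

    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            fixed = []
            for name, value in headers:
                is_content_type = name.lower() == "content-type"
                # Only touch responses that declare JSON without any charset.
                if is_content_type and value.strip().lower() == "application/json":
                    value = "application/json; charset=utf-8"
                fixed.append((name, value))
            return start_response(status, fixed, exc_info)

        return app(environ, patched_start_response)

    return middleware


def demo_app(environ, start_response):
    """Toy application (made up) that omits the charset, as in this ticket."""
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"ok": true}']
```

Wrapping the application as add_json_charset(demo_app) leaves the body untouched and only rewrites the Content-Type header, so clients that rely on the declared charset no longer have to guess.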

jonnybazookatone commented 8 years ago

After some consideration, I think the client should be modified. See the PR on the client.
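A client-side fix along these lines could be sketched as follows, assuming the client decodes the raw bytes itself and treats UTF-8 as the default when no charset is declared. The helper decode_json_body and the sample payload are made up for illustration; this is not the code from the PR on the client.

```python
import json


def decode_json_body(raw_bytes, declared_charset=None):
    """Hypothetical helper: decode a JSON body without guessing the encoding.

    Uses the declared charset when there is one and assumes UTF-8
    otherwise, instead of relying on a detected encoding.
    """
    encoding = declared_charset or "utf-8"
    return json.loads(raw_bytes.decode(encoding))


# Made-up payload containing a non-ASCII character, encoded as UTF-8.
sample_body = '{"title": "Étoile"}'.encode("utf-8")

# Decoding as UTF-8 round-trips the text correctly.
decoded = decode_json_body(sample_body)

# Decoding the same bytes with a wrong encoding still parses as JSON,
# but the non-ASCII character comes out garbled.
garbled = json.loads(sample_body.decode("iso-8859-1"))
```

With the requests package itself, the equivalent step would be to set the encoding attribute of the response to "utf-8" before reading its text property.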

romanchyla commented 8 years ago

I agree that the client should be fixed, but I'm reopening this issue because we should return a UTF-8 charset header universally. If we do not, then it is also our problem.

jonnybazookatone commented 8 years ago

I can understand your reasoning, but I don't think this issue is appropriate. I opened a new one here: #99

This ticket was directly related to JSON problems, so the links from other repos would give the wrong impression that this is still an issue. As far as JSON is concerned, however, I think we're fine in terms of RFC compliance.