DeepLcom / deepl-api-issues

Issue tracking repository for the DeepL API.
MIT License

Latin1 encoding instead of UTF-8 returned via DeepL API (Dart/Flutter) #9

Closed vishna closed 1 year ago

vishna commented 1 year ago

POST https://api-free.deepl.com/v2/translate returns ISO-8859-1 instead of UTF-8

e.g. translating Ukrainian to Polish:

IS:
Привіт! --> CzeÅÄ!

SHOULD BE:
Привіт! --> Cześć!

Workaround (in Dart):

final responseBodyISO88591 = response.body;
final responseBody = utf8.decode(latin1.encode(responseBodyISO88591));

JanEbbing commented 1 year ago

Thanks for the report! I can't reproduce this right now. Are you using any other options (e.g. tag handling or formality)?

$ curl -X POST 'https://api-free.deepl.com/v2/translate' \
--header 'Authorization: DeepL-Auth-Key MyFreeKey' \
--header 'Content-Type: application/json' \
--data '{"text": ["Привіт!"], "target_lang": "PL", "source_lang":"UK"}' >> output_pl_curl.txt

$ xxd output_pl_curl.txt
00000000: 7b22 7472 616e 736c 6174 696f 6e73 223a  {"translations":
00000010: 5b7b 2264 6574 6563 7465 645f 736f 7572  [{"detected_sour
00000020: 6365 5f6c 616e 6775 6167 6522 3a22 554b  ce_language":"UK
00000030: 222c 2274 6578 7422 3a22 437a 65c5 9bc4  ","text":"Cze...
00000040: 8721 227d 5d7d                           .!"}]}

Entering {"translations":[{"detected_source_language":"UK","text":"Cześć!"}]} on a site like https://mothereff.in/utf-8 gives me the same byte sequence that xxd shows, which is the expected UTF-8 encoding.
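
The same byte check can be done locally. The following is a minimal sketch using only dart:convert; the helper name toHex is made up for this illustration. It prints the UTF-8 bytes of the expected translation so they can be compared with the tail of the xxd dump above.

```dart
import 'dart:convert';

/// Hypothetical helper: formats bytes as space-separated lowercase hex pairs.
String toHex(List<int> bytes) =>
    bytes.map((b) => b.toRadixString(16).padLeft(2, '0')).join(' ');

void main() {
  // UTF-8 encode the expected translation and print its bytes.
  // "ś" (U+015B) becomes c5 9b and "ć" (U+0107) becomes c4 87.
  print(toHex(utf8.encode('Cześć!'))); // 43 7a 65 c5 9b c4 87 21
}
```

The c5 9b c4 87 sequence matches the bytes shown at offsets 0x3d to 0x40 in the dump, so the server output itself is valid UTF-8.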

vishna commented 1 year ago

It works fine with curl, and a Ruby script I tried works correctly too. This led me to believe that something is different with the Dart http package.

https://stackoverflow.com/questions/61312620/flutter-http-response-body-bad-utf8-encoding

HTTP in absence of a defined charset is assumed to be encoded in ISO-8859-1 (Latin-1). And body from its description is consistent with this behaviour. If the server response sets the Content-Type header to application/json; charset=utf-8 the body should work as expected.

So the workaround for Dart is to use:

final responseBody = utf8.decode(response.bodyBytes);
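
To see why both workarounds from this thread give the same result, here is a small sketch that simulates the situation without any network call. It assumes the client decoded the body as Latin-1; the function names are invented for the example, and bodyBytes stands in for response.bodyBytes.

```dart
import 'dart:convert';

/// First workaround: re-encode the mis-decoded string as Latin-1 to recover
/// the original bytes, then decode those bytes as UTF-8.
String decodeViaRoundTrip(String bodyDecodedAsLatin1) =>
    utf8.decode(latin1.encode(bodyDecodedAsLatin1));

/// Second workaround: decode the raw bytes as UTF-8 directly.
String decodeDirect(List<int> bodyBytes) => utf8.decode(bodyBytes);

void main() {
  // Stand-in for response.bodyBytes: the UTF-8 bytes sent by the server.
  final bodyBytes = utf8.encode('{"text":"Cześć!"}');

  // What a Latin-1 fallback yields: one character per byte (mojibake).
  final bodyAsLatin1 = latin1.decode(bodyBytes);

  print(bodyAsLatin1 == decodeDirect(bodyBytes)); // false
  print(decodeViaRoundTrip(bodyAsLatin1) == decodeDirect(bodyBytes)); // true
}
```

The round trip is lossless because Latin-1 maps every byte value to the code point with the same number. Decoding bodyBytes directly skips that detour and also works when the body was never mis-decoded, whereas latin1.encode should fail on a string that contains characters outside the Latin-1 range.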

I guess it would just work if the API returned application/json; charset=utf-8 in the headers instead of just application/json 🤷
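
As an illustration of why the charset parameter matters, the sketch below picks a decoder from a Content-Type value and falls back to Latin-1 when no charset is given, as the quote above describes. The helper encodingFor is hypothetical and is not the http package's actual code; it uses ContentType from dart:io, which is an assumption that rules out web targets.

```dart
import 'dart:convert';
import 'dart:io' show ContentType;

/// Hypothetical helper: chooses an encoding from a Content-Type header value.
/// Falls back to Latin-1 when the charset is missing or unknown.
Encoding encodingFor(String contentTypeHeader) {
  final charset = ContentType.parse(contentTypeHeader).charset;
  return Encoding.getByName(charset) ?? latin1;
}

void main() {
  final bodyBytes = utf8.encode('Cześć!');

  // No charset parameter: the Latin-1 fallback produces mojibake.
  final withoutCharset = encodingFor('application/json').decode(bodyBytes);
  print(withoutCharset == 'Cześć!'); // false

  // Explicit charset: the bytes are decoded as UTF-8.
  final withCharset =
      encodingFor('application/json; charset=utf-8').decode(bodyBytes);
  print(withCharset == 'Cześć!'); // true
}
```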

JanEbbing commented 1 year ago

Ah, I see. Please note that the Dart library is not officially supported by DeepL, so we cannot provide support for it. I will look into explicitly adding the charset to the headers and get back to you, thanks!

vishna commented 1 year ago

I just used the API directly. I guess the Dart library you mentioned solved this exact issue a while ago: https://github.com/komape/deepl_dart/issues/14

Anyway, thank you for your time and feel free to close this issue.

JanEbbing commented 1 year ago

I think we are within the specification here. It seems Dart/Flutter's defaults are a bit unergonomic in this regard, requiring you to add this line. The relevant RFC states that the default encoding for JSON is UTF-8.

See the discussion on https://github.com/dart-lang/http/issues/367
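
Given that JSON defaults to UTF-8, a client can skip charset guessing and decode the response bytes as UTF-8 before parsing. This is a minimal sketch only: firstTranslation is a hypothetical helper, and the sample body is the response shown earlier in this thread.

```dart
import 'dart:convert';

/// Hypothetical helper: decodes a /v2/translate response body as UTF-8 JSON
/// and returns the text of the first translation.
String firstTranslation(List<int> bodyBytes) {
  final json = jsonDecode(utf8.decode(bodyBytes)) as Map<String, dynamic>;
  final translations = json['translations'] as List<dynamic>;
  return (translations.first as Map<String, dynamic>)['text'] as String;
}

void main() {
  // Stand-in for response.bodyBytes, using the body from the curl example.
  final bodyBytes = utf8.encode(
      '{"translations":[{"detected_source_language":"UK","text":"Cześć!"}]}');
  print(firstTranslation(bodyBytes)); // Cześć!
}
```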

JanEbbing commented 1 year ago

I just checked internally, and we are in the process of explicitly returning a charset here nonetheless.