Closed AcckiyGerman closed 6 years ago
@AcckiyGerman
"title": "мамы"
and "title": "\u043c\u0430\u043c\u044b"
are equivalent, both decode to exactly the same thing and are both valid under the JSON standard (as ASCII is a subset of UTF-8).
The latter is more compatible with transports which prefer ASCII only (e.g. emails...). Why is this important?
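The equivalence can be checked with Python's standard json module. This is only a small sketch; the variable names are illustrative and nothing here comes from datapackage-py:

```python
import json

# The same descriptor fragment, once with \uXXXX escapes and once with literal characters.
escaped = '{"title": "\\u043c\\u0430\\u043c\\u044b"}'
literal = '{"title": "мамы"}'

decoded_escaped = json.loads(escaped)
decoded_literal = json.loads(literal)

# Both texts decode to the same Python object.
same = decoded_escaped == decoded_literal
print(same)
```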
@akariv
Yes, you are right, thanks.
I've tested loading such a descriptor, and package.descriptor
displays the title correctly in the terminal.
So it is important only if humans are reading the raw datapackage.json
Closing as wontfix to keep compatibility with transports which prefer ASCII.
the problem
"title": "мамы"
becomes "title": "\u043c\u0430\u043c\u044b"
after dump.to_file
the reason
Python's json.dump[s] saves JSON in ascii encoding by default: https://docs.python.org/3/library/json.html#basic-usage
So does the DPP: it saves descriptors in ASCII even though the file encoding is utf-8:
https://github.com/frictionlessdata/datapackage-py/blob/0bc5276c2acd730fd765af99dbfdecbca9b1c46d/datapackage/package.py#L249
https://github.com/frictionlessdata/datapackage-py/blob/0bc5276c2acd730fd765af99dbfdecbca9b1c46d/datapackage/package.py#L269
how it should be:
Here is the JSON standard: https://tools.ietf.org/html/rfc7159#section-8.1
@akariv @roll Do you have any objections if I fix it?
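For illustration, a change of this kind might look roughly like the sketch below. It assumes the descriptor is a plain dict; save_descriptor is a hypothetical helper, not a function from DPP or datapackage-py. The only standard-library behaviour it relies on is the ensure_ascii parameter of json.dump[s], which defaults to True:

```python
import json
import os
import tempfile


def save_descriptor(descriptor, path):
    # Hypothetical helper: write the descriptor as UTF-8 text
    # without escaping non-ASCII characters to \uXXXX sequences.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(descriptor, f, ensure_ascii=False, indent=2)


descriptor = {"title": "мамы"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
escaped_text = json.dumps(descriptor)

# With ensure_ascii=False the characters are written as-is.
path = os.path.join(tempfile.mkdtemp(), "datapackage.json")
save_descriptor(descriptor, path)
with open(path, encoding="utf-8") as f:
    raw_text = f.read()

print(escaped_text)
print("мамы" in raw_text)
```

Either file loads back to the same descriptor, so the difference only matters to someone reading the raw datapackage.json.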