Closed: The-Aniliner-Gerstung closed this issue 3 years ago
Hi @The-Aniliner-Gerstung,
Thanks, I'll investigate. We use Python's json
module, so it's not yet clear to me why there is an encoding problem.
Thank you for the very quick response, @roll!
That probably has something to do with how the NamedTemporaryFile
is opened? You can specify the encoding there (line 115, metadata.py):
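A minimal stdlib sketch of what I mean (this is just an illustration, assuming the temp file is opened in text mode, as in metadata.py):

```python
import json
import os
import tempfile

data = {"title": "Größenangaben", "unit": "m²"}

# Without encoding=..., a text-mode NamedTemporaryFile falls back to the
# locale's preferred encoding (e.g. cp1252 on German Windows). Passing
# encoding="utf-8" explicitly makes the output deterministic.
with tempfile.NamedTemporaryFile(
    mode="w", encoding="utf-8", suffix=".json", delete=False
) as f:
    json.dump(data, f, indent=2, ensure_ascii=False)
    path = f.name

# Reading back as UTF-8 now round-trips the umlauts and superscripts.
with open(path, encoding="utf-8") as f:
    assert json.load(f) == data

os.remove(path)
```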
The second example from the original post was created using the usual
with open("file.json", "w", encoding="UTF-8") as f:
    json.dump(artifact.to_dict(), f, indent=2, ensure_ascii=False)
which creates valid UTF-8 (although the special characters are not escaped, which should be done following the JSON specification)
Great! Thanks for the quick analysis.
I'm going to fix it this week. If you're interested, feel free to open a PR adding a (failing -> fixed) test
That sounds great, thank you very much! I just tried contributing to your repository, but unfortunately my company's development environment won't let me install the dependencies.
I have prepared a test file for encoding for you, as well as a valid UTF-8-encoded datapackage (needed for the second test):
Hello @roll,
thank you for your very quick fix. I just downloaded the newest version 3.34.2 and tested the encoding. Files are now written correctly, but I noticed that reading a new UTF-8 JSON leads to problems. I guess the with open()
call should also take encoding="utf8"
as a parameter.
I attached a little script I ran before and after updating the framework (see the version
inside the JSON). After the update, the read()
function fails.
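A minimal sketch of the suggested change (read_descriptor here is just a hypothetical stand-in for the framework's reader, not its actual API):

```python
import json
import os
import tempfile

# Hypothetical helper illustrating the suggested fix: always pass
# encoding="utf-8" when reading a JSON descriptor, instead of relying
# on the locale default that open() uses otherwise.
def read_descriptor(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Round trip: write UTF-8 (as the fixed writer now does), read it back.
with tempfile.NamedTemporaryFile(
    mode="w", encoding="utf-8", suffix=".json", delete=False
) as f:
    json.dump({"unit": "m³", "name": "Müller"}, f, ensure_ascii=False)
    path = f.name

assert read_descriptor(path) == {"unit": "m³", "name": "Müller"}
os.remove(path)
```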
Thanks!
Sorry for these errors; they're really hard to catch because the behavior depends on the user's locale, so our CI doesn't catch them. I'm releasing a fix
Thanks for fixing this issue - works like a charm 👍 And BTW: your release rate is impressive - keep going!
Hey guys, I have the same issue with German umlauts, but I did not really understand the solution. Could someone please explain?
Hi @farhadmaleki85, can you please create a new issue with your problem description?
@farhadmaleki85 This worked for me, because I didn't need the string but the direct output into a JSON file: https://stackoverflow.com/questions/18337407/saving-utf-8-texts-with-json-dumps-as-utf-8-not-as-a-u-escape-sequence
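In short, a minimal stdlib illustration of what the linked answer does:

```python
import json

data = {"city": "Köln", "area": "405 km²"}

# Default: non-ASCII characters are escaped as \uXXXX sequences.
escaped = json.dumps(data)
# ensure_ascii=False: characters are written as raw UTF-8 text.
raw = json.dumps(data, ensure_ascii=False)

print(escaped)  # {"city": "K\u00f6ln", "area": "405 km\u00b2"}
print(raw)      # {"city": "Köln", "area": "405 km²"}

# Both forms decode to the same data.
assert json.loads(escaped) == json.loads(raw) == data
```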
Overview
Hello,
I'm trying to save datapackages, resources and table schemas as JSON using the built-in
.to_json()
function. The problem is that I have German umlauts (Ä, Ö, Ü, ß) and exponents (e.g. m², m³) in my metadata. When opening the resulting JSON file, PyCharm tries to open it with UTF-8 encoding. This results in unrecognised characters and a warning from PyCharm, because the file seems to be encoded in ISO 8859-1 (see screenshot):
This is how the file should look:
Other editors (like Windows Notepad or Notepad++) recognise the encoding correctly.
My question is: when is frictionless using UTF-8, and when other encodings? Why is it not saving in UTF-8 at all times and escaping Unicode characters, since the JSON specification (RFC 7159, Section 8.1) specifies UTF-8 as the default encoding?
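For illustration, escaping has the nice property that the output is pure ASCII, so the file bytes read back identically under any ASCII-compatible encoding:

```python
import json

# ensure_ascii=True is the default, so all non-ASCII characters
# become \uXXXX escapes and the result contains only ASCII bytes.
escaped = json.dumps({"name": "Größe"})
payload = escaped.encode("ascii")

# The same bytes decode correctly whether an editor assumes
# UTF-8, Latin-1, or cp1252.
for enc in ("utf-8", "latin-1", "cp1252"):
    assert json.loads(payload.decode(enc)) == {"name": "Größe"}
```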
Thanks in advance and keep up the good work!
Python-Code to reproduce this issue:
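As a rough stdlib-only stand-in, the following sketch simulates the suspected mismatch: writing with a locale-style cp1252 encoding and reading back as UTF-8 (the actual frictionless code paths are an assumption here):

```python
import json
import os
import tempfile

data = {"name": "größe", "unit": "m²"}

# Simulate a writer that falls back to the locale encoding
# (e.g. cp1252 on a German Windows machine) instead of UTF-8.
with tempfile.NamedTemporaryFile(
    mode="w", encoding="cp1252", suffix=".json", delete=False
) as f:
    json.dump(data, f, ensure_ascii=False)
    path = f.name

# A reader that assumes UTF-8 (as PyCharm and RFC 7159 do) then
# chokes on the single-byte umlauts.
try:
    with open(path, encoding="utf-8") as f:
        json.load(f)
    failed = False
except UnicodeDecodeError:
    failed = True

assert failed
os.remove(path)
```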
Please preserve this line to notify @roll (lead of this repository)