balena-io-modules / drivelist

List all connected drives in your computer, in all major operating systems
Apache License 2.0
242 stars 90 forks source link

Fix lsblk JSON output utf-8 encoded characters #320

Closed thundron closed 5 years ago

thundron commented 5 years ago

Change-type: patch Signed-off-by: Lorenzo Alberto Maria Ambrosi lorenzothunder.ambrosi@gmail.com

lurch commented 5 years ago

Hmmm, won't this just mean that "mountpoint": "/media/Donn\xc3\xa9es" gets changed to "mountpoint": "/media/Donn/xc3/xa9es" ? Wouldn't it be better for "mountpoint": "/media/Donn\xc3\xa9es" to get changed to "mountpoint": "/media/Données" ? (or however NodeJS handles UTF8)

thundron commented 5 years ago

@lurch You're right, I was trying with another JSON but it's actually the output from lsblk itself: https://github.com/karelzak/util-linux/issues/330 The only thing we can do is try to fallback to --pairs --all also after a JSON parse error

lurch commented 5 years ago

@thundron Oooh, interesting. But it looks like the code in the PR linked to the issue you found ( https://github.com/karelzak/util-linux/pull/331/files ) only deals with control characters. Looking at the JSON spec at http://www.json.org/ it seems that \xc3\xa9 ought to be converted to \u00e9 so I guess lsblk's JSON output is a bit naive and not fully unicode-aware :-/ (Up to you if you want to create an issue on their tracker)

Given that \x isn't valid in JSON, and https://github.com/karelzak/util-linux/pull/331 seems to fix all single-character \xYY sequences, I guess you could theoretically convert all \xYY\xYY utf8 pairs into their (JSON-compliant) \uYYYY unicode equivalent?

lurch commented 5 years ago

Looks like your current version of this PR would convert \xc3\xa9 to \u00c3\u00a9 which isn't what you want ;-)