bnoordhuis / node-iconv

node.js iconv bindings - text recoding for fun and profit!

utf-8 to iso-8859-1 error no conversion #206

Closed kaiserdj closed 4 years ago

kaiserdj commented 4 years ago

I am trying to convert the encoding of a text to recover the original text. It only returns the correct text when the input is a string I define myself, not when it comes from a variable produced by another function. In addition, when I try to convert another text (Hakaba Kitarō), I get this error: (node:5436) UnhandledPromiseRejectionWarning: Error: Illegal character sequence.

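const { Iconv } = require("iconv");

// new Iconv(from, to): this converts UTF-8 -> ISO-8859-1, despite the function name.
// (iconv usually expects the //TRANSLIT and //IGNORE suffixes on the target encoding.)
function iso_to_utf8(text) {
    console.log(`text original: ${text}`); // prints: text original: "Queen\u00c2\u0092s Blade OVA 2011"

    // Case 1: text received from another function.
    let body = Buffer.from(text, "utf8");
    let conv = new Iconv("utf8//TRANSLIT//IGNORE", "ISO-8859-1");
    body = conv.convert(body).toString();
    console.log(body); // prints: Queen\u00c2\u0092s Blade OVA 2011

    // Case 2: a hard-coded string literal.
    body = Buffer.from("Queen\u00c2\u0092s Blade OVA 2011", "utf8");
    conv = new Iconv("utf8//TRANSLIT//IGNORE", "ISO-8859-1");
    body = conv.convert(body).toString();
    console.log(body); // prints: Queen’s Blade OVA 2011

    return body;
}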
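The log comments in this snippet hint that the two inputs differ. The value from the other function prints as `Queen\u00c2\u0092s` (so it appears to contain literal backslash escapes), whereas the JavaScript literal `"Queen\u00c2\u0092s"` is parsed into the real code points U+00C2 and U+0092. That reading is an assumption, not something confirmed in the thread. The plain-JavaScript sketch below (no node-iconv involved) shows that the two strings are different and that, if the text is JSON-escaped, `JSON.parse` turns the escapes into real characters:

```javascript
// Literal escapes: each "\u00c2" here is six characters (backslash, u, 0, 0, c, 2).
const escaped = "Queen\\u00c2\\u0092s";
// Real code points U+00C2 and U+0092, as produced by a JS string literal.
const decoded = "Queen\u00c2\u0092s";

console.log(escaped.length); // 18
console.log(decoded.length); // 8
console.log(escaped === decoded); // false

// Assumption: the escaped text uses JSON-style escapes, so wrapping it in
// double quotes and parsing it yields the real characters.
const fromJson = JSON.parse(`"${escaped}"`);
console.log(fromJson === decoded); // true
```

If the value already carries its own surrounding double quotes, as the first log line suggests, calling `JSON.parse(text)` directly would be the equivalent step.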
kaiserdj commented 4 years ago

The correct text it should return is at https://pastebin.com/raw/0ZehUbJX. I put it on Pastebin because GitHub doesn't display it.
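The thread does not address the "Illegal character sequence" error for Hakaba Kitarō. One plausible explanation (an assumption, not confirmed by the maintainer) is that ō (U+014D) has no ISO-8859-1 encoding at all, because ISO-8859-1 only covers U+0000 to U+00FF. The hypothetical `nonLatin1Chars` helper below is a plain-JavaScript sketch that lists such characters before a conversion is attempted:

```javascript
// Hypothetical helper: list the characters that ISO-8859-1 (Latin-1)
// cannot represent, i.e. every code point above U+00FF.
function nonLatin1Chars(text) {
  const result = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp > 0xff) {
      result.push(`${ch} (U+${cp.toString(16).toUpperCase().padStart(4, "0")})`);
    }
  }
  return result;
}

console.log(nonLatin1Chars("Hakaba Kitarō")); // [ 'ō (U+014D)' ]
console.log(nonLatin1Chars("Queen\u00c2\u0092s Blade OVA 2011")); // []
```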

bnoordhuis commented 4 years ago

This is about the \u00c2\u0092 character sequence? That's the byte sequence \xc3\x82\xc2\x92 when converted to UTF-8.
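The byte sequence mentioned here can be checked with Node's built-in `Buffer`, without node-iconv: encoding the two code points as UTF-8 yields four bytes.

```javascript
// U+00C2 and U+0092, the two characters under discussion.
const chars = "\u00c2\u0092";

// Each code point in U+0080..U+07FF takes two bytes in UTF-8.
const utf8 = Buffer.from(chars, "utf8");
console.log(utf8); // <Buffer c3 82 c2 92>
```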

//TRANSLIT correctly copies that to the output buffer as the ISO-8859-1 byte sequence \xc2\x92, which .toString() then interprets as UTF-8 and converts to U+0092, a control character that your terminal may or may not display.
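The round trip described above can be reproduced with `Buffer` alone. As a stand-in for the iconv conversion to ISO-8859-1 (an assumption that only holds because every code point here is at most U+00FF), the sketch uses Node's `latin1` encoding, which writes each such code point as one byte, and then decodes the result as UTF-8 the way a bare `.toString()` does:

```javascript
const input = "Queen\u00c2\u0092s Blade OVA 2011";

// Stand-in for converting to ISO-8859-1: 'latin1' stores each code point
// (all <= U+00FF here) as a single byte, so U+00C2 U+0092 becomes c2 92.
const latin1Bytes = Buffer.from(input, "latin1");
console.log(latin1Bytes.subarray(5, 7)); // <Buffer c2 92>

// .toString() defaults to UTF-8, so the byte pair c2 92 decodes to U+0092.
const output = latin1Bytes.toString();
console.log(output === "Queen\u0092s Blade OVA 2011"); // true
console.log(output.codePointAt(5).toString(16)); // 92
```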

(All characters in the range U+007F-U+009F are control characters, by the way.)
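Because these characters may render as nothing in a terminal, it can help to locate them explicitly. The hypothetical `findC1Controls` helper below (plain JavaScript, not part of node-iconv) reports each code point in the U+007F to U+009F range along with its position:

```javascript
// Hypothetical helper: report DEL (U+007F) and the C1 control characters
// (U+0080..U+009F), which a terminal may display as nothing at all.
function findC1Controls(text) {
  const found = [];
  for (const [index, ch] of [...text].entries()) {
    const cp = ch.codePointAt(0);
    if (cp >= 0x7f && cp <= 0x9f) {
      found.push({ index, codePoint: `U+${cp.toString(16).toUpperCase().padStart(4, "0")}` });
    }
  }
  return found;
}

console.log(findC1Controls("Queen\u0092s Blade OVA 2011"));
// [ { index: 5, codePoint: 'U+0092' } ]
```

The index counts code points rather than UTF-16 units; the two coincide for text without astral characters, as here.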

That's all expected behavior and working as intended so I'm going to go ahead and close this out.