Closed QuocNguyen799 closed 3 years ago
Encoding detection is not precise, especially given a single character. Do you have a specific question about iconv-lite here?
On Fri, Jul 30, 2021, 01:14 QuocNguyen799 @.***> wrote:
I want to encode this character to Shift JIS: 髙
const encoded = iconv.encode('髙', "Shift_JIS")
But I receive EUCJP instead of SJIS when I run detection on the "encoded" value above:
const detected = Encoding.detect(encoded);
Thanks for your reply, but it's not just that the detection is imprecise; the encoding is also incorrect. This character 髙 should be '3f' when converted to Shift_JIS.
3f in Shift_JIS is just a question mark "?". I assume it means that the script you're referring to doesn't know how to encode this character.
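As a quick sanity check of that point, here is a minimal Node.js sketch using only built-ins (no iconv-lite involved). It only shows that the byte 0x3f is the ASCII question mark, which encoders often emit as a substitute for a character they cannot map; it says nothing about which Shift_JIS table is in use.

```javascript
// 0x3f is the code of the ASCII question mark.
const questionMark = String.fromCharCode(0x3f);

// Going the other way: the single byte for "?" printed as hex.
const hex = Buffer.from('?', 'ascii').toString('hex');

console.log(questionMark, hex);
```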
Also not sure where you're getting 8de8. On my machine I get bytes 0xFB 0xFC:
> iconv.encode('髙', "Shift_JIS")
<Buffer fb fc>
Checking in a recent browser that supports https://encoding.spec.whatwg.org/ (the main standard that iconv-lite follows), I see that this is indeed a correct encoding:
let dec = new TextDecoder("Shift_JIS");
let buf = Uint8Array.from([0xfb, 0xfc]);
document.body.innerText = dec.decode(buf); // shows "髙"
The question mark "?" or 3f is exactly what I need, because the character 髙 belongs to EUC_JP, not Shift_JIS.
When I try it with PHP, it works:
$str = mb_convert_encoding('髙', "SJIS");
$str = mb_convert_encoding($str, "UTF-8", "SJIS");
var_dump($str);
I don't know much about encoding standards. Maybe there is a difference between the encoding standards that iconv-lite and PHP follow.
Do you have any suggestions for this?
If not, I will close this issue.
And thank you for your time.
As far as I know, recent versions of Shift_JIS such as Shift_JIS-2004 can encode the characters that were previously only encodable with EUC_JP (see https://en.wikipedia.org/wiki/Shift_JIS#Shift_JISx0213_and_Shift_JIS-2004). I assume PHP does not support it, or is somehow more strict about using the older version of Shift_JIS?
Iconv-lite only supports the extended version of Shift_JIS. I don't think there's an easy way to restrict encoding to a strict Shift_JIS. One hack I can think of could be to replace all "unsupported" characters before encoding with an explicit "?", but that requires knowledge of all these unsupported chars.
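A minimal sketch of that hack, assuming you can supply the list of characters your target system treats as unsupported. The `UNSUPPORTED` set and the `replaceUnsupported` helper are hypothetical names, not part of iconv-lite, and the set contains only the one character from this thread; a real list would have to be derived from the strict Shift_JIS table you need to match.

```javascript
// Hypothetical deny-list of characters that the extended Shift_JIS table can
// encode but a strict Shift_JIS consumer cannot. Only 髙 comes from this
// thread; the rest of the list is left for the caller to fill in.
const UNSUPPORTED = new Set(['髙']);

// Replace every listed character with an explicit "?" before encoding.
function replaceUnsupported(text, unsupported = UNSUPPORTED) {
  let out = '';
  // for...of iterates by code point, so surrogate pairs stay intact.
  for (const ch of text) {
    out += unsupported.has(ch) ? '?' : ch;
  }
  return out;
}

const safe = replaceUnsupported('髙橋');
console.log(safe);
```

The result can then be passed to `iconv.encode(safe, "Shift_JIS")` as usual, so the replaced characters come out as the byte 3f.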
I want to encode this character to Shift JIS: 髙
const encoded = iconv.encode('髙', "Shift_JIS")
But I receive EUCJP instead of SJIS when I run detection on the "encoded" value above:
const detected = Encoding.detect(encoded);
And the "encoded" value that I receive is: 8de8
But it should be 3f: https://www.skandissystems.com/testCharset.pl