kshetline opened this issue 5 years ago
I've looked into Shift_JIS more, and it's left me uncertain about how best to handle these conflicting characters.
While there's plenty of clear information about how ¥ and ‾ take over for \ and ~, I haven't been able to find any clear statement about whether \ and ~ simply don't exist in Shift_JIS, or whether there are alternate (probably multi-byte) encodings to handle these two displaced ASCII characters.
When I try to encode \ or ~ using node-iconv, it throws an error.
Your iconv-lite encodes both ¥ and \ as 0x5C, and both ‾ and ~ as 0x7E.
Perhaps that is the best thing to do on the encoding side if there aren't proper encodings for \ and ~ (users of Shift_JIS are apparently accustomed to these particular confusions, and these substitutions might give the user more information than treating \ and ~ as unknown characters). In that case, though, the decoding side should favor ¥ over \, and ‾ over ~, if \ and ~ don't have their own unique encodings.
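To make the round-trip problem concrete, here is a minimal sketch that covers only the two contested bytes. It does not use iconv-lite or node-iconv; the tables and the helper names (`encodeLenient`, `encodeStrict`, `decode`) are hypothetical and exist only for illustration. The one outside fact it relies on is that JIS X 0201, which supplies the single-byte range of Shift_JIS, assigns 0x5C to the yen sign and 0x7E to the overline.

```javascript
// Hypothetical tables for the two contested bytes only (not a real codec).
// JIS X 0201 reading: 0x5C is YEN SIGN (U+00A5), 0x7E is OVERLINE (U+203E).
const JIS_ROMAN = new Map([[0x5c, '\u00a5'], [0x7e, '\u203e']]);
// ASCII reading: 0x5C is REVERSE SOLIDUS, 0x7E is TILDE.
const ASCII = new Map([[0x5c, '\\'], [0x7e, '~']]);

// Lenient encoder: both members of each pair collapse onto the same byte,
// which is the behavior described above for iconv-lite.
function encodeLenient(text) {
  const bytes = [];
  for (const ch of text) {
    if (ch === '\u00a5' || ch === '\\') bytes.push(0x5c);
    else if (ch === '\u203e' || ch === '~') bytes.push(0x7e);
    else throw new RangeError('outside this sketch: ' + JSON.stringify(ch));
  }
  return bytes;
}

// Strict encoder: accepts only the JIS X 0201 characters and rejects \ and ~,
// similar in spirit to the error reported above for node-iconv.
function encodeStrict(text) {
  const bytes = [];
  for (const ch of text) {
    if (ch === '\u00a5') bytes.push(0x5c);
    else if (ch === '\u203e') bytes.push(0x7e);
    else throw new RangeError('not encodable: ' + JSON.stringify(ch));
  }
  return bytes;
}

// Decode with whichever reading of the two bytes the caller picks.
function decode(bytes, table) {
  return bytes
    .map((b) => {
      if (!table.has(b)) throw new RangeError('outside this sketch: ' + b);
      return table.get(b);
    })
    .join('');
}

const input = '\u00a5\\\u203e~';          // yen, backslash, overline, tilde
const bytes = encodeLenient(input);       // four bytes, only two distinct values
const asJis = decode(bytes, JIS_ROMAN);   // every 0x5C becomes yen, every 0x7E overline
const asAscii = decode(bytes, ASCII);     // every 0x5C becomes backslash, every 0x7E tilde

console.log(bytes.map((b) => '0x' + b.toString(16)).join(' '));
console.log(asJis === input, asAscii === input); // neither reading restores the input
```

Whichever table the decoder uses, two of the four input characters come back changed, so a lenient encoder forces the decoder to pick one reading. The strict encoder avoids the ambiguity at the cost of rejecting \ and ~ outright.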
I see there's an earlier (much earlier, 2014!) bug about these two characters not being encoded correctly for Shift_JIS. Now it seems there's a problem with how they are decoded in EUC-JP, Shift_JIS, and Big5.
I changed shiftjis-test.js to add these two characters to the test: ... and the second assert fails. The yen sign and overline get decoded as if they were ASCII, as backslash and tilde. A similar failure occurs in big5-test.js when I add ¥ and ‾ to testString. I wasn't sure where EUC-JP was tested specifically.