TheBoringDev closed this issue 2 years ago
I have discussed this with professor Google, and ZWNBSP is the Zero Width No-Break Space Unicode character (U+FEFF), which is also used as the byte order mark (BOM). It seems this character is always added at the beginning of the encoded Unicode string (?)
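All UTF-16 files may start with a byte order mark: the bytes 0xFE 0xFF (big-endian) or 0xFF 0xFE (little-endian), which show up as [-2, -1] or [-1, -2] in a Java byte array. This is what your editor is showing (along with the unused buffer bytes at the end, displayed as 'NUL'). In this concrete case you may remove the two initial bytes and the nulls at the end.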
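A minimal sketch of that manual cleanup, using only the JDK. The class `BomStripper` and the method `decodeUtf16` are hypothetical names for illustration, not part of juniversalchardet, and the fallback to UTF-16BE when no BOM is found is an assumption:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of juniversalchardet.
class BomStripper {

    /** Decodes a UTF-16 buffer, dropping a leading BOM and trailing NUL padding. */
    static String decodeUtf16(byte[] buf) {
        int start = 0;
        int end = buf.length;
        // Assumption: treat the data as big-endian when no BOM is present.
        Charset charset = StandardCharsets.UTF_16BE;

        if (buf.length >= 2) {
            if (buf[0] == (byte) 0xFE && buf[1] == (byte) 0xFF) {
                start = 2;                           // BOM FE FF, i.e. [-2, -1]
            } else if (buf[0] == (byte) 0xFF && buf[1] == (byte) 0xFE) {
                charset = StandardCharsets.UTF_16LE; // BOM FF FE, i.e. [-1, -2]
                start = 2;
            }
        }

        // Keep the payload aligned to whole 2-byte code units.
        if ((end - start) % 2 != 0) {
            end--;
        }
        // Drop trailing NUL code units (00 00) left over from an oversized buffer.
        while (end - start >= 2 && buf[end - 1] == 0 && buf[end - 2] == 0) {
            end -= 2;
        }
        return new String(buf, start, end - start, charset);
    }

    public static void main(String[] args) {
        byte[] padded = {(byte) 0xFE, (byte) 0xFF, 0x00, '{', 0x00, '}', 0, 0, 0, 0};
        System.out.println(decodeUtf16(padded));
    }
}
```

Reading only as many bytes as the file actually contains avoids the NUL padding in the first place, so the trailing-NUL loop is only needed when the buffer is larger than the file.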
You can also use
org.mozilla.universalchardet.ReaderFactory.createBufferedReader(file)
It will detect the charset and create a reader that skips the BOM bytes if they are present in UTF files.
First, I have a JSON file and its encoding is UTF-16BE as shown below.
Next, I use the sample code to read that JSON file's content as a byte array, detect its encoding, and convert the bytes back to the original string based on the detected encoding.
But the output string has a weird symbol, "ZWNBSP", added to it, and I have no idea what it is. I expect it not to be there.
I can replicate this issue with UTF-16BE and UTF-16LE, but not with ANSI or UTF-8.
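The behaviour can probably be reproduced without the detection library at all. The sketch below assumes the file bytes begin with the BOM FE FF; the class `BomDemo` and its sample data are made up for illustration. In the JDK, the explicit `UTF_16BE` charset keeps the BOM as a U+FEFF character when decoding, while the generic `UTF_16` charset consumes it:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction for illustration; the sample bytes are made up.
class BomDemo {
    // UTF-16BE encoding of "{}" preceded by the BOM FE FF.
    static final byte[] DATA = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x7B, 0x00, 0x7D};

    // Decoding with the explicit big-endian charset keeps the BOM as U+FEFF.
    static String decodeExplicitBE() {
        return new String(DATA, StandardCharsets.UTF_16BE);
    }

    // Decoding with the generic UTF-16 charset uses the BOM to pick the byte order and drops it.
    static String decodeGeneric() {
        return new String(DATA, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        System.out.println(decodeExplicitBE().length()); // 3: U+FEFF, '{', '}'
        System.out.println(decodeGeneric().length());    // 2: '{', '}'
    }
}
```

If the detector reports "UTF-16BE" and that name is passed straight to `new String(bytes, charsetName)`, the first character of the result would be U+FEFF, which editors display as ZWNBSP.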