Open batyshkaLenin opened 1 year ago
Hmm, I see the letter я at 0xDF; could it be intentional?
Also, ¤ is the "current currency" symbol AFAIK, so I think it should be converted to Euro as expected. Let me know if that's a wrong assumption.
The problem is that the decoding goes wrong. If you write a maccyrillic decoding test, you get ¤ instead of the letter я. The letter я is not the ¤ symbol. You are correct that ¤ is a currency symbol.
Note that iconv-lite here uses generated data from the low-level iconv library, which is an informal standard for character encoding conversion, so I tend to trust it unless there's compelling data that it's wrong.
Wait, what do you expect the code for this letter to be: 0xFF or 0xDF?
The code for this letter should be 0xDF, but when decoding it translates as 0xFF. I don't know how to prove that this is true, except that I enter the letter я in Numbers on macOS, and after decoding it turns into ¤, even though it should remain я.
To check, you can write a test for this encoding, as well as for the other Cyrillic encodings.
Well, if you can debug-print the Buffer that is passed to the decode() method, we can check which byte corresponds to я there and potentially add a test. iconv-lite is already pretty thoroughly tested, but it uses either the iconv library or WHATWG as the "ground truth". These sources might be wrong, but that is pretty rare.
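A minimal sketch of the debug print suggested above, using only Node's built-in Buffer. The helper name `toHex` and the sample bytes are assumptions for illustration; in real code you would pass the Buffer you are about to hand to `iconv.decode()`.

```javascript
// Hypothetical helper: render a Buffer as space-separated hex bytes,
// so the byte that stands for "я" can be read off directly.
function toHex(buf) {
  return Array.from(buf, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// Illustrative input only: the three bytes 0xFE, 0xDF, 0xFF.
const sample = Buffer.from([0xfe, 0xdf, 0xff]);
console.log(toHex(sample)); // fe df ff
```

If the position where я is expected shows `df`, the input matches what the thread says the maccyrillic table expects; if it shows `ff`, the input itself carries a different byte than the table assigns to я.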
\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xdf must be equivalent to АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя, but it's not. Or am I misunderstanding something?
Just checked it and looks correct:
$ node
> iconv = require("iconv-lite")
> iconv.decode(Buffer.from("\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xdf", "binary"), "maccyrillic")
'АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя'
Where are you getting the wrong results?
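The check above could be turned into a repeatable test by building the byte sequence and the expected string programmatically instead of typing them out. This is only a sketch: the `range` helper is hypothetical, and the byte layout (0x80..0x9F for А..Я, 0xE0..0xFE for а..ю, 0xDF for я) is taken from this thread rather than from an independent source.

```javascript
// Hypothetical helper: inclusive integer range.
function range(start, end) {
  const out = [];
  for (let i = start; i <= end; i++) out.push(i);
  return out;
}

// Bytes as discussed in this thread:
// 0x80..0x9F (А..Я), 0xE0..0xFE (а..ю), then 0xDF (я).
const bytes = Buffer.from([...range(0x80, 0x9f), ...range(0xe0, 0xfe), 0xdf]);

// Expected text: U+0410..U+042F (А..Я) followed by U+0430..U+044F (а..я).
const expected = String.fromCharCode(...range(0x0410, 0x044f));

console.log(bytes.length, expected.length); // 64 64

// With iconv-lite installed, the actual comparison would be:
//   const iconv = require("iconv-lite");
//   require("assert").strictEqual(iconv.decode(bytes, "maccyrillic"), expected);
```

Swapping the final 0xDF for 0xFF in `bytes` would be one way to reproduce the case where a currency symbol comes out in place of я.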
In this encoding, the character ю is followed by the symbol ¤. Because of this, in places where the letter "я" should have been, the symbol "€" (the last symbol) is decoded instead.

![image](https://user-images.githubusercontent.com/8469236/184115634-62694985-a1a0-47d1-8db6-33f635d56d9b.png)