SpongebobSquamirez opened this issue 5 years ago
Literally every EPWING dictionary I've seen before was encoded in EUC-JP, but it looks like you've found one that isn't! This is probably the problem: https://github.com/FooSoft/zero-epwing/blob/master/convert.c#L119
It should probably pick an encoding dynamically based on the charCode value...
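Here is a minimal sketch of that suggestion in C. It assumes the conversion goes through POSIX iconv and that the book's character code is known before any text is converted. The `charcode_t` enum, `encoding_for_charcode`, and `convert_text` are hypothetical names for illustration. The EB library does report a per-book character code, but this is not zero-epwing's actual code.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the character-code value a book reports.
   Illustrative only; not the EB library's real header. */
typedef enum {
    CHARCODE_ISO8859_1,
    CHARCODE_JISX0208
} charcode_t;

/* Map a book's character code to an iconv source-encoding name. */
static const char *encoding_for_charcode(charcode_t code) {
    switch (code) {
    case CHARCODE_ISO8859_1:
        return "ISO-8859-1";
    case CHARCODE_JISX0208:
    default:
        return "EUC-JP"; /* assumption: JIS X 0208 text arrives EUC-packed */
    }
}

/* Convert len bytes of src to a newly allocated, NUL-terminated UTF-8
   string. Returns NULL on failure; the caller frees the result. */
static char *convert_text(const char *src, size_t len, charcode_t code) {
    assert(src != NULL);
    iconv_t cd = iconv_open("UTF-8", encoding_for_charcode(code));
    if (cd == (iconv_t)-1)
        return NULL;

    /* Both encodings need at most 2 UTF-8 bytes per input byte;
       4x leaves headroom, plus 1 for the terminator. */
    size_t cap = len * 4 + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)src;
    size_t in_left = len;
    char *out_ptr = out;
    size_t out_left = cap - 1;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}
```

For example, `convert_text("\xa4\xa2", 2, CHARCODE_JISX0208)` should yield the UTF-8 bytes for あ. Keeping the encoding choice in one small switch means a newly found character code only needs one more case.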
I looked on Wikipedia and also here: https://stackoverflow.com/questions/1778619/encoding-conversion-from-jis-x-208-to-unicode but I'm not sure what's best to use. I don't know (1) what the right encoding is, or (2) how its character-set name is written in C. If it's something like SHIFT_JIS or SHIFT-JIS, I could try changing that, but I'd rather not build the project myself (by the way, I'm on Windows).
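On the second question, this sketch assumes convert.c goes through iconv. The linked line is the place to confirm that. With iconv, an encoding is named by a plain C string passed to `iconv_open`, and `iconv_open` returns `(iconv_t)-1` when it doesn't recognize the name. The hypothetical helpers `encoding_supported` and `list_supported` ask the local iconv which candidate spellings it accepts. Which aliases exist varies between iconv implementations.

```c
#include <assert.h>
#include <iconv.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns true if this iconv build accepts `name` as a source encoding
   for conversion to UTF-8. */
static bool encoding_supported(const char *name) {
    assert(name != NULL);
    iconv_t cd = iconv_open("UTF-8", name);
    if (cd == (iconv_t)-1)
        return false;
    iconv_close(cd);
    return true;
}

/* Print which candidate Japanese encoding names this platform recognizes. */
static void list_supported(void) {
    const char *candidates[] = {"EUC-JP", "SHIFT_JIS", "SHIFT-JIS", "CP932", "ISO-2022-JP"};
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        printf("%-12s %s\n", candidates[i],
               encoding_supported(candidates[i]) ? "yes" : "no");
}
```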
Also, all my dictionaries produce the same kind of output. I tried 大辞林 (Daijirin), 新明解 (Shin Meikai), 大辞泉 (Daijisen), 新辞林 (Shinjirin), 明鏡 (Meikyō), etc.
I'm not sure what encoding the output is written in, but Notepad++ reports it as UCS-2 LE BOM. Converting it to UTF-8 changes nothing, and switching to Shift-JIS shows Japanese characters, but they're all nonsense.
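For reference, Notepad++'s "UCS-2 LE BOM" label means the file begins with the byte-order mark `FF FE`. The file is therefore stored as 16-bit little-endian code units rather than as UTF-8, which may start with `EF BB BF`, or as Shift-JIS, which has no BOM. Below is a small sketch for checking the first bytes of an output file yourself. `detect_bom` and `detect_file_bom` are hypothetical helper names, and UTF-32 BOMs are not distinguished.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Label a buffer by its leading byte-order mark, roughly the way editors
   such as Notepad++ do. Returns NULL if there is no recognized BOM.
   Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE here. */
static const char *detect_bom(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)
        return "UTF-8 BOM";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE (UCS-2 LE) BOM";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE (UCS-2 BE) BOM";
    return NULL;
}

/* Read up to the first three bytes of a file and label its BOM.
   Returns NULL if the file can't be opened or has no recognized BOM. */
static const char *detect_file_bom(const char *path) {
    unsigned char buf[3];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return detect_bom(buf, n);
}
```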
I've attached an excerpt of my output, plus a second file with the original entries for two headwords from that excerpt that I was able to track down.
Excerpt:
Original Entries: