FooSoft / zero-epwing

Sane data exporter for an insane dictionary format.
https://foosoft.net/projects/zero-epwing/
MIT License

how to convert {{w_xxxxx}} and {{n_xxxxx}} to unicode #6

Open deathrush opened 5 years ago

deathrush commented 5 years ago

According to the 外字Unicodeマップ ("gaiji Unicode map") page at http://ebstudio.info/manual/EBWin4_man/0_4_5.html, entries in a map file look like hA121 u00E0. There is no 'w' or 'n' prefix.

FooSoft commented 5 years ago

Those are indices into the character map for the given dictionary. Yomichan-Import has code to parse these entries; you can check it out here: https://github.com/FooSoft/yomichan-import/blob/master/epwing.go#L172

Character tables have to be created for every EPWING dictionary, since certain 外字 have glyphs that would normally be rendered inside the text.
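For anyone post-processing the exported text themselves, here is a minimal sketch of the lookup described above. It is not the Yomichan-Import code. It assumes the markers have the form {{w_CODE}} or {{n_CODE}} with an alphanumeric CODE, and that you already have a per-dictionary table. The table contents, the name GAIJI_TABLE, and the function replace_gaiji are made up for illustration.

```python
import re

# Hypothetical per-dictionary table: (kind, code) -> replacement text.
# Keys and values are invented for illustration; a real table has to be
# built separately for each dictionary.
GAIJI_TABLE = {
    ("w", "12345"): "\u95bd",
    ("n", "00042"): "\u00e0",
}

# Assumed marker shape: {{w_CODE}} or {{n_CODE}}.
MARKER = re.compile(r"\{\{([wn])_([0-9A-Za-z]+)\}\}")


def replace_gaiji(text, table):
    """Replace markers found in the table; leave unknown markers untouched."""
    def substitute(match):
        key = (match.group(1), match.group(2))
        return table.get(key, match.group(0))

    return MARKER.sub(substitute, text)
```

Leaving unknown markers in place (instead of dropping them) makes it easy to spot which entries are still missing from the table.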

epistularum commented 4 years ago

"Character tables have to be created for every EPWING dictionary"

Is this what you mean by a character table?

zA577   u95BD       #   閽
zA578   u8772       #   蝲
zA579   u6A1D       #   樝
zA57B   u95AB       #   閫
zA57C   u95D0       #   闐
zA57D   u9F97       #   龗
zA57E   u5B7D       #   孽
zA621   u97DB       #   韛
zA622   u65F0       #   旰
zA623   u74EB       #   瓫

If that's the case, installing EBWin4 and browsing to C:\Users\username\AppData\Roaming\EBWin4\GAIJI gives you a lot of tables. There are tables for kojien, wadai, meikyou, daijirin, ...
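In case it helps, here is a rough sketch of reading lines like the ones above into a lookup table. It only handles the simple form shown in this thread (a prefix letter h or z, a four-digit hex code, then u plus a hex code point, with an optional # comment) and skips anything else. The function name parse_gaiji_map is my own, and how these h/z codes relate to the w/n indices in the exported text is not covered here.

```python
import re

# Matches e.g. "zA577   u95BD       #   ..." or "hA121 u00E0".
# Lines in any other shape are skipped.
LINE = re.compile(r"^([hz])([0-9A-Fa-f]{4})\s+u([0-9A-Fa-f]{4,6})(?=\s|$)")


def parse_gaiji_map(lines):
    """Return {(prefix, code): character} for every line in the simple form."""
    table = {}
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # blank line, comment, or a form this sketch ignores
        prefix, code, codepoint = match.groups()
        table[(prefix, int(code, 16))] = chr(int(codepoint, 16))
    return table


sample = [
    "zA577   u95BD       #   閽",
    "zA578   u8772       #   蝲",
    "hA121 u00E0",
]
table = parse_gaiji_map(sample)
```

The table is keyed on both the prefix and the code, since the same hex code can appear under h and z.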

playHing commented 4 years ago

@FooSoft I noticed that OCR of the 外字 for several major dictionaries has already been done in yomichan-import. Would you mind suggesting or sharing how the OCR can be done in batch? I would like to contribute to the repo, but I am stuck on the OCR part...

[Screenshot: issue-ocr]

Thanks in advance!