Closed stephenmk closed 2 years ago
I'll look into this when I get a chance, but it will be several weeks off.
I believe I've now updated the vast majority (if not all) of entries that were missed. (Or at least the ones that were missed due to this particular error.)
It seems the affected entries were skipped because the corresponding forms in the meikyo EPWING contain kanji that are encoded with ad-hoc bitmap images rather than regular EUC-JP or UTF fonts.
Codepoints not fonts. Yes, EPWING was rather limited. I won't pursue it any further and the issue can be closed.
I think some entries (like 結婚) probably didn't get picked up because, despite only containing one sense in English, they contain multiple senses with glosses from different languages. In any case, I think the edits that I submitted today should cover all the entries that were missed.
EDIT: Actually, I missed some. I'll open a new issue.
Yesterday I found that 跋扈 (1573260) had not been updated with a [vi] tag by last year's automated "Meikyo vt and vi additions" process as we would have expected. I found that another word with the same kanji, 扈従 (2104700), had also not been updated.
Marcus wrote:
Robin replied:
I went looking for some counter-examples and found these three entries. They were all correctly updated by the script. Doesn't seem like the ▼ marks by themselves are necessarily the culprit.
1166970 ひとめ‐ぼれ【一目▼惚れ】
1563300 はい‐よう【▼佩用】
1563670 ふ‐かん【▼俯▼瞰】
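For anyone wanting to compare headwords like these against the plain entry forms, here is a minimal sketch of stripping the marks. It assumes the layout is always reading【written form】, that the separator in the reading is U+2010, and that ▼ is U+25BC; the function name is hypothetical and this is not taken from the actual update script.

```python
import re

# Assumed layout, inferred only from the three examples above:
# reading (with a U+2010 separator) followed by 【written form】.
HEADWORD = re.compile(r"^(?P<reading>[^【]+)【(?P<written>[^】]+)】$")


def parse_headword(headword: str) -> tuple[str, str]:
    """Split a headword into (reading, written form) with the marks removed."""
    match = HEADWORD.match(headword.strip())
    if match is None:
        raise ValueError(f"unrecognized headword layout: {headword!r}")
    # Drop the U+2010 separator from the reading.
    reading = match.group("reading").replace("\u2010", "")
    # Drop the U+25BC (▼) marks from the written form.
    written = match.group("written").replace("\u25bc", "")
    return reading, written


if __name__ == "__main__":
    print(parse_headword("ふ\u2010かん【\u25bc俯\u25bc瞰】"))
```

If the three counter-examples match their entries after this kind of normalization, that would fit the observation that the ▼ marks alone are not the culprit.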
This issue may be worth investigating. My guess is that this was caused by an error in converting 扈 from its EPWING encoding (Shift JIS, I think?) into Unicode.
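One rough way to test a guess like this is to check whether the character has a codepoint in the suspected encodings at all. The sketch below uses only Python's built-in codecs; the helper name is hypothetical, and it says nothing about how the actual EPWING data or the update script handles the character.

```python
def is_encodable(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Words from this thread; results depend on Python's codec tables.
    for word in ("跋扈", "扈従", "佩用"):
        for codec in ("euc_jp", "shift_jis"):
            print(word, codec, is_encodable(word, codec))
```

A word that comes back False for a codec could not have been stored as ordinary text in that encoding, which would point toward some other representation in the source data.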