Closed stephenmk closed 2 years ago
I think the best thing is to remove the [yoji] tag from all 1138. I sampled a dozen or so, and as expected the tags were added because the terms were in Kanji Haitani's yoji list. That list turned out to be rather, er, flawed. I can do the tag removal as a bulk-edit process, so I'll add the task to my "get a roundtuit" list.
Thanks for generating the list, Stephen. Good thinking to use the jitenon site.
> I think the best thing is to remove the [yoji] tag from all 1138.
I agree.
I was able to find a few that are included in other yojijukugo dictionaries or online lists (e.g. 才気縦横, 翻然大悟, 文武不岐) but it's clear that the vast majority should not have the tag.
Yes, many thanks to Stephen for that list. I have now run the update, after first removing from the list the three entries Robin mentioned. I'll close the issue for now.
I've put together a list of JMdict entries which contain the [yoji] miscellaneous tag on one or more senses but do not contain any surface forms that can be found in jitenon's yoji dictionary.
The list: https://gist.github.com/stephenmk/0dcb1318ec60bb35045e75b062d74be4
Goo Jisho also hosts content from Gakken- and Shinmeikai-branded yoji dictionaries. I have some data that was scraped from these online dictionaries about a year ago, and I was unable to find any of the above surface forms in either of those datasets. I can't say with certainty that the datasets are comprehensive, however.
The list is unfortunately pretty long. At least 1138 of our 2730 yoji entries (about 42%) do not contain any surface form that can be found in the jitenon dictionary. I'm not sure we'd want to remove the yoji tag from all of them, but there are far too many to review individually, so there may not be much we can do with this information.
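For reference, the filtering described above could be sketched roughly as follows. This is not the script that produced the list: the `Entry` class, the `unattested_yoji` function, and the toy data are all hypothetical, and a real run would need to load the entries from JMdict and the reference forms from the jitenon data.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Simplified stand-in for a JMdict entry (hypothetical structure)."""
    seq: int                    # entry sequence number (made-up values below)
    surface_forms: list         # kanji/kana forms of the entry
    sense_tags: list            # one set of miscellaneous tags per sense


def unattested_yoji(entries, reference_forms):
    """Return entries with [yoji] on at least one sense and
    no surface form present in reference_forms."""
    flagged = []
    for entry in entries:
        has_yoji = any("yoji" in tags for tags in entry.sense_tags)
        attested = any(form in reference_forms for form in entry.surface_forms)
        if has_yoji and not attested:
            flagged.append(entry)
    return flagged


# Toy data: placeholder strings, not real dictionary content.
reference = {"form-A", "form-B"}
entries = [
    Entry(1, ["form-A"], [{"yoji"}]),             # tagged and attested
    Entry(2, ["form-X", "form-B"], [{"yoji"}]),   # attested via second form
    Entry(3, ["form-Y"], [set(), {"yoji"}]),      # tagged, not attested
    Entry(4, ["form-Z"], [set()]),                # not tagged at all
]

flagged = unattested_yoji(entries, reference)
print([entry.seq for entry in flagged])
```

An entry is flagged only when none of its surface forms match, so an entry with one attested variant spelling is left alone.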
There are six entries in the list which have priority tags, so maybe we should at least consider removing those yoji tags: