FooSoft / yomichan

Japanese pop-up dictionary extension for Chrome and Firefox.
https://foosoft.net/projects/yomichan

New dictionary of orthographic variants for Yomichan #2183

Open stephenmk opened 2 years ago

stephenmk commented 2 years ago

(This is more of an announcement and a request for feedback than an issue with Yomichan.)

I have developed a new version of the 漢字遣い参考 dictionary for Yomichan. The purpose of this dictionary is to display the orthographic variants (as provided by JMdict) of words that you come across. This is useful when you look up a word like 思うツボ but your favorite dictionaries only contain entries for 思う壺.

This new version organizes expressions with one or more kanji forms and two or more readings into a table. Here's an example:

昼食 in the original version ![chuushoku_old](https://user-images.githubusercontent.com/8003332/174918525-c9965f98-06ff-477c-83fd-6fbe68dc6356.png)
昼食 in the new version ![chuushoku](https://user-images.githubusercontent.com/8003332/174917635-6d9ef302-f526-4de8-90dd-9f10602370c6.png)

The tabular format makes it easy to see which readings are associated with each kanji form. The ● symbols indicate that a combination is valid, while invalid combinations are left blank.

This new version also includes various forms of metadata (as specified by JMdict). The following symbols are used:

| Symbol | Meaning |
| --- | --- |
| | Common |
| | Outdated |
| | Irregular |
| 🠋 | Rarely-used kanji form |

For example, 涙 (なみだ) has two additional kanji forms that are rarely used and three additional readings that are outdated.

Example table for 涙 ![namida](https://user-images.githubusercontent.com/8003332/174919579-a58cc86b-8e17-4de5-acbe-ce43ec6f6f9d.png)

Compare this to how the same data is presented on jisho.org:

土竜 in tabular format ![mogura](https://user-images.githubusercontent.com/8003332/174920167-90d3002b-93e9-4ec1-9524-9372d2ed78ad.png)
土竜 on jisho.org ![mogura_jisho](https://user-images.githubusercontent.com/8003332/174920151-4234ad81-dc7c-45ed-893b-dd7fdf414517.png)
鷦鷯 in tabular format ![misosazai](https://user-images.githubusercontent.com/8003332/174920315-7f820b66-b07f-4d56-9d25-c52eb9ecfaf1.png)
鷦鷯 on jisho.org ![misosazai_jisho](https://user-images.githubusercontent.com/8003332/174920321-39b20dee-b834-4afb-bbd3-04bb1a1aa5f7.png)

If an entry contains no kanji or fewer than two distinct readings, the information is displayed in a basic unordered list rather than a table.

Example: 味方 ![mikata](https://user-images.githubusercontent.com/8003332/174920733-7afcbb53-de72-4ef2-bea5-3a89ed8f7154.png)
Example: 保証 ![hoshou](https://user-images.githubusercontent.com/8003332/174920789-a1a71330-fbcf-489b-ab87-e81c154b3ced.png)

JMdict's entry for 馬鹿貝 technically contains three readings (ばかがい, バカがい, and バカガイ), but I consider this to be one distinct reading. The entry is therefore displayed in a list.

Example: 馬鹿貝 ![bakagai](https://user-images.githubusercontent.com/8003332/174921095-93e46fc9-7e37-42dd-811e-4fe31ddeb1bf.png)
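The table-or-list rule above can be sketched as follows. This is only an illustration of the idea, not the dictionary's actual generator code: the function names are hypothetical, and it assumes that "one distinct reading" means readings that become identical once katakana is folded to hiragana.

```python
from typing import List


def to_hiragana(text: str) -> str:
    """Fold katakana (U+30A1..U+30F6) to the matching hiragana code points."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def distinct_readings(readings: List[str]) -> List[str]:
    """Keep the first reading of each group that differs only in kana script."""
    seen = set()
    result = []
    for reading in readings:
        key = to_hiragana(reading)
        if key not in seen:
            seen.add(key)
            result.append(reading)
    return result


def use_table(kanji_forms: List[str], readings: List[str]) -> bool:
    """Assumed rule: a table needs at least one kanji form and two distinct readings."""
    return len(kanji_forms) >= 1 and len(distinct_readings(readings)) >= 2


# 馬鹿貝 has three readings in JMdict, but they collapse to a single one here,
# so the entry would be rendered as a list.
print(use_table(["馬鹿貝"], ["ばかがい", "バカがい", "バカガイ"]))  # False
```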

Installation

This dictionary includes many HTML tables, which means that the validation procedure during installation is slow. This has been documented in issue #2138.

> Schema validation of data is sub-optimal and can cause significant slowdowns for large and complex dictionary files.

Due to this issue, you can expect the installation of this dictionary to take at least 30 minutes to complete. (Update 2022/07/19: the new version posted in the comment below only takes around 15 minutes.)

With that said, here is the dictionary file (Update 2022/07/19: Use the new version posted below). Please let me know if you have any suggestions on how to improve it. For example, I think the tables might look nicer if the inner cells were centered horizontally, but Yomichan does not currently support alignment in table cells. What do you think?

If you'd like to see the code used to generate it, you can find it here.

stephenmk commented 2 years ago

New version of the dictionary file with today's JMdict data (2022/07/19). I changed the dictionary name to "JMdict Surface Forms" to better describe its contents, but you can rename it to whatever you want by editing the index.json file within the zip archive.
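If you would rather not edit the archive by hand, the rename could be scripted along these lines. This is a minimal sketch under two assumptions: that `index.json` sits at the root of the zip, and that the displayed name is stored in its `title` field. The function name and file paths are hypothetical.

```python
import json
import zipfile


def rename_dictionary(src_zip: str, dst_zip: str, new_title: str) -> None:
    """Copy src_zip to dst_zip, changing only the "title" field of index.json."""
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "index.json":
                index = json.loads(data.decode("utf-8"))
                index["title"] = new_title  # assumed field name
                data = json.dumps(index, ensure_ascii=False).encode("utf-8")
            # All other files (term banks etc.) are copied unchanged.
            dst.writestr(info.filename, data)
```

Writing to a new file instead of modifying the archive in place keeps the original download intact in case the renamed copy fails to import.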

I've simplified the table structure by removing the unnecessary `<thead>` and `<tbody>` sections. This means that the dictionary validates twice as fast now (around 13 minutes on my PC).

The readings in the leftmost column of the table are now styled as header cells.
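For readers curious what the flattened structure might look like, here is a rough sketch of building such a table as a nested node tree. It is not taken from the generator code: the node layout (`tag`, `content`, `data`) is an assumption about Yomichan's structured-content format, the `data` key `ortho` is inferred from the `data-sc-ortho` selector in the custom CSS in this comment, and `build_table` and the sample values are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_table(
    kanji_forms: List[str],
    readings: List[str],
    valid_pairs: Set[Tuple[str, str]],
) -> Dict:
    """Build a table node with rows placed directly under the table (no thead/tbody)."""
    # Header row: an empty corner cell followed by one header cell per kanji form.
    header_cells = [{"tag": "th", "content": ""}]
    header_cells += [{"tag": "th", "content": kanji} for kanji in kanji_forms]
    rows = [{"tag": "tr", "content": header_cells}]

    for reading in readings:
        # The reading in the leftmost column is a header cell.
        cells = [{"tag": "th", "content": reading}]
        for kanji in kanji_forms:
            mark = "●" if (kanji, reading) in valid_pairs else ""
            cells.append({"tag": "td", "content": mark})
        rows.append({"tag": "tr", "content": cells})

    # Assumption: the "data" entry is what ends up as data-sc-ortho="table".
    return {"tag": "table", "data": {"ortho": "table"}, "content": rows}


# Placeholder values, not real JMdict data.
table = build_table(
    ["K1", "K2"],
    ["r1", "r2"],
    {("K1", "r1"), ("K2", "r1"), ("K2", "r2")},
)
```

Only the `td` cells carry the ● marks, so a rule that targets `td` changes the alignment of the marks without touching the header cells.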

昼食 in the previous version ![chuushoku](https://user-images.githubusercontent.com/8003332/174917635-6d9ef302-f526-4de8-90dd-9f10602370c6.png)
昼食 table in the new version ![unaligned](https://user-images.githubusercontent.com/8003332/179869272-fb67cfe2-8726-4808-aea9-519ba5090b1b.png)

If you want to horizontally align the table contents, you can add some custom CSS to your Yomichan settings.

昼食 table with horizontally aligned contents ![aligned](https://user-images.githubusercontent.com/8003332/179869261-bbbab7e4-7b39-4048-89ae-81db0856bf79.png)
```css
[data-sc-ortho="table"] td {
  text-align: center;
}
```