FooSoft / yomichan

Japanese pop-up dictionary extension for Chrome and Firefox.
https://foosoft.net/projects/yomichan

Yomichan does not keep PoS order of jmdict entries #2058

Open Marcusjmdict opened 2 years ago

Marcusjmdict commented 2 years ago

The PoS order in JMdict entries isn't random, nor does it follow a predetermined pattern where one PoS always comes before another. The PoS are ordered on a case-by-case basis depending on how frequently each is used for a specific word, with some other considerations. The vast majority of JMdict entries are glossed as their first PoS: an entry tagged as [adj-no,n] will be glossed as an adjective, while an entry tagged [n,adj-no] will be glossed as a noun. うすしお, for example, is tagged as [adj-no,n] in jmdictdb and glossed as "lightly salted", i.e. as an adjective. お好み is tagged as [n,adj-no] and glossed as "choice", i.e. as a noun. However, in Yomichan the PoS order is displayed as [adj-no,n] for both of these entries, which implies that "choice" should be interpreted as an adjective rather than a noun. Read that way it ends up meaning "of very good quality", which isn't the actual meaning of the word at all.

stephenmk commented 2 years ago

The tags are in the correct order in the dictionary files before they are imported into yomichan. The order in which the tags are displayed is determined on a global basis as configured in the tag_bank_1.json file. These part-of-speech tags are all configured identically (the "order" value is set to -3). It seems that when yomichan collects tags with equivalent "order" values, it sorts them alphabetically at some point.
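
To make the suspected behaviour concrete, here is a minimal sketch in Python. It is not Yomichan's actual code: the `TAG_ORDER` table and both function names are hypothetical, and the only facts taken from the thread are that the PoS tags share the same "order" value (-3) and that ties appear to be broken alphabetically.

```python
# Hypothetical stand-in for the "order" values configured in tag_bank_1.json.
TAG_ORDER = {"n": -3, "adj-no": -3}


def sort_with_name_tiebreak(tags):
    """Sort by "order", then alphabetically by name (the behaviour suspected above)."""
    return sorted(tags, key=lambda tag: (TAG_ORDER.get(tag, 0), tag))


def sort_by_order_only(tags):
    """Sort by "order" alone; Python's sort is stable, so ties keep their source order."""
    return sorted(tags, key=lambda tag: TAG_ORDER.get(tag, 0))


usushio = ["adj-no", "n"]  # PoS order of うすしお in JMdict
okonomi = ["n", "adj-no"]  # PoS order of お好み in JMdict

print(sort_with_name_tiebreak(okonomi))  # ['adj-no', 'n']  (source order lost)
print(sort_by_order_only(okonomi))       # ['n', 'adj-no']  (source order kept)
```

With the alphabetical tie-break both entries come out as [adj-no,n], which matches what the issue reports. Sorting on "order" alone with a stable sort would leave the dictionary's own ordering intact.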

Here are a few potential solutions I can think of:

  1. Update yomichan so that tags with equal "order" values are not alphabetized.
  2. Change the tag format so that each tag has a separate display value and key. Currently, a tag's name is used both as the text displayed to the user and as the key for assigning configuration options behind the scenes. The problem would be solved if we could have multiple tags that display as "adj-no" but have different "order" values assigned to them.
  3. Merge all the parts of speech into a single tag during the dictionary creation process (yomichan-import). This is kind of clumsy but it would be easy to do.
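
Options 2 and 3 could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the `Tag` class, the `key`/`display` field names, the `#` key suffix, the fractional "order" offsets, and the `", "` separator are not part of Yomichan or yomichan-import.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    key: str       # hypothetical unique identifier used for configuration lookups
    display: str   # text shown to the user
    order: float   # hypothetical: assumes fractional "order" values are acceptable


def keyed_pos_tags(pos_list, base_order=-3):
    """Option 2: one tag per (PoS, position), same display text, distinct order."""
    return [
        Tag(key=f"{pos}#{i}", display=pos, order=base_order + i * 0.01)
        for i, pos in enumerate(pos_list)
    ]


def merged_pos_tag(pos_list, base_order=-3):
    """Option 3: collapse all PoS of an entry into a single tag at import time."""
    name = ", ".join(pos_list)
    return Tag(key=name, display=name, order=base_order)


okonomi = ["n", "adj-no"]

# Sorting option-2 tags by "order" reproduces the dictionary's PoS order.
tags = sorted(keyed_pos_tags(okonomi), key=lambda tag: tag.order)
print([tag.display for tag in tags])    # ['n', 'adj-no']

# Option 3 sidesteps sorting entirely because only one tag remains.
print(merged_pos_tag(okonomi).display)  # n, adj-no
```

The trade-off is roughly the one described in the list: option 2 needs a format change on the Yomichan side, while option 3 only touches the import step but loses the individual PoS tags.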