FooSoft / yomichan

Japanese pop-up dictionary extension for Chrome and Firefox.
https://foosoft.net/projects/yomichan

Something weird with the deinflection of 損ねて #1716

Open archiif opened 3 years ago

archiif commented 3 years ago

Description: For whatever reason, the JMdict definition of 損ねる as an auxiliary verb does not show up when the verb is inflected. I'm not sure what the broader implications are, but it's certainly annoying. Example screenshots: [two screenshots]

Browser version: Chrome 90.0.4430.212

Yomichan version: 21.4.30.2

toasted-nutbread commented 3 years ago

The part of speech is empty for the auxiliary definition, so it won't match deinflections.

// ...
[
  "損ねる",
  "そこねる",
  "v1 vt",
  "v1", // <-- ichidan verb
  714,
  ["to harm", "to hurt", "to injure", "to wreck"],
  1406690,
  "P ichi news"
],
[
  "損ねる",
  "そこねる",
  "aux-v",
  "", // <-- no part of speech
  713,
  ["to miss one's chance to (do something)", "to fail to (do what one ought to have done)"],
  1406690,
  "P ichi news"
],
// ...
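
A rough model of why the empty rule field matters, as a minimal sketch only. It is written in Go because that is the language of yomichan-import; Yomichan itself is JavaScript, and this is not its actual matching code. `senseMatches` is a hypothetical name, and the assumption is that an inflected form only matches senses carrying at least one of the rules the deinflection requires.

```go
package main

import (
	"fmt"
	"strings"
)

// senseMatches reports whether a sense would be shown for a deinflection
// candidate. senseRules is the space-separated rule field from the term bank
// (the fourth element of each entry above). required lists the rules the
// deinflected form demands; an empty list stands for text scanned in its
// dictionary form. Hypothetical helper, not Yomichan's real code.
func senseMatches(senseRules string, required []string) bool {
	if len(required) == 0 {
		return true // nothing was deinflected, so no rule is needed
	}
	have := strings.Fields(senseRules)
	for _, r := range required {
		for _, h := range have {
			if r == h {
				return true
			}
		}
	}
	return false
}

func main() {
	// Assumption: 損ねて only deinflects to 損ねる via an ichidan ("v1") rule.
	teForm := []string{"v1"}
	fmt.Println(senseMatches("v1", teForm)) // "to harm" sense: true
	fmt.Println(senseMatches("", teForm))   // auxiliary sense: false
	fmt.Println(senseMatches("", nil))      // auxiliary sense, dictionary form: true
}
```

Under this model the auxiliary sense is still found when 損ねる is scanned as-is, which matches the report that it only disappears when the verb is inflected.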

stephenmk commented 2 years ago

I think I can implement a fix to this problem in yomichan-import by simply ensuring every term from a given JMdict entry is assigned the same part-of-speech rules. 損ねる only contains one part-of-speech rule ("v1" - ichidan verb) in its first sense, so we can just assign that to the second sense as well.
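
The idea in the paragraph above could look roughly like the following. This is a sketch under assumptions, not code from yomichan-import: the `sense` type and `shareRules` function are hypothetical, and rules are modelled as the space-separated strings seen in the term bank.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sense is a hypothetical stand-in for one term-bank row of an entry.
type sense struct {
	Gloss string
	Rules string // space-separated deinflection rules; may be empty
}

// shareRules returns a copy of senses in which every sense carries the
// union of the rules found on any sense of the same entry.
func shareRules(senses []sense) []sense {
	seen := map[string]bool{}
	for _, s := range senses {
		for _, r := range strings.Fields(s.Rules) {
			seen[r] = true
		}
	}
	union := make([]string, 0, len(seen))
	for r := range seen {
		union = append(union, r)
	}
	sort.Strings(union) // deterministic order for the joined field
	joined := strings.Join(union, " ")

	out := make([]sense, len(senses))
	for i, s := range senses {
		out[i] = sense{Gloss: s.Gloss, Rules: joined}
	}
	return out
}

func main() {
	entry := []sense{
		{"to harm", "v1"},
		{"to miss one's chance to (do something)", ""},
	}
	for _, s := range shareRules(entry) {
		fmt.Printf("%q -> rules %q\n", s.Gloss, s.Rules)
	}
}
```

With this approach an entry whose senses mix rule types would end up with every rule on every sense, which is the source of the edge cases discussed in this comment.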

But that solution could produce some unexpected behaviors. Here are some examples.

  1. 無い無い has two senses in JMdict: the first is a する verb, the second is an い-adjective. If the user scans 無い無くて, should both senses be displayed? The current behavior is to only display the second sense.
  2. 伏せる has five senses: the first four are ichidan verbs, the fifth is a godan verb. If the user scans 伏せって, should all senses be displayed? The current behavior is to only display the fifth sense. (Coincidentally, the second search result contains the other four senses.)

Those two seem to be the only remarkable edge cases. There are also hundreds of nouns that can function both as する verbs and as various other parts of speech (nouns, adverbs, と-adverbs, な-adjectives, の-adjectives, etc.). For example, おしゃれ contains a な-adjective sense, a noun sense, and a する-verb sense, but if you scan おしゃれする it will display all three senses. So it doesn't look like these entries are causing scanning issues with the current version of JMdict for yomichan...

UNLESS you try scanning them with a な particle attached. In the case of 「おしゃれな」 I would expect Yomichan to only display the な-adjective sense, but it actually interprets it as an "imperative negative" する-verb form (which I would have expected to only apply to 「おしゃれするな」) and therefore only displays the する-verb sense.

If we go with the plan I proposed above (i.e., assigning every term from a given entry the same set of part-of-speech rules), then all of the senses will be displayed regardless of the form being scanned by the user. 「おしゃれな」 will still display "imperative negative" (which seems like a bug to me), but at least all of the senses will be displayed.

(Just to be clear, I'm talking about the part-of-speech rules that are used behind-the-scenes by yomichan for de-inflecting words. These are different from the part-of-speech tags that are displayed to the user. Those tags would not be affected.)

Alternative and/or complementary solutions

  1. We could be more selective about how we assign these extra grammar rules. For example, we would only assign an "ichidan verb" rule to the "auxiliary-verb" sense of 損ねる, but we would not assign a な-adjective rule to the する-verb sense of おしゃれ. This would be quite a bit more complicated.
  2. We could send change requests to JMdict every time we find a sense that does not contain the appropriate inflection part-of-speech information.
  3. Yomichan could be updated so that it doesn't require a match with an inflection part-of-speech rule in order to display a sense, but instead gives the matched sense a higher priority.
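
Option 3 in the list above could be sketched as a stable sort instead of a filter. This is only an illustration in Go (Yomichan itself is JavaScript); `sense`, `hasRule`, and `rankSenses` are hypothetical names, and the rule names "v1" and "v5" are assumed to stand for ichidan and godan verbs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sense is a hypothetical stand-in for one sense of a dictionary entry.
type sense struct {
	Gloss string
	Rules string // space-separated deinflection rules; may be empty
}

// hasRule reports whether s carries at least one of the required rules.
func hasRule(s sense, required []string) bool {
	for _, h := range strings.Fields(s.Rules) {
		for _, r := range required {
			if h == r {
				return true
			}
		}
	}
	return false
}

// rankSenses keeps every sense but moves the ones matching the required
// rules to the front. The sort is stable, so the original order is
// preserved within the matching and non-matching groups.
func rankSenses(senses []sense, required []string) []sense {
	out := append([]sense(nil), senses...)
	sort.SliceStable(out, func(i, j int) bool {
		return hasRule(out[i], required) && !hasRule(out[j], required)
	})
	return out
}

func main() {
	// 伏せる: four ichidan senses followed by one godan sense.
	fuseru := []sense{
		{"sense 1 (ichidan)", "v1"},
		{"sense 2 (ichidan)", "v1"},
		{"sense 3 (ichidan)", "v1"},
		{"sense 4 (ichidan)", "v1"},
		{"sense 5 (godan)", "v5"},
	}
	// Scanning 伏せって is assumed to require the godan rule.
	for _, s := range rankSenses(fuseru, []string{"v5"}) {
		fmt.Println(s.Gloss)
	}
}
```

Here the godan sense is listed first and the four ichidan senses follow in their original order, rather than being hidden.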

Thoughts?

Thermospore commented 2 years ago

both show up for me (but I forget why...) [screenshot]

settings: yomichan-settings-2022-04-24-20-03-26.txt

stephenmk commented 2 years ago

I'm not able to reproduce that using your settings and a stock version of JMdict for yomichan. I tried both the JMdict (English) file from the yomichan downloads page, and I also built a new copy using the main git branch of the yomichan-import repository. I think you're probably using a copy of JMdict that was produced by a modified version of yomichan-import.

[screenshot: sokonete]

Thermospore commented 2 years ago

I suspect it's simply due to our jmdicts having a different name, so the settings aren't applying (I chopped the " (English)" off mine to save space)

Here is my copy for ref: jmdict_english_2022-01-16.zip

stephenmk commented 2 years ago

Okay, I see. So if you have "Group related terms" enabled and the "Primary dictionary" option set to JMdict, then the senses will all appear regardless of inflection.

"Group term-reading pairs": [screenshot: sokonete1]

"Group related terms": [screenshot: sokonete2]

"Group related terms" with "Primary dictionary" set to "JMdict (English)": [screenshot: sokonete3]

stephenmk commented 2 years ago

Earlier I wrote:

> UNLESS you try scanning them with a な particle attached. In the case of 「おしゃれな」 I would expect Yomichan to only display the な-adjective sense, but it actually interprets it as an "imperative negative" する-verb form (which I would have expected to only apply to 「おしゃれするな」) and therefore only displays the する-verb sense.

> 「おしゃれな」 will still display "imperative negative" (which seems like a bug to me), but at least all of the senses will be displayed.

I figured out the real problem here. Yomichan's de-inflection process is fine. However, the "vs" inflection rule in Yomichan should only be applied to terms which have "する" included (like "値する") rather than nouns which may optionally attach to する (like "おしゃれ" or "掃除"). Counter-intuitively, this means that senses tagged with the "vs" part-of-speech code in JMdict should not get the "vs" grammar rule in the Yomichan dictionary. Instead, the "vs" Yomichan rule should only be applied to part-of-speech codes "vs-c", "vs-i", and "vs-s", which are used on expressions which actually end with する or す.

https://github.com/FooSoft/yomichan-import/blob/master/edict.go#L20

    } else if strings.HasPrefix(tag, "vs") {
        term.addRules("vs")
    }

A dash should be added to the HasPrefix check here:

    } else if strings.HasPrefix(tag, "vs-") {
        term.addRules("vs")
    }
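
To see what the added dash changes, here is a small self-contained sketch. `vsRule` is a hypothetical stand-in for the single branch quoted above, not the actual function in edict.go, and it ignores every other part-of-speech tag.

```go
package main

import (
	"fmt"
	"strings"
)

// vsRule returns the Yomichan rule to attach for a JMdict part-of-speech
// tag, considering only the する-verb branch. With the "vs-" prefix, the
// plain "vs" tag (nouns that optionally take する) no longer gets a rule.
func vsRule(tag string) string {
	if strings.HasPrefix(tag, "vs-") {
		return "vs"
	}
	return ""
}

func main() {
	for _, tag := range []string{"vs", "vs-c", "vs-i", "vs-s"} {
		fmt.Printf("%-4s -> %q\n", tag, vsRule(tag))
	}
}
```

Only "vs-c", "vs-i", and "vs-s" map to the "vs" rule here; the plain "vs" tag maps to no rule.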

stephenmk commented 2 years ago

Also, with regard to the original issue, I think the right approach is to submit corrections to JMdict when these bugs are found. I've already fixed 損ねる. I think this issue can be closed.

toasted-nutbread commented 2 years ago

@stephenmk Made a PR based on your feedback: https://github.com/FooSoft/yomichan-import/pull/37. I didn't really test it since it seems to make sense based on your comments, but let me know if you see any issue.

stephenmk commented 2 years ago

@toasted-nutbread I made an identical change in my fork of yomichan-import (which I need to get around to rebasing) a few weeks ago, and I have been using the dictionary file produced by it. Everything has been working as expected, so I think you're good to go.