FooSoft / yomichan

Japanese pop-up dictionary extension for Chrome and Firefox.
https://foosoft.net/projects/yomichan

Deinflect できる to する #2266

Open tomtung opened 1 year ago

tomtung commented 1 year ago

「できる」 should be de-inflected as the potential form of 「する」.

Without this, Yomichan correctly matches 「全うする」 to the meaning "to accomplish / to fulfill / to carry out", but incorrectly matches 「全うできる」 to 「真っ当・全う・真当」, meaning "proper / respectable / decent / honest".
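The mismatch can be illustrated with a minimal sketch. It assumes a simplified suffix-replacement rule format and a two-entry toy dictionary. The names `deinflect`, `lookup`, `SURU_POTENTIAL_RULES`, and the `suffixIn`/`suffixOut` fields are hypothetical and do not reflect Yomichan's actual deinflection data or API. The scanner tries the longest prefix first. Without a できる → する rule, the first hit is the shorter literal 全う. With the rule, the full 全うできる is read as 全うする.

```javascript
// Hypothetical, simplified rule format; Yomichan's real deinflection data is
// likely structured differently (for example, with rule types for chaining).
const SURU_POTENTIAL_RULES = [
  // Potential form of する: 〜できる → 〜する
  { reason: 'potential', suffixIn: 'できる', suffixOut: 'する' },
];

// Toy dictionary; the glosses are the ones quoted in the issue.
const DICTIONARY = new Map([
  ['全うする', 'to accomplish / to fulfill / to carry out'],
  ['全う', 'proper / respectable / decent / honest'],
]);

// Returns the source term itself plus every form produced by undoing one rule.
function deinflect(source, rules) {
  const candidates = [{ term: source, reasons: [] }];
  for (const rule of rules) {
    if (source.endsWith(rule.suffixIn)) {
      const stem = source.slice(0, source.length - rule.suffixIn.length);
      candidates.push({ term: stem + rule.suffixOut, reasons: [rule.reason] });
    }
  }
  return candidates;
}

// Tries the longest prefix of the scanned text first and returns the first
// dictionary hit, so a longer deinflected match beats a shorter literal one.
function lookup(text, rules) {
  for (let length = text.length; length > 0; length--) {
    const source = text.slice(0, length);
    for (const { term, reasons } of deinflect(source, rules)) {
      if (DICTIONARY.has(term)) {
        return { source, term, reasons, gloss: DICTIONARY.get(term) };
      }
    }
  }
  return null;
}

// Without the rule, the scan falls back to the shorter literal match 全う.
console.log(lookup('全うできる', []).term); // 全う
// With the rule, 全うできる is read as the potential form of 全うする.
console.log(lookup('全うできる', SURU_POTENTIAL_RULES).term); // 全うする
```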

toasted-nutbread commented 1 year ago

I think I had considered adding this at some point, but didn't for whatever reason, maybe because I thought it might have false positives or something. One concern that comes to mind is whether scanning できる in isolation would return する as the first result, but after testing your branch, there doesn't seem to be any issue.

If this is added, there are a few things we'd probably want:

tomtung commented 1 year ago

> I think I had considered adding this at some point, but didn't for whatever reason, maybe because I thought it might have false positives or something.

Yeah, totally understandable. I also thought that leaving out the de-inflection of できる to する didn't matter, until I encountered the relatively uncommon cases where attaching 「する」 changes the meaning of a word. 「全う」 vs. 「全うする」 is one such example, as mentioned above; 「糊」 vs. 「糊する」 (as in 「口を糊する」) is another. Without de-inflecting できる to する, we can't match 「口を糊できる」 to 「口を糊する」.

> Rebasing your branch since it's out of date.

Can you clarify what you're referring to here? I don't see any merge conflicts, and the unit tests pass. I think that when accepting a pull request, you can choose to rebase instead of merging, if that's what you mean, but I don't have access to that option.

> 出来る probably also needs to be handled.

Done. That said, coming back to the concern about potential false positives: if this turns out to be too noisy, de-inflecting only できる while leaving 出来る alone might be a reasonable compromise, since using the latter to express the potential form of する seems much less common.
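The proposed compromise amounts to making the kanji rule optional. The following is a minimal sketch under the same simplified suffix-replacement assumption. `buildSuruPotentialRules`, `deinflectOnce`, and the `includeKanjiForm` option are hypothetical names, not part of Yomichan. The sketch also ignores chained inflections such as できない, which a real deinflector would need to handle.

```javascript
// Hypothetical helper; the option name includeKanjiForm is illustrative only.
function buildSuruPotentialRules({ includeKanjiForm = true } = {}) {
  const rules = [{ reason: 'potential', suffixIn: 'できる', suffixOut: 'する' }];
  if (includeKanjiForm) {
    // 出来る is the kanji spelling of できる.
    rules.push({ reason: 'potential', suffixIn: '出来る', suffixOut: 'する' });
  }
  return rules;
}

// Undoes at most one rule; further inflections such as できない or できた are
// ignored here, although a real deinflector would chain them.
function deinflectOnce(term, rules) {
  return rules
    .filter((rule) => term.endsWith(rule.suffixIn))
    .map((rule) => term.slice(0, term.length - rule.suffixIn.length) + rule.suffixOut);
}

const full = buildSuruPotentialRules();
const kanaOnly = buildSuruPotentialRules({ includeKanjiForm: false });

console.log(deinflectOnce('口を糊出来る', full));     // [ '口を糊する' ]
console.log(deinflectOnce('口を糊出来る', kanaOnly)); // []
console.log(deinflectOnce('口を糊できる', kanaOnly)); // [ '口を糊する' ]
// Bare できる also deinflects to する: the "scanning できる in isolation" case.
console.log(deinflectOnce('できる', kanaOnly));       // [ 'する' ]
```

Keeping the kanji rule behind a single flag means that dropping it later, if it proves noisy, would not affect the kana rule.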

> Test cases updated to validate the changes, added in test-deinflector.js.

Done. Confirmed that npm test passes after the change.