birchill / 10ten-ja-reader

A browser extension to translate Japanese by hovering over words.
https://addons.mozilla.org/firefox/addon/10ten-ja-reader/
GNU General Public License v3.0

Drop 五段化 potential rule #2047

Closed: enellis closed this 1 month ago

enellis commented 1 month ago

Follow-up to #2038.

I encountered duplicate results for inputs like 宣せぬ or 宣せず, which stemmed from the following rule:

['せる', 'する', Type.IchidanVerb, Type.SpecialSuruVerb, [Reason.Irregular, Reason.Potential]],

After rereading this resource from JMdict, I realized that this rule is incorrect. Verbs undergoing 五段化 take either し得る or できる as their potential form, not the す-Godan potential form (せる).

By dropping this incorrect rule, the issue is resolved, and we can also revert the previous commit that was intended to prevent invalid sequencing of potential forms.
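
To illustrate how a rule like this can produce duplicate results, here is a minimal sketch. It is not the extension's actual deinflection code: the `Rule` shape, the type flags, the two negative rules, and the `deinflect` and `hitsFor` helpers are all simplified assumptions made for this example. Only the せる → する rule mirrors the one quoted above.

```typescript
// Hypothetical, heavily simplified model of a deinflection rule table.
type Rule = {
  from: string;      // ending of the inflected form
  to: string;        // ending of the less-inflected form
  fromType: number;  // word type the inflected form must have
  toType: number;    // word type of the result
  reasons: string[];
};

const Ichidan = 1 << 0;
const SpecialSuru = 1 << 1;
const Initial = 1 << 2; // raw input whose word type is not yet known

// Assumed negative rules (illustrative only).
const negativeRules: Rule[] = [
  { from: 'せぬ', to: 'する', fromType: Initial, toType: SpecialSuru, reasons: ['Negative'] },
  { from: 'ぬ', to: 'る', fromType: Initial, toType: Ichidan, reasons: ['Negative'] },
];

// The rule this PR drops, re-expressed in the simplified shape.
const droppedRule: Rule = {
  from: 'せる', to: 'する', fromType: Ichidan, toType: SpecialSuru,
  reasons: ['Irregular', 'Potential'],
};

type Candidate = { word: string; type: number; reasons: string[] };

// Breadth-first expansion: every candidate is tried against every rule.
function deinflect(input: string, rules: Rule[]): Candidate[] {
  const result: Candidate[] = [{ word: input, type: Initial, reasons: [] }];
  for (let i = 0; i < result.length; i++) {
    const current = result[i];
    for (const rule of rules) {
      if ((current.type & rule.fromType) === 0) continue;
      if (!current.word.endsWith(rule.from)) continue;
      const stem = current.word.slice(0, current.word.length - rule.from.length);
      result.push({
        word: stem + rule.to,
        type: rule.toType,
        reasons: [...rule.reasons, ...current.reasons],
      });
    }
  }
  return result;
}

// Count how many deinflection paths end at the dictionary form 宣する.
function hitsFor(rules: Rule[]): number {
  return deinflect('宣せぬ', rules).filter((c) => c.word === '宣する').length;
}

console.log(hitsFor([...negativeRules, droppedRule])); // two paths reach 宣する
console.log(hitsFor(negativeRules));                   // one path reaches 宣する
```

In this model, 宣せぬ reaches 宣する once directly (せぬ → する) and once via the intermediate 宣せる (ぬ → る, then せる → する), which is the kind of duplicate described above. Without the せる → する rule only the direct path remains.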

enellis commented 1 month ago

Sorry for catching this a little bit too late!

I plan to add a rule for masu-stem + 得る/える/うる as a potential form. Or do you think it would be better to have a separate reason like -eru?

birtles commented 1 month ago

> Sorry for catching this a little bit too late!

Not at all. Thank you for catching this!

> I plan to add a rule for masu-stem + 得る/える/うる as a potential form. Or do you think it would be better to have a separate reason like -eru?

I think I lean towards more explicit rules: letting "potential" mean the potential form students learn about in classrooms/textbooks, and having a separate annotation for -eru/-uru, even if the conjugation text is mostly the same.

(Also, since JMdict already has two entries for あり得る, I guess we'll end up with triplicate results when looking up あり得る after adding this new rule? Maybe that's unavoidable?)

enellis commented 1 month ago

> (Also, since JMdict already has two entries for あり得る, I guess we'll end up with triplicate results when looking up あり得る after adding this new rule? Maybe that's unavoidable?)

Yeah, I'm a bit concerned about 見える as well, but I think as long as it's sorted correctly, it should be fine and not too confusing. What do you think?

birtles commented 1 month ago

> Yeah, I'm a bit concerned about 見える as well, but I think as long as it's sorted correctly, it should be fine and not too confusing. What do you think?

I don't suppose there's any way to explicitly detect and filter out those cases? Alternatively we could just add the +得る rule and not add the +える・+うる rules for now?

enellis commented 3 weeks ago

> I don't suppose there's any way to explicitly detect and filter out those cases? Alternatively we could just add the +得る rule and not add the +える・+うる rules for now?

While I believe it would be fairly simple to filter those out, I’m hesitant to do so because it feels somewhat arbitrary to me. I actually quite like it when entries like あり得る are "explained" through deinflection, as long as the "explanatory" entry is placed afterward. I can see directly that あり得る is a form of ある. However, in cases like 見える, it could be misleading, since the える in 見える isn’t related to 得る.

Just adding 得る and not える and うる would be the way to go then, I think.

birtles commented 3 weeks ago

> While I believe it would be fairly simple to filter those out, I’m hesitant to do so because it feels somewhat arbitrary to me. I actually quite like it when entries like あり得る are "explained" through deinflection, as long as the "explanatory" entry is placed afterward. I can see directly that あり得る is a form of ある. However, in cases like 見える, it could be misleading, since the える in 見える isn’t related to 得る.
>
> Just adding 得る and not える and うる would be the way to go then, I think.

Sounds good. We can reinvestigate enabling the える・うる patterns later if it proves useful.
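
The agreed approach (match only the kanji spelling 得る, and leave kana える・うる alone) could be sketched roughly as follows. This is an illustration under stated assumptions, not the rule that was actually added: `deinflectEru` and `godanMasuToDict` are hypothetical names, and a real implementation would express this as entries in the extension's rule table with their own reason, then check the candidates against the dictionary.

```typescript
// Hypothetical helper: maps the final kana of a godan masu stem
// to the final kana of the dictionary form.
const godanMasuToDict: Record<string, string> = {
  い: 'う', き: 'く', ぎ: 'ぐ', し: 'す', ち: 'つ',
  に: 'ぬ', び: 'ぶ', み: 'む', り: 'る',
};

// Returns candidate dictionary forms for "masu stem + 得る".
// Only the kanji spelling 得る is matched, so plain-kana words such as
// 見える are never treated as stem + える.
function deinflectEru(input: string): string[] {
  const suffix = '得る';
  if (!input.endsWith(suffix) || input.length <= suffix.length) return [];

  const stem = input.slice(0, -suffix.length);

  // Ichidan reading: masu stem + る.
  const candidates = [stem + 'る'];

  // Godan reading: swap the i-row kana for the matching u-row kana.
  const dictKana = godanMasuToDict[stem.slice(-1)];
  if (dictKana !== undefined) {
    candidates.push(stem.slice(0, -1) + dictKana);
  }

  // Candidates still need a dictionary lookup to discard non-words.
  return candidates;
}

console.log(deinflectEru('あり得る'));
console.log(deinflectEru('考え得る'));
console.log(deinflectEru('見える'));
```

Here あり得る yields ある among its candidates (plus a non-word that a dictionary lookup would discard), while 見える yields nothing, which avoids the misleading "explanation" discussed above.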