WorksApplications / SudachiDict

A lexicon for Sudachi

Normalization of すみません and すいません differs #22

Closed: rsimmons closed this issue 3 years ago

rsimmons commented 4 years ago

Is this the right place to report linguistic issues with the dictionaries? Apologies if not.

Using Sudachi 0.4.3, the core dictionary version 20200722, and mode C, I noticed that すみません and すいません do not normalize to the same verb, and it seems like they should.

For すいません, the normalized verb is 済む, which seems correct:

すい  動詞,一般,*,*,五段-マ行,連用形-イ音便 済む
ませ  助動詞,*,*,*,助動詞-マス,未然形-一般 ます
ん   助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず

For すみません, the normalized verb is すむ. It seems like it should be 済む also?

すみ  動詞,一般,*,*,五段-マ行,連用形-一般  すむ
ませ  助動詞,*,*,*,助動詞-マス,未然形-一般 ます
ん   助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず
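
For anyone who wants to compare the two analyses programmatically, here is a minimal sketch that parses the output lines pasted above and reports where the normalized forms differ. It assumes each line has three whitespace-separated fields (surface, part-of-speech tags, normalized form); the `Morpheme` type and `parse_line` helper are illustrative names, not part of Sudachi.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    surface: str
    pos: str
    normalized: str


def parse_line(line: str) -> Morpheme:
    # Assumption: surface, POS tags, and normalized form are separated by whitespace.
    surface, pos, normalized = line.split()
    return Morpheme(surface, pos, normalized)


# Output lines copied from the two analyses above.
suimasen = [parse_line(line) for line in [
    "すい  動詞,一般,*,*,五段-マ行,連用形-イ音便 済む",
    "ませ  助動詞,*,*,*,助動詞-マス,未然形-一般 ます",
    "ん   助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず",
]]
sumimasen = [parse_line(line) for line in [
    "すみ  動詞,一般,*,*,五段-マ行,連用形-一般  すむ",
    "ませ  助動詞,*,*,*,助動詞-マス,未然形-一般 ます",
    "ん   助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず",
]]

# Pair up morphemes by position and keep those whose normalized forms disagree.
mismatches = [
    (a.surface, a.normalized, b.surface, b.normalized)
    for a, b in zip(suimasen, sumimasen)
    if a.normalized != b.normalized
]
print(mismatches)
```

Only the verb differs between the two analyses; the auxiliaries ませ and ん normalize identically.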

sakamoto-mi commented 4 years ago

Thank you for asking about Sudachi Normalization.

すみません could be 済みません, 住みません, or 澄みません. Sudachi does not normalize a word to one particular written form when the same word could correspond to several different words.

すみ  動詞,一般,*,*,五段-マ行,連用形-一般  すむ
ませ  助動詞,*,*,*,助動詞-マス,未然形-一般 ます
ん   助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず

Therefore, normalizing すみ (動詞,一般,*,*,五段-マ行,連用形-一般) to すむ is the correct behavior.

In the same way, すい (動詞,一般,*,*,五段-マ行,連用形-イ音便) should be normalized to すむ, not to 済む as in the current output:

すい  動詞,一般,*,*,五段-マ行,連用形-イ音便 済む
ませ  助動詞,*,*,*,助動詞-マス,未然形-一般 ます
ん   助動詞,*,*,*,助動詞-ヌ,終止形-撥音便 ず

We will fix it in the next update.
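
The rule described in this comment (pick a specific written form only when there is exactly one candidate, otherwise fall back to the kana form) can be sketched roughly as follows. This is a toy illustration, not Sudachi's actual implementation: the `CANDIDATES` table and the `normalize` function are hypothetical, and the たべる entry is an invented example that is not taken from the dictionary.

```python
# Hypothetical toy lexicon: kana dictionary form -> candidate written forms.
CANDIDATES = {
    "すむ": ["済む", "住む", "澄む"],  # ambiguous, as discussed in this thread
    "たべる": ["食べる"],              # invented unambiguous entry for contrast
}


def normalize(kana_form: str) -> str:
    """Return a specific written form only if it is the sole candidate."""
    candidates = CANDIDATES.get(kana_form, [])
    if len(candidates) == 1:
        return candidates[0]
    # Zero or several candidates: keep the kana form rather than guess.
    return kana_form


print(normalize("すむ"))    # ambiguous, stays in kana
print(normalize("たべる"))  # single candidate, mapped to its written form
```

Under this rule both すみ and すい, as inflected forms of the same ambiguous verb, would end up with すむ as their normalized form.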