WorksApplications / SudachiTra

Japanese tokenizer for Transformers
Apache License 2.0

add normalizer that leaved conjugation #31

Closed: katsutan closed this 2 years ago

eiennohito commented 2 years ago

Also, typo: inflectoin_table.json -> inflection_table.json

t-yamamura commented 2 years ago

It would be helpful to have more comments and documentation on NormalizerLeavedConjugation. In particular, since NormalizerLeavedConjugation is a WORD_FORM_TYPE that does not exist in SudachiPy, an explanation of how it differs from the usual normalization process would aid understanding.
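To illustrate the kind of explanation that might help, here is a minimal sketch of the presumed difference. It assumes that the usual normalization returns the normalized notation in dictionary form, and that a conjugation-preserving normalizer re-applies the token's original conjugation using an inflection table. The table contents, the function name `normalize_keeping_conjugation`, and the lookup logic are hypothetical and are not taken from this PR or from the actual inflection_table.json.

```python
# Toy inflection table (hypothetical; NOT the contents of inflection_table.json).
# Maps conjugation type -> conjugation form -> ending for that form.
TOY_INFLECTION_TABLE = {
    "五段-マ行": {
        "終止形-一般": "む",    # dictionary (terminal) form ending
        "連用形-撥音便": "ん",
        "連用形-一般": "み",
        "未然形-一般": "ま",
    },
}

DICTIONARY_FORM = "終止形-一般"


def normalize_keeping_conjugation(normalized_form, conj_type, conj_form,
                                  table=TOY_INFLECTION_TABLE):
    """Re-apply the original conjugation to a dictionary-form normalized word.

    Falls back to the unchanged normalized form when the conjugation type or
    form is unknown, or when the word does not end as the table expects.
    """
    forms = table.get(conj_type)
    if forms is None or conj_form not in forms:
        return normalized_form
    dict_ending = forms[DICTIONARY_FORM]
    if not normalized_form.endswith(dict_ending):
        return normalized_form
    stem = normalized_form[: len(normalized_form) - len(dict_ending)]
    return stem + forms[conj_form]


# Example token: surface "打込ん" (as in "打込んだ").
# Assumed output of the usual normalization: dictionary form, normalized notation.
usual = "打ち込む"
# Conjugation-preserving normalization: normalized notation, original conjugation.
kept = normalize_keeping_conjugation(usual, "五段-マ行", "連用形-撥音便")
print(usual, kept)
```

Under these assumptions, the notation is normalized (打込 becomes 打ち込) in both cases, but only the second result keeps the conjugated ending of the surface form. Non-conjugating words pass through unchanged because their conjugation type is not in the table.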