fasiha / kamiya-codec

Towards a Japanese verb conjugator and deconjugator based on Taeko Kamiya's books *The Handbook of Japanese Verbs* and *The Handbook of Japanese Adjectives and Adverbs*.
https://fasiha.github.io/kamiya-codec/
The Unlicense

Automate type I vs type II differentiation #4

Open fasiha opened 5 years ago

fasiha commented 5 years ago

If a verb is written with one kanji plus one kana, it is likely to be a type I verb, such as 走る, 歩く, and 切る (exceptions are 着る and 寝る), while verbs with one or more kanji and two or more kana should be type II, such as 起きる (one kanji and two kana).
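A minimal TypeScript sketch of this rule of thumb follows. It assumes the input is a verb's dictionary form written as leading kanji followed by hiragana. The function name `guessVerbType`, the Unicode ranges, and the extra "must end in る" guard are illustrative assumptions, not part of kamiya-codec.

```typescript
// Minimal sketch of the kanji/kana-count rule of thumb.
// Kamiya's "type I" = godan, "type II" = ichidan.
type VerbType = "I" | "II" | "unknown";

// Exceptions named in the comment: one kanji + one kana, yet type II.
const EXCEPTIONS = new Set(["着る", "寝る"]);

// CJK Unified Ideographs (BMP block) plus 々, the kanji repetition mark.
const KANJI = /^[\u4e00-\u9fff\u3005]$/;
// Hiragana block.
const HIRAGANA = /^[\u3041-\u309f]$/;

function guessVerbType(dictionaryForm: string): VerbType {
  if (EXCEPTIONS.has(dictionaryForm)) return "II";
  const chars = Array.from(dictionaryForm);

  // Count the leading kanji; everything after them should be hiragana.
  let kanjiCount = 0;
  while (kanjiCount < chars.length && KANJI.test(chars[kanjiCount])) kanjiCount++;
  const kana = chars.slice(kanjiCount);
  if (kanjiCount === 0 || kana.length === 0 || !kana.every((c) => HIRAGANA.test(c))) {
    return "unknown"; // all-kana spellings, kanji after kana, etc.
  }

  // Added guard (assumption, not in the original rule):
  // only verbs ending in る can be type II.
  if (kana[kana.length - 1] !== "る") return "I";

  if (kanjiCount === 1 && kana.length === 1) return "I"; // e.g. 走る, 切る
  if (kana.length >= 2) return "II"; // e.g. 起きる
  return "unknown"; // two or more kanji + one kana: the rule is silent
}
```

The rule misclassifies some common verbs, for example 見る (type II, yet one kanji plus one kana) and 変わる (type I, yet one kanji plus two kana), which is why a dictionary-backed check is safer for a low-level library.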

_Originally posted by @shilik in https://github.com/_render_node/MDU6SXNzdWUzODE5MzU4Nzg=/issues/unread_timeline#issuecomment-439745249_

fasiha commented 5 years ago

@shilik I moved this to a new issue, so we can better track all the different threads.

How robust is this rule you've suggested? For a low-level library like this, I'd want an extremely robust way to determine godan vs ichidan verbs, ideally using a dictionary or morphological parser like MeCab, because libraries like this might wind up being used by other libraries and/or apps. In fact, in true JavaScript spirit, I would likely make an npm module that just does this 😆, and make it a dependency in kamiya-codec!

I don't currently need this so I'm not working on it or thinking about it, but I would definitely encourage you to see how robust you can make this rule of thumb, and can help you package it up into a library.

shilik commented 5 years ago

Actually, I use MeCab as a filter to determine a verb's type. The only problem is that MeCab is not portable to app development, so I have to find a reliable source (a web API) that I can use. Somehow I have managed to make it work, as long as the server is not down.

fasiha commented 5 years ago

Oh cool! Which dictionary do you use with MeCab?

I've used Kuromoji (https://github.com/atilika/kuromoji), which is a Java implementation of the same algorithm as MeCab, with several of the same dictionaries available (I know it includes IPADIC and UniDic; I mainly use UniDic). I've often thought of spinning up a REST API on Now (https://zeit.co/now) using it, to provide a free parsing/tagging backend for people to use.
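With a tagger in the loop, the conjugation class comes straight from the dictionary. A hedged TypeScript sketch follows. It assumes MeCab's default text output with an IPADIC-format dictionary (IPADIC or mecab-ipadic-NEologd), where the fifth feature field (活用型) begins with 五段 for godan and 一段 for ichidan verbs. `parseMecabVerbLine` is a hypothetical helper; UniDic uses a different feature layout, so it would not work unchanged on UniDic output.

```typescript
// Sketch: classify a verb from one line of MeCab's default output,
// assuming an IPADIC-format dictionary. Line layout:
// surface \t POS,POS1,POS2,POS3,conjType,conjForm,baseForm,reading,pronunciation
type Conjugation = "godan" | "ichidan" | "irregular" | "unknown";

interface VerbInfo {
  surface: string;
  baseForm: string;
  conjugation: Conjugation;
}

function parseMecabVerbLine(line: string): VerbInfo | null {
  const [surface, feature] = line.split("\t");
  if (feature === undefined) return null; // e.g. the "EOS" terminator line
  const fields = feature.split(",");
  if (fields[0] !== "動詞") return null; // not a verb

  const conjType = fields[4] ?? "";
  let conjugation: Conjugation = "unknown";
  if (conjType.startsWith("五段")) conjugation = "godan";
  else if (conjType.startsWith("一段")) conjugation = "ichidan";
  else if (conjType.startsWith("カ変") || conjType.startsWith("サ変")) conjugation = "irregular";

  // The seventh field (原形) holds the dictionary form.
  return { surface, baseForm: fields[6] ?? surface, conjugation };
}
```

Because the base-form field is carried along, a conjugated token such as 走っ still maps back to 走る, which is the key a conjugator would need.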

shilik commented 5 years ago

I use mecab-ipadic-NEologd (https://github.com/neologd/mecab-ipadic-neologd) as the dictionary. It is frequently updated and has become very popular in recent years. Your idea of spinning up a REST web API is great; I am looking forward to using a service like this.

fasiha commented 1 year ago

This is doable now via https://github.com/fasiha/godan-ichidan