atilika / kuromoji

Kuromoji is a self-contained and very easy to use Japanese morphological analyzer designed for search
Apache License 2.0

Internals documentation and academic papers? #117

Closed DarrenCook closed 7 years ago

DarrenCook commented 7 years ago

Is there any description of how kuromoji works? E.g. an overview of what each class does, how they work together. And/or academic papers on what it is doing? (E.g. Is it behaving identically to MeCab, ChaSen or Juman, and if not, what innovations is it using and why? What design trade-offs are there?)

(If neither is available, this issue is a request for that kind of documentation; if they are then it is a request for them to be linked to from the README.md file. Thanks!)

cmoen commented 7 years ago

My apologies for the late response.

Kuromoji processes Japanese in broadly the same way as MeCab, but there may be subtle differences because Kuromoji isn't a port of the MeCab code. For an overview of how things work, see the academic papers on MeCab.
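
For readers unfamiliar with the MeCab-style approach mentioned above, here is a minimal sketch of the general idea: build a lattice of dictionary words over the input and pick the lowest-cost path with dynamic programming (Viterbi). This is not Kuromoji's or MeCab's actual code. The class name `ToyLattice`, the dictionary entries, and all cost values are made up for illustration. Real analyzers also add context-dependent connection costs between adjacent tokens and handle unknown words, both of which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyLattice {
    // Hypothetical toy dictionary: surface form -> word cost (lower is better).
    // These entries and costs are invented; they are not taken from any real dictionary.
    static final Map<String, Integer> DICT = new LinkedHashMap<>();
    static {
        DICT.put("東", 6);
        DICT.put("京", 7);
        DICT.put("都", 4);
        DICT.put("東京", 3);
        DICT.put("京都", 3);
    }

    // Returns the segmentation of text with the lowest total word cost.
    static List<String> segment(String text) {
        int n = text.length();
        int[] best = new int[n + 1];      // best[i]: cheapest cost to cover text[0, i)
        int[] backStart = new int[n + 1]; // start offset of the last word on that path
        Arrays.fill(best, Integer.MAX_VALUE);
        Arrays.fill(backStart, -1);
        best[0] = 0;

        // Forward pass: try every dictionary word that ends at each position.
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (best[start] == Integer.MAX_VALUE) {
                    continue; // position 'start' is not reachable
                }
                Integer cost = DICT.get(text.substring(start, end));
                if (cost == null) {
                    continue; // not a dictionary word
                }
                int total = best[start] + cost;
                if (total < best[end]) {
                    best[end] = total;
                    backStart[end] = start;
                }
            }
        }
        if (best[n] == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("No segmentation found for: " + text);
        }

        // Backward pass: follow the back-pointers from the end of the text.
        List<String> tokens = new ArrayList<>();
        for (int end = n; end > 0; end = backStart[end]) {
            tokens.add(text.substring(backStart[end], end));
        }
        Collections.reverse(tokens);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(segment("東京都")); // prints [東京, 都]
    }
}
```

With these toy costs, 東京 + 都 costs 7, while 東 + 京都 costs 9, so the first path wins. Changing the costs changes the preferred segmentation, which is the design lever that trained cost models provide in real analyzers.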

If you have questions about specific pieces of code, please email me at cm@atilika.com. I can also give you a code walkthrough if that's helpful. Thanks.