-
With the recent accidental regression of Japanese (Kuromoji) tokenization throughput due to exciting FST optimizations, we [added new nightly Lucene benchmarks](https://github.com/mikemccand/luceneuti…
-
We should look into adding NAIST-jdic support to Kuromoji as this dictionary is better than the current IPADIC. The NAIST-jdic license seems fine, but needs a formal check-off before any inclusion in…
-
- MeCab
- KyTea
- Kuromoji
- Sudachi
- Juman++
-
### Summary
I would like the JSON data in `emojilist.json` and under `unicode-emoji-indexes` to be generated automatically from data provided by Unicode (such as CLDR).
### Purpose
#### Pros
- No need to add entries by hand on every Unicode update
- Eliminates mistakes caused by manual work (#13805)
- Multilingual support…
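
A minimal sketch of what such generation could look like, assuming the input follows the layout of Unicode's `emoji-test.txt` (code points, a status field, then a comment holding the emoji, version, and name). The sample lines are inlined, and the output keys (`char`, `name`, `group`) and the `parse_emoji_test` helper are hypothetical; they are not the actual schema of `emojilist.json`.

```python
import json

# A few lines in the layout of Unicode's emoji-test.txt, inlined for the sketch.
SAMPLE = """\
# group: Smileys & Emotion
# subgroup: face-smiling
1F600 ; fully-qualified # 😀 E1.0 grinning face
1F603 ; fully-qualified # 😃 E0.6 grinning face with big eyes
263A FE0F ; fully-qualified # ☺️ E0.6 smiling face
263A ; unqualified # ☺ E0.6 smiling face
# group: Animals & Nature
# subgroup: animal-mammal
1F435 ; fully-qualified # 🐵 E0.6 monkey face
"""


def parse_emoji_test(text):
    """Return one dict per fully-qualified emoji (hypothetical output shape)."""
    entries = []
    group = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# group:"):
            group = line[len("# group:"):].strip()
            continue
        if not line or line.startswith("#"):
            continue  # blank lines, subgroup headers, other comments
        fields, _, comment = line.partition("#")
        codepoints, _, status = fields.partition(";")
        if status.strip() != "fully-qualified":
            continue  # skip unqualified / minimally-qualified variants
        char = "".join(chr(int(cp, 16)) for cp in codepoints.split())
        # The comment looks like "<emoji> E<version> <name>".
        _, _, name = comment.strip().split(" ", 2)
        entries.append({"char": char, "name": name, "group": group})
    return entries


print(json.dumps(parse_emoji_test(SAMPLE), ensure_ascii=False, indent=2))
```

Localized names would have to come from a separate source such as the CLDR annotation files, which this sketch does not read.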
-
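At present, Nadesiko's forced word-segmentation rule, which relies on particles, works well in resource-constrained environments.
However, morphological analysis is no longer that expensive these days, so a programming language that adopts morphological-analysis rules is worth considering.
That said, combining it with the existing particle-delimiter rule is also an option.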
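
As a rough illustration of particle-based segmentation, the sketch below cuts the text wherever one of a small, hand-picked set of particles follows a non-empty word. The particle list and the `split_by_particles` helper are hypothetical simplifications, not Nadesiko's actual rule set.

```python
# Hypothetical, deliberately small particle list; longer particles come first.
PARTICLES = ("から", "まで", "を", "に", "へ", "で", "と", "は", "が")


def split_by_particles(text):
    """Split text into (word, particle) pairs at every particle occurrence."""
    tokens = []
    word = ""
    i = 0
    while i < len(text):
        for p in PARTICLES:
            # Only cut when a word has already been collected.
            if word and text.startswith(p, i):
                tokens.append((word, p))
                word = ""
                i += len(p)
                break
        else:
            word += text[i]
            i += 1
    if word:
        tokens.append((word, ""))  # trailing word without a particle
    return tokens


print(split_by_particles("リンゴをカゴに入れる"))
print(split_by_particles("はがきを出す"))
```

The second call shows the weakness of cutting purely on particles: the noun `はがき` (postcard) is split at its `が`, which is the kind of case a morphological analyzer is meant to resolve.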
-
I was inspired by this mailing-list thread.
As many Japanese users already know, the default built-in dictionary bundled with Kuromoji (MeCab IPADIC) is a bit old and has not been maintained for many years. While i…
-
### Problem
Bower has no repository of its own, so we must put compiled builds in this git repo.
Cons:
- Pull requests will often contain a large number of diffs
- The Git repository is bigger because of comp…
-
I tried to build a UniDic dictionary to use with Kuromoji on Solr 3.6. I think UniDic is a better dictionary than the IPA dictionary, so Kuromoji for Lucene/Solr should support UniDic dictionary…
-
The normalizeEntry option is missing from the Javadoc of Kuromoji DictionaryBuilder.
Without this explanation, users don't know what it means until they see the code.
Also, if a user follows the usage …
-
Ivy.xml contains the dictionary URLs for both IPADIC and NAIST-JDIC.
But both are already gone and no longer exist, which breaks the build at the download-dict task.
Google Code will be closed soon. And Souce…