Open GoogleCodeExporter opened 8 years ago
Original comment by johtani
on 11 Jun 2012 at 8:30
I created a first patch.
1. Run the following commands to apply the patch:
cd lucene-gosen-trunk-readonly
patch -p0 < lucene-gosen-unidic.patch
2. Create the dictionary directory:
mkdir dictionary/unidic
3. Download the UniDic source tar.gz file and copy it to "dictionary/unidic/".
4. Build lucene-gosen:
ant -Ddictype=unidic
Original comment by johtani
on 18 Jun 2012 at 9:14
Attachments:
Original comment by johtani
on 18 Jun 2012 at 9:14
This patch has two limitations.
1. COMPOUND dictionary entries are not supported.
A COMPOUND entry appears only in the Noun.common.dic file, and there is only one such entry.
The lucene-gosen dictionary.csv format does not support compound tokens.
This patch skips that entry.
2. Five entries have an empty pronunciation for one of their readings.
e.g. "ミンサー" has two reading variations but only one pronunciation.
Currently, lucene-gosen does not support readings.size() > pronunciation.size().
For such entries, this patch returns the following lists:
readings[ミンサー, ミンサア]
pronunciations[ミンサー, ]
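For illustration only, here is a minimal Java sketch of how lists like the ones above could be produced, assuming the pronunciation list is simply padded with empty strings until it is as long as the reading list. The class and method names (ReadingPadding, padPronunciations) are hypothetical and are not taken from the patch or from lucene-gosen itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of lucene-gosen or the patch.
public class ReadingPadding {

    // Returns a copy of pronunciations, padded with empty strings so that it
    // has at least as many elements as readings. The inputs are not modified.
    public static List<String> padPronunciations(List<String> readings,
                                                 List<String> pronunciations) {
        List<String> padded = new ArrayList<>(pronunciations);
        while (padded.size() < readings.size()) {
            padded.add("");
        }
        return padded;
    }

    public static void main(String[] args) {
        List<String> readings = Arrays.asList("ミンサー", "ミンサア");
        List<String> pronunciations = Arrays.asList("ミンサー");

        List<String> padded = padPronunciations(readings, pronunciations);

        // Both lists now have the same size, so the
        // readings.size() > pronunciation.size() case no longer occurs.
        System.out.println("readings" + readings);
        System.out.println("pronunciations" + padded);
    }
}
```

With this assumption, the second reading "ミンサア" is paired with an empty pronunciation, which matches the lists shown above.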
Original comment by johtani
on 18 Jun 2012 at 9:34
Thank you Ohtani-san,
I applied the patch successfully. I can now start evaluating UniDic and comparing it
with the IPA dictionary.
Original comment by kazuaki....@gmail.com
on 19 Jun 2012 at 1:38
Original issue reported on code.google.com by
kazuaki....@gmail.com
on 8 Jun 2012 at 3:12