Closed by tomohideshibata 3 years ago
You're completely right, thanks for pointing this out!
Not using `--with-charset=utf8` in the setup will cause errors later on during tokenization. I'll change this in the README.md.
An alternative would be to not specify the MeCab path and dictionary path at all (in our scripts, these are set using the `--is_japanese`, `--mecab_dir`, and `--mecab_dic_dir` flags). By default, mecab-python3 uses the `mecab-ipadic-20070801` dictionary with the UTF-8 charset, which I didn't know about when I originally wrote the code.
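For illustration, relying on the defaults could look roughly like the sketch below. It assumes the `mecab-python3` package is installed and that the installed version can find a dictionary on its own, which may depend on the version. The helper name `tokenize_with_default_dictionary` is hypothetical and is not part of the repository's scripts.

```python
from typing import Optional


def tokenize_with_default_dictionary(text: str) -> Optional[str]:
    """Parse text with MeCab's default settings; return None if MeCab is unavailable.

    Hypothetical helper for illustration only.
    """
    try:
        import MeCab  # provided by the mecab-python3 package
    except ImportError:
        return None
    try:
        # No arguments: no explicit MeCab path or dictionary path is passed,
        # so whatever default dictionary the installation provides is used.
        tagger = MeCab.Tagger()
        return tagger.parse(text)
    except RuntimeError:
        # Raised when MeCab cannot be initialized, e.g. no dictionary is found.
        return None


result = tokenize_with_default_dictionary("日本語のテキストです")
print(result if result is not None else "MeCab is not available in this environment")
```

If this prints readable tokens, the default dictionary is usable and the path flags are probably unnecessary for that environment.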
Thanks for your response!
Thanks for releasing the code. I am testing it on Japanese text.
In the "Language-specific prerequisites" section of "Setup" in README.md, I think the `--with-charset=utf8` option is needed for `./configure` in "install MeCab" and "install the mecab-ipadic-20070801 dictionary", because the default encoding is EUC-JP.
Thanks in advance.
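To make the mismatch concrete, here is a minimal sketch of why mixing EUC-JP and UTF-8 tends to fail. It uses only Python's standard codecs and does not call MeCab, so it shows the general encoding problem rather than the exact error the scripts would raise. The sample sentence and the helper name `decodes_as_utf8` are illustrative assumptions.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the byte string is valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "日本語のテキスト"  # illustrative sample sentence

# The same sentence as bytes in the two encodings discussed in this issue.
euc_bytes = text.encode("euc_jp")
utf8_bytes = text.encode("utf-8")

# The byte sequences differ, so data written in one charset cannot simply be
# read as the other.
print("same bytes:", euc_bytes == utf8_bytes)
print("UTF-8 bytes decode as UTF-8:", decodes_as_utf8(utf8_bytes))
print("EUC-JP bytes decode as UTF-8:", decodes_as_utf8(euc_bytes))
```

A dictionary built with the EUC-JP default and queried with UTF-8 input would presumably hit the same kind of mismatch, which is why passing `--with-charset=utf8` at build time matters.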