Thanks for reporting this, @kohchuanhock. @himkt gave me a clue on how to reproduce it, so I will take a closer look at this error.
@kohchuanhock, could you please tell me the following:

- OS platform
- Python version and bit-ness
- output of `mecab -P`
@buruzaemon, thank you for your time!
- OS platform: MacOS Sierra 10.12.6
- Python version and bit-ness: Python 2.7.10, 64-bit (Python 3.6.1 gives the same error)
- output of `mecab -P`:

```
bos-feature: BOS/EOS,*,*,*,*,*,*,*,*
bos-format:
config-charset: EUC-JP
cost-factor: 700
dicdir: /usr/local/lib/mecab/dic/ipadic
dump-config: 1
eon-format:
eos-format: EOS\n
eos-format-chasen: EOS\n
eos-format-chasen2: EOS\n
eos-format-simple: EOS\n
eos-format-yomi: \n
eval-size: 8
lattice-level: 0
max-grouping-size: 24
nbest: 1
node-format: %m\t%H\n
node-format-chasen: %m\t%f[7]\t%f[6]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
node-format-chasen2: %M\t%f[7]\t%f[6]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
node-format-simple: %m\t%F-[0,1,2,3]\n
node-format-yomi: %pS%f[7]
theta: 0.75
unk-eval-size: 4
unk-format: %m\t%H\n
unk-format-chasen: %m\t%m\t%m\t%F-[0,1,2,3]\t\t\n
unk-format-chasen2: %M\t%m\t%m\t%F-[0,1,2,3]\t\t\n
unk-format-yomi: %M
```
@kohchuanhock, I tried your input using `mecab` directly, without any output formatting. This is what it returns:
```
$ mecab
私はアシャです
私	名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ
は	助詞,係助詞,*,*,*,*,は,ハ,ワ
アシャ	名詞,一般,*,*,*,*,*
です	助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
EOS
```
Notice how there are only 7 feature fields for the noun アシャ. This is expected: アシャ is not listed in your ipadic dictionary, so MeCab treats it as an unknown word (a node `stat` of 1, i.e. `MECAB_UNK_NODE`). Known words such as 私 have 9 fields (indices 0 to 8), but `%f[8]` asks for the 9th field (zero-based index 8) of a list that only has 7 fields (indices 0 to 6), so an error occurs.
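To make the zero-based counting concrete, here is a small Python sketch (plain Python, not natto-py code) that splits the two feature strings from the output above and mimics what `%f[8]` asks for. The helper `field` is hypothetical and only for illustration.

```python
# Feature strings copied from the plain `mecab` (ipadic) output above.
KNOWN = "名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ"  # 私: listed in ipadic
UNKNOWN = "名詞,一般,*,*,*,*,*"                 # アシャ: unknown word


def field(feature: str, index: int) -> str:
    """Hypothetical helper: return zero-based field `index`, as %f[index] would."""
    return feature.split(",")[index]


print(len(KNOWN.split(",")))    # 9 fields, indices 0..8
print(len(UNKNOWN.split(",")))  # 7 fields, indices 0..6
print(field(KNOWN, 8))          # ワタシ (the 9th field)

try:
    field(UNKNOWN, 8)
except IndexError:
    # Python's analogue of mecab's "given index is out of range"
    print("index 8 is out of range for the unknown word")
```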
You can see this directly with `mecab` using the output format you specified:
```
$ mecab -F'%m,%f[0],%f[1],%f[8]\n'
私はアシャです
given index is out of range
```
You will not see this `index out of range` error if the input only contains words that are listed in ipadic. For example:
```
私はカオスです
私,名詞,代名詞,ワタシ
は,助詞,係助詞,ワ
カオス,名詞,一般,カオス
です,助動詞,*,デス
EOS
```
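If the input may contain unknown words, one possible workaround (an assumption, not something natto-py or MeCab provides) is to keep MeCab's default `%m\t%H` node output and pick the fields in Python, substituting a placeholder when a field is missing. The helper name `pick_fields` and the `*` default are hypothetical choices for illustration.

```python
# Sample output in MeCab's default node format (%m\t%H\n), copied from the
# plain `mecab` session above with the separator written as a tab.
SAMPLE = (
    "私\t名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ\n"
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n"
    "アシャ\t名詞,一般,*,*,*,*,*\n"
    "です\t助動詞,*,*,*,特殊・デス,基本形,です,デス,デス\n"
    "EOS\n"
)


def pick_fields(mecab_output, indices, default="*"):
    """Hypothetical helper: return [surface, field_i, ...] for each token.

    Fields that do not exist (e.g. index 8 for unknown words) are replaced
    by `default` instead of raising an error.
    """
    rows = []
    for line in mecab_output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and end-of-sentence markers
        surface, feature = line.split("\t", 1)
        fields = feature.split(",")
        rows.append([surface] + [fields[i] if i < len(fields) else default
                                 for i in indices])
    return rows


for row in pick_fields(SAMPLE, [0, 1, 8]):
    print(",".join(row))
```

This prints the same four columns as `-F'%m,%f[0],%f[1],%f[8]\n'`, except that アシャ comes out as `アシャ,名詞,一般,*` rather than aborting the whole sentence.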
Therefore, since this is neither a bug in `mecab` nor an issue stemming from natto-py itself, I am going to close this issue.
However, you have uncovered a separate bug: in this case, natto-py did not correctly capture the error message from `mecab`. I will open a separate issue for that.
Please let me know if you have any additional questions. Thank you for using natto-py!
@buruzaemon I see. Thanks!
The following code will cause the "natto.api.MeCabError: MECAB_NBEST request type is not set" error