buruzaemon / natto-py

natto-py combines the Python programming language with MeCab, the part-of-speech and morphological analyzer for the Japanese language.
BSD 2-Clause "Simplified" License

"natto.api.MeCabError: MECAB_NBEST request type is not set" error under some circumstances #95

Closed kohchuanhock closed 7 years ago

kohchuanhock commented 7 years ago

The following code raises the error "natto.api.MeCabError: MECAB_NBEST request type is not set":

# -*- coding: utf-8 -*-
from natto import MeCab

nm = MeCab('-F%m,%f[0],%f[1],%f[8]')
for n in nm.parse('私はアシャです', as_nodes=True):
    print(n.feature)

buruzaemon commented 7 years ago

Thanks for reporting this, @kohchuanhock. @himkt gave me a clue on reproducing this, so I will have a closer look into this error.

buruzaemon commented 7 years ago

@kohchuanhock, could you please tell me the following:

  1. OS platform
  2. Python version and bit-ness
  3. output of mecab -P

kohchuanhock commented 7 years ago

@buruzaemon, thank you for your time!

  1. OS platform: MacOS Sierra 10.12.6
  2. Python version and bit-ness: Python 2.7.10 (3.6.1 gives the same error), 64-bit
  3. output of mecab -P:

bos-feature: BOS/EOS,,,,,,,,
bos-format:
config-charset: EUC-JP
cost-factor: 700
dicdir: /usr/local/lib/mecab/dic/ipadic
dump-config: 1
eon-format:
eos-format: EOS\n
eos-format-chasen: EOS\n
eos-format-chasen2: EOS\n
eos-format-simple: EOS\n
eos-format-yomi: \n
eval-size: 8
lattice-level: 0
max-grouping-size: 24
nbest: 1
node-format: %m\t%H\n
node-format-chasen: %m\t%f[7]\t%f[6]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
node-format-chasen2: %M\t%f[7]\t%f[6]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
node-format-simple: %m\t%F-[0,1,2,3]\n
node-format-yomi: %pS%f[7]
theta: 0.75
unk-eval-size: 4
unk-format: %m\t%H\n
unk-format-chasen: %m\t%m\t%m\t%F-[0,1,2,3]\t\t\n
unk-format-chasen2: %M\t%m\t%m\t%F-[0,1,2,3]\t\t\n
unk-format-yomi: %M

buruzaemon commented 7 years ago

@kohchuanhock, I tried your input using mecab directly and without any output formatting. This is what it returns:

$ mecab
私はアシャです
私   名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ
は   助詞,係助詞,*,*,*,*,は,ハ,ワ
アシャ 名詞,一般,*,*,*,*,*
です  助動詞,*,*,*,特殊・デス,基本形,です,デス,デス
EOS

Notice how there are only 7 feature fields for the noun アシャ, while the other words have 9. This is naturally so because アシャ is not listed in your ipadic dictionary, and so MeCab treats it as an unknown word (stat of 1). Feature indices start at 0, so %f[8] asks for the 9th field. Since the list for アシャ only has fields at indices 0 through 6, an error occurs.
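The field counts can be checked with a few lines of plain Python. This is only a sketch: the feature strings are copied from the mecab output in this thread, `has_index` is a hypothetical helper (not part of natto-py or MeCab), and splitting on commas assumes no feature value contains a quoted comma.

```python
# Feature strings copied from the mecab output in this thread.
features = {
    "私": "名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ",
    "アシャ": "名詞,一般,*,*,*,*,*",
}


def has_index(feature, index):
    """Return True if the comma-separated feature string has a field at index.

    Hypothetical helper for illustration; assumes a plain comma split is enough.
    """
    return index < len(feature.split(","))


for surface, feature in features.items():
    # Prints the surface, the number of fields, and whether index 8 exists.
    print(surface, len(feature.split(",")), has_index(feature, 8))
# 私 9 True
# アシャ 7 False
```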

You can see this if you do this directly with mecab using the output formatting you listed:

$ mecab -F'%m,%f[0],%f[1],%f[8]\n'
私はアシャです
given index is out of range

You will not see this index out of range error if you use words that are listed in ipadic. For example:

私はカオスです

私,名詞,代名詞,ワタシ
は,助詞,係助詞,ワ
カオス,名詞,一般,カオス
です,助動詞,,デス
EOS

Therefore, since this is neither a bug in mecab nor an issue directly stemming from natto-py itself, I am going to close this issue.

However, you have uncovered a separate issue which is a bug, namely, the error message from mecab in this case was not captured correctly by natto-py. I will be opening up a separate issue for that.
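For the original script, one possible way to avoid the out-of-range index is to drop the -F option and pick the feature fields in Python, padding the ones that are missing. The following is a minimal sketch, not an official natto-py recipe: `format_node` and `parse_safely` are hypothetical names, the empty-string default is an arbitrary choice, and the comma split assumes ipadic-style features without quoted commas.

```python
def format_node(surface, feature, indices=(0, 1, 8), default=""):
    """Join the surface with selected feature fields, padding missing ones.

    Hypothetical helper; mirrors the intent of -F%m,%f[0],%f[1],%f[8].
    """
    fields = feature.split(",")
    picked = [fields[i] if i < len(fields) else default for i in indices]
    return ",".join([surface] + picked)


def parse_safely(text):
    """Hypothetical wrapper; requires natto-py and MeCab to be installed."""
    # Imported lazily so that format_node can be used without MeCab.
    from natto import MeCab

    # No -F option here, so MeCab itself never evaluates %f[8].
    with MeCab() as nm:
        return [
            format_node(n.surface, n.feature)
            for n in nm.parse(text, as_nodes=True)
            if not n.is_eos()
        ]


# Feature strings copied from the mecab output in this thread.
print(format_node("私", "名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ"))
print(format_node("アシャ", "名詞,一般,*,*,*,*,*"))
```

With this approach an unknown word such as アシャ simply gets an empty value for the missing field instead of triggering an error. Calling `parse_safely('私はアシャです')` would apply the same formatting to real MeCab output, assuming MeCab and a dictionary are installed.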

Please let me know if you've any additional questions. Thank you for using natto-py!

kohchuanhock commented 7 years ago

@buruzaemon I see. Thanks!