buruzaemon / natto-py

natto-py combines the Python programming language with MeCab, the part-of-speech and morphological analyzer for the Japanese language.
BSD 2-Clause "Simplified" License

Index out of range error message not being captured by natto-py #96

Closed (buruzaemon closed 6 years ago)

buruzaemon commented 7 years ago

When using output formatting in node parsing to capture the feature at index 8 (%f[8], the pronunciation field in ipadic), the "index out of range" error message that mecab prints is not captured by natto-py. Instead, natto-py fails with an unrelated RuntimeError from cffi.

To reproduce:

from natto import MeCab

nm = MeCab('-F%m,%f[0],%f[1],%f[8]')
for n in nm.parse('私はアシャです', as_nodes=True):
    print(n.feature)
... 
私,名詞,代名詞,ワタシ
は,助詞,係助詞,ワ
MECAB_NBEST request type is not set
Traceback (most recent call last):
    File "/usr/home/buruzaemon/dev/github/natto-py/natto/mecab.py", line 400, in __parse_tonodes
    rawf = self.__ffi.string(sp)
    File "/usr/home/buruzaemon/dev/github/natto-py/.py35env/lib/python3.5/site-packages/cffi/api.py", line 288, in string
    return self._backend.string(cdata, maxlen)
RuntimeError: cannot use string() on <cdata 'char *' NULL>

Compare with the same output format passed directly to mecab, which reports the actual cause:

$ mecab -F'%m,%f[0],%f[1],%f[8]\n'
私はアシャです
given index is out of range

Discovered indirectly while looking into the issue titled: "natto.api.MeCabError: MECAB_NBEST request type is not set" error under some circumstances.

buruzaemon commented 6 years ago

Fixed: when node parsing fails because of output formatting, natto-py now correctly captures the error message from mecab.
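For readers who hit the same RuntimeError, here is a minimal sketch of the kind of guard that avoids it: check for a NULL pointer before decoding the C string, and raise with the underlying library's own message instead. This is not natto-py's actual code. The MeCabError class below is a local stand-in, decode_formatted_node and the last_error callback are hypothetical names, and ctypes is used in place of cffi only so that the example runs without mecab installed.

```python
import ctypes


class MeCabError(Exception):
    """Local stand-in for a MeCab-specific error type (assumption for this sketch)."""


def decode_formatted_node(ptr, last_error):
    """Decode a formatted-node C string, or raise with the library's message.

    ptr        -- a ctypes.c_char_p as returned by a (hypothetical) C formatter
    last_error -- hypothetical callable returning the library's last error text
    """
    if ptr.value is None:
        # NULL means the formatter failed; surface the real cause
        # rather than letting the string conversion blow up.
        raise MeCabError(last_error() or "unknown error")
    return ptr.value.decode("utf-8")


# Successful formatting: a non-NULL pointer decodes normally.
ok = ctypes.c_char_p("私,名詞,代名詞,ワタシ".encode("utf-8"))
print(decode_formatted_node(ok, lambda: ""))

# Failed formatting: a NULL pointer yields the library's message.
try:
    decode_formatted_node(ctypes.c_char_p(None), lambda: "given index is out of range")
except MeCabError as err:
    print("MeCabError:", err)
```

With a guard like this, the caller sees "given index is out of range" rather than "cannot use string() on <cdata 'char *' NULL>".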