Open markdevel opened 6 years ago
Thank you for raising this issue, @markdevel. I will have a closer look at this.
MeCab's expected behavior for the usage pattern described above has been confirmed, as shown below.
Case 1: ASCII whitespace between 2 chars specified as boundary constraint:
(natto-py-36) F:\Area52\home\buruzaemon\dev\github\natto-py>echo a aあ | mecab
a 感動詞,,,,,,
a 感動詞,,,,,,
あ フィラー,,,,,*,あ,ア,ア
EOS
Case 2: Full-width whitespace (空白) char between 2 chars specified as boundary constraint:
(natto-py-36) F:\Area52\home\buruzaemon\dev\github\natto-py>echo a aあ | mecab
a 名詞,固有名詞,組織,,,,
記号,空白,,,,, , ,
a 感動詞,,,,,,
あ フィラー,,,,,*,あ,ア,ア
EOS
If natto.py is to conform to the prescribed behavior, then some changes need to be made to natto/mecab.py and natto/support.py with respect to whitespace handling in the yield of tokens, etc.
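To make the whitespace question concrete, here is a minimal, self-contained sketch of how a text might be split around boundary-constraint matches. This is not the code in natto/mecab.py or natto/support.py; `split_by_constraint` and `is_ascii_space` are hypothetical helpers, and the only assumption taken from the two cases above is that MeCab emits no token for an ASCII space but does emit one for a full-width space.

```python
import re


def split_by_constraint(text, pattern):
    """Yield (segment, is_constrained) pairs that together cover `text`.

    Hypothetical helper for illustration only; not part of natto-py.
    """
    pos = 0
    for match in re.finditer(pattern, text):
        if match.start() > pos:
            # Text between two matches, e.g. a lone delimiter character.
            yield text[pos:match.start()], False
        yield match.group(0), True
        pos = match.end()
    if pos < len(text):
        # Trailing text after the last match.
        yield text[pos:], False


def is_ascii_space(segment):
    """True only for a segment made entirely of ASCII spaces (U+0020).

    Note that str.isspace() would also be True for U+3000, so it cannot
    be used to tell the two cases apart.
    """
    return segment != "" and segment.strip(" ") == ""


ascii_case = list(split_by_constraint("a aあ", r"a"))
fullwidth_case = list(split_by_constraint("a\u3000aあ", r"a"))
```

In the ASCII case the segment between the two matches is a single space, which MeCab does not report as a token, so any code that pairs segments with the nodes MeCab yields would have to skip it. In the full-width case the in-between segment corresponds to the 記号,空白 token shown in Case 2 and would have to be kept.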
I encountered an error when running the following code. I think it happens when the text being parsed contains two or more keywords that are adjacent to each other with only a delimiter between them.
output
environment