chezou closed this issue 6 years ago
This is actually expected; it is how MeCab itself works. You can check MeCab's behavior directly in your terminal:
echo "こんにちは、Python" | mecab -O wakati > file
The result should be:
こんにちは 、 Python \n
No, it is not the author's intention to append a line break to the end of every line.
echo "こんにちは、Python\n今日の天気は晴れです" | mecab -O wakati
こんにちは 、 Python
今日 の 天気 は 晴れ です
The extra line break seems to appear only after the last line of the output, and I think it should be stripped from the parsed result.
This behavior comes from the core MeCab C library. I converted your test program to the C equivalent:
#include <mecab.h>
#include <stdio.h>

int main(void)
{
    mecab_t *tagger = mecab_new2("-Owakati");
    if (!tagger) return 1;
    const char *out = mecab_sparse_tostr(tagger, "こんにちは、Python");
    fputs(out, stdout);
    return 0;
}
and I see a newline at the end of the string it prints:
$ gcc -O2 -Wall -g test.c -lmecab
$ ./a.out | hd
00000000 e3 81 93 e3 82 93 e3 81 ab e3 81 a1 e3 81 af 20 |............... |
00000010 e3 80 81 20 50 79 74 68 6f 6e 20 0a |... Python .|
0000001c
I do not think we should make the Python module behave differently from the C interface it wraps. Please take this up with the developers of MeCab itself.
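For anyone who wants the stripped form in the meantime, a caller-side workaround may be enough. The sketch below is only an illustration, not part of mecab-python3: `strip_wakati` is a hypothetical helper name, and the sample string is decoded from the hexdump bytes shown above so that it runs without MeCab installed.

```python
# Bytes copied from the hexdump above (the output of mecab_sparse_tostr).
RAW = bytes.fromhex(
    "e3 81 93 e3 82 93 e3 81 ab e3 81 a1 e3 81 af 20"
    "e3 80 81 20 50 79 74 68 6f 6e 20 0a"
).decode("utf-8")


def strip_wakati(raw: str) -> str:
    """Hypothetical helper: drop the final newline and the trailing
    space that each line of wakati output ends with."""
    lines = raw.rstrip("\n").split("\n")
    return "\n".join(line.rstrip(" ") for line in lines)


if __name__ == "__main__":
    print(repr(RAW))                # raw output, as the C library returns it
    print(repr(strip_wakati(RAW)))  # cleaned on the caller's side
```

With mecab-python3, the same helper could presumably be applied to the return value of `MeCab.Tagger("-Owakati").parse(text)`, which leaves the wrapper itself consistent with the C interface.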
Using mecab-python3 with the -O wakati option, there is an extra end of line.
actual:
expected: