Closed massongit closed 6 years ago
I will write a test for this implementation, but I don't know where to write it in tests/test_option_parse.py. Please tell me.
Thank you @massongit for bringing this issue to my attention. I will first confirm this and then open up an issue ticket. Please give me some time to look into this.
OK, this was easy enough to confirm.
I have opened up issue #99 to track this. I will start by coming up with appropriate tests, hopefully for both Windows and UNIX-type platforms. I don't have any tests for dictionaries besides ipadic, so I will need some time to come up with something that can cover Unidic, and perhaps Jumandic as well.
@massongit, thank you for your patience. Here is what I have found:
MeCab gives output-format-type precedence over node-format, etc.
If output-format-type is unset (specifying an empty string), node-format will then be used.
This behavior of MeCab is consistent across ipadic, jumandic and unidic, and is not a function of the dictionary used.
I expect that your Unidic dicrc has the following lines:
output-format-type = unidic
node-format-unidic = %m\t%f[9]\t%f[6]\t%f[7]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
That means that unless you explicitly unset output-format-type by passing MeCab an empty string/name with -O "", the node format will default to node-format-unidic even if you also used -F. If you comment out output-format-type = unidic in your dicrc, then you will see that you don't need -O "".
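The precedence described above can be sketched as a small Python model. This is only an illustration of the behavior reported in this thread, not MeCab's actual implementation; the function name `resolve_node_format` and its parameters are hypothetical, and built-in output formatters are not modeled.

```python
def resolve_node_format(dicrc, cli_output_type=None, cli_node_format=None):
    """Return the node format that applies under the precedence described above.

    dicrc            -- dict of settings read from a dicrc file
    cli_output_type  -- value given with -O (None if absent, "" to unset)
    cli_node_format  -- value given with -F (None if absent)
    """
    # A -O value on the command line overrides output-format-type in dicrc.
    if cli_output_type is not None:
        output_type = cli_output_type
    else:
        output_type = dicrc.get("output-format-type", "")

    if output_type:
        # A named output format type takes precedence over -F.
        return dicrc.get("node-format-" + output_type)

    # No output format type in effect: -F applies, then dicrc's node-format.
    if cli_node_format is not None:
        return cli_node_format
    return dicrc.get("node-format")


# Settings taken from the dicrc lines quoted in this thread.
unidic_dicrc = {
    "output-format-type": "unidic",
    "node-format-unidic": r"%m\t%f[9]\t%f[6]\t%f[7]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n",
}

custom = r"%m\t%f[0]\n"  # hypothetical format a user might pass with -F

# -F alone is ignored because output-format-type = unidic is still in effect.
with_f_only = resolve_node_format(unidic_dicrc, cli_node_format=custom)

# -O "" unsets the output format type, so -F is honored.
with_empty_o = resolve_node_format(unidic_dicrc, cli_output_type="", cli_node_format=custom)
```

In this model, commenting out output-format-type in dicrc corresponds to dropping that key from the dict, in which case -F applies without -O "".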
You are correct that natto-py must likewise be able to accept -O "" in order to mirror this behavior.
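To illustrate why an empty option value needs care, here is a minimal sketch using Python's standard `shlex` module. It is not natto-py's actual option parser; the helper name `split_options` is hypothetical and only shows that a shell-style split keeps the empty argument after -O, while a plain whitespace split does not.

```python
import shlex


def split_options(option_string):
    """Split an option string shell-style, preserving empty quoted values."""
    return shlex.split(option_string)


options = r'-O "" -F "%m\t%f[0]\n"'

# Shell-style splitting keeps the empty string as its own token after -O.
tokens = split_options(options)

# A naive whitespace split keeps the two quote characters as a literal value
# instead of producing an empty string.
naive_tokens = options.split()
```

A parser that discards empty tokens, or that treats the quote characters literally, would lose the "unset" meaning of -O "".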
Hence, I will be accepting your pull request. Thank you very much! I will come up with some unit tests to cover this new behavior.
@buruzaemon Thank you for confirming and merging!
(Related to https://github.com/taku910/mecab/issues/41) I made it possible to pass an empty string as an option value, so that the node-format option can be specified when using UniDic.