Closed buruzaemon closed 6 years ago
The output-format-type option is used in a dictionary's dicrc to specify a default output format type for node-formatting. For example, consider the following sample dicrc for Unidic:
output-format-type = unidic2
node-format-unidic = %m\t%f[9]\t%f[6]\t%f[7]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
unk-format-unidic = %m\t%m\t%m\t%m\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
bos-format-unidic =
eos-format-unidic = EOS\n
node-format-chamame = \t%m\t%f[9]\t%f[6]\t%f[7]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
;unk-format-chamame = \t%m\t\t\t%m\tUNK\t\t\n
unk-format-chamame = \t%m\t\t\t%m\t%F-[0,1,2,3]\t\t\n
bos-format-chamame = B
eos-format-chamame =
node-format-unidic2 = %m\t%f[9]\t%f[6]\t%f[7]\t%F-[0,1,2,3]\t%f[4]\t%f[5]\t%f[12]\n
unk-format-unidic2 = %m\t%m\t%m\t%m\t%F-[0,1,2,3]\t%f[4]\t%f[5]\n
bos-format-unidic2 =
eos-format-unidic2 = EOS\n
Here, the default formatting when no other is specified is *-format-unidic2.
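As a rough illustration of how the default is selected, the sketch below parses dicrc-style text and derives the format keys implied by output-format-type. The helper names parse_dicrc and default_format_keys are hypothetical, and treating lines that start with ";" as comments is an assumption based on the sample above; this is not MeCab's own parser.

```python
def parse_dicrc(text):
    """Parse 'key = value' lines into a dict.

    Assumption: lines starting with ';' are comments, as the sample suggests.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        settings[key.strip()] = value.strip()
    return settings


def default_format_keys(settings):
    """Return the dicrc keys that act as defaults for each node kind."""
    fmt = settings.get("output-format-type", "")
    suffix = "-" + fmt if fmt else ""
    return {kind: f"{kind}-format{suffix}" for kind in ("node", "unk", "bos", "eos")}


# Shortened sample, for illustration only.
SAMPLE = r"""
output-format-type = unidic2
node-format-unidic2 = %m\t%f[9]\n
;unk-format-chamame = ignored
eos-format-unidic2 = EOS\n
"""

settings = parse_dicrc(SAMPLE)
keys = default_format_keys(settings)
print(keys["node"], "->", settings[keys["node"]])
```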
MeCab gives preference to output-format-type over node-format, etc., unless output-format-type is explicitly set to be empty. This behavior is consistent across the ipadic, jumandic and unidic dictionaries.
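The precedence described here can be modeled in a few lines of Python. This is only a sketch of the behavior as reported in this thread, not MeCab's actual code; the function effective_node_format and its parameters are hypothetical names introduced for illustration.

```python
def effective_node_format(dicrc, output_format_type=None, node_format=None):
    """Model which node format wins, per the behavior described in this issue.

    output_format_type: None means the option was not passed by the user;
    "" means it was explicitly set to empty.
    """
    if output_format_type is None:
        fmt = dicrc.get("output-format-type", "")
    else:
        fmt = output_format_type
    if fmt:
        # A named output format type takes priority over node_format.
        return dicrc.get(f"node-format-{fmt}")
    # Empty output format type: the user's node_format (if any) applies.
    return node_format


dicrc = {
    "output-format-type": "unidic2",
    "node-format-unidic2": r"%m\t%f[9]\n",
}

# node_format alone is ignored because the dicrc default wins.
ignored = effective_node_format(dicrc, node_format=r"%m\n")
# Explicitly emptying output_format_type lets node_format through.
honored = effective_node_format(dicrc, output_format_type="", node_format=r"%m\n")
print(ignored, honored)
```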
MeCab's PR (https://github.com/taku910/mecab/pull/38) may solve this problem.
I will close this issue. However, I have updated the output-format-type MeCab option description in the project wiki to describe how to override an existing, default output format by specifying an empty string.
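For the command-line side, a minimal sketch of the override might look like the following. It only builds the argument list and does not run mecab; the helper mecab_argv is a hypothetical name, and the assumption is that passing an empty value to -O (output-format-type) lets the -F (node-format) value take effect, as described above.

```python
import shlex


def mecab_argv(node_format, dicdir=None):
    """Build a mecab command line that empties -O so that -F is honored."""
    argv = ["mecab", "-O", "", "-F", node_format]
    if dicdir:
        argv += ["-d", dicdir]  # -d selects the dictionary directory
    return argv


argv = mecab_argv(r"%m\t%f[0]\n")
# Show a shell-quoted form of the command for inspection.
print(shlex.join(argv))
```

The list could then be handed to subprocess.run on a machine where mecab and the dictionary are installed.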
As reported by @massongit in pull request #98, node-formatting seems to be ignored by mecab when using Unidic. Please refer to taku910/mecab#41.
A workaround is to force natto-py to accept an empty string value for output -O.
Steps to reproduce:
node-format for Unidic
natto-py)
with B and C (using mecab from command-line)