Closed technolingo closed 6 years ago
Hello, @evilplanet, and thanks for using natto-py...
If you mean to use 2 dictionary files separately, then you could create 2 separate instances of MeCab.
You can specify a system dictionary with the mecab option --dicdir or a custom user dictionary with --userdic. Pass in these options when you instantiate MeCab.
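A minimal sketch of the two-instance approach, assuming the dictionary directories mentioned later in this thread. The names DICDIRS, dicdir_option and make_taggers are hypothetical helpers, not part of natto-py; only MeCab and the --dicdir option come from the library and from mecab itself.

```python
# Dictionary directories as quoted in this thread; adjust them to your install.
DICDIRS = {
    "ja": "/usr/local/lib/mecab/dic/ipadic",
    "ko": "/usr/local/lib/mecab/dic/mecab-ko-dic",
}


def dicdir_option(lang):
    """Return the --dicdir option string for a language code (hypothetical helper)."""
    return "--dicdir=" + DICDIRS[lang]


def make_taggers():
    """Create one MeCab instance per dictionary.

    Requires natto-py and a working MeCab install, so the import is kept
    inside the function and the helper above stays usable without them.
    """
    from natto import MeCab
    return {lang: MeCab(dicdir_option(lang)) for lang in DICDIRS}
```

With something like this, make_taggers()["ko"].parse(text) would use the Korean dictionary and make_taggers()["ja"].parse(text) the Japanese one, provided both dictionaries are installed at those paths.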
@buruzaemon Thank you for your prompt reply. Sorry, I'm a bit new to this. Below is my code. Let's say I want to use the dictionary installed at /usr/local/lib/mecab/dic/mecab-ko-dic/sys.dic here (my default Japanese dict is /usr/local/lib/mecab/dic/ipadic/sys.dic). How should I do it?
from natto import MeCab
with MeCab('-Owakati') as nm:
    segmented_text = nm.parse(text)
Thank you very much!!
Below is my /usr/local/etc/mecabrc file:
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
dicdir = /usr/local/lib/mecab/dic/ipadic
; userdic = /usr/local/lib/mecab/dic/mecab-ko-dic
; output-format-type = wakati
; input-buffer-size = 8192
; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
When I uncomment the userdic line, I can no longer instantiate MeCab.
natto-py honors nearly the exact same options as when you use mecab from the command-line.
So the following two approaches are equivalent:
# mecab from command-line
mecab --dicdir=/usr/local/lib/mecab/dic/mecab-ko-dic
# with natto-py
nm = MeCab('--dicdir=/usr/local/lib/mecab/dic/mecab-ko-dic')
It is a good idea to try out your choice of options first at the mecab command-line before using them in instantiating MeCab() with natto-py.
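As a rough sketch of that check, the same options string can be handed to the mecab executable from Python before it goes to MeCab(). The helpers mecab_command and try_options are hypothetical names, and splitting the string with shlex is an assumption about how a shell would tokenize it, not a statement about how natto-py parses options.

```python
import shlex
import shutil
import subprocess


def mecab_command(options):
    """Build the argument list for running mecab with an options string."""
    return ["mecab"] + shlex.split(options)


def try_options(options, sample):
    """Run mecab on a sample string and return the CompletedProcess.

    Returns None when no mecab executable is found on PATH.
    """
    if shutil.which("mecab") is None:
        return None
    # mecab reads its input from stdin when no input file is given.
    return subprocess.run(
        mecab_command(options),
        input=sample,
        capture_output=True,
        text=True,
    )
```

Inspecting the returncode, stdout and stderr of the result shows whether mecab accepted the options, which is usually quicker to debug than an error raised while instantiating MeCab.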
Hope that helps!
It certainly helps! I wasn't sure how to set the dict path and the other option together, so I tried chaining them in a single string like this: MeCab('--dicdir=/usr/local/lib/mecab/dic/mecab-ko-dic -Owakati')
And it seems to be working. Thanks a lot!
@evilplanet, I am glad to hear that you were able to correctly use both -O and --dicdir together.
natto-py is meant to take options in the same manner as they are passed to the mecab command-line, in order to be as familiar as possible. That is why you have to specify all of the options at instantiation time.
Besides using a single, long options string, you can also use key-value pairs in a dict.
Alternatively, if you find that you have a lot of options which you want to manage in a custom configuration file like your mecabrc file, you could use --rcfile.
For example:
# a single, long string
MeCab('--dicdir=/usr/local/lib/mecab/dic/mecab-ko-dic -Owakati')
# a dict with key-values
MeCab({'dicdir': '/usr/local/lib/mecab/dic/mecab-ko-dic',
       'output_format_type': 'wakati'})
# or if you want to put all of your options in an rcfile
MeCab('--rcfile=/path/to/custom/rcfile/')
You can review the various MeCab options in the project Wiki's Appendix B: Supported MeCab Options.
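For the --rcfile route, here is a minimal sketch that writes a small mecabrc-style file and builds the matching option string. It assumes the "key = value" layout and ";" comments seen in the mecabrc quoted earlier in this thread; write_rcfile, rcfile_option and segment_with_rcfile are hypothetical helper names, and whether a file containing only these keys is sufficient may depend on your MeCab install.

```python
import os
import tempfile


def write_rcfile(path, dicdir, output_format_type=None):
    """Write a minimal mecabrc-style file and return its path."""
    lines = ["; custom MeCab configuration", "dicdir = " + dicdir]
    if output_format_type is not None:
        lines.append("output-format-type = " + output_format_type)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path


def rcfile_option(path):
    """Return the --rcfile option string for a configuration file path."""
    return "--rcfile=" + path


def default_rcfile_path(name):
    """Pick a location in the system temp directory (an arbitrary choice)."""
    return os.path.join(tempfile.gettempdir(), name)


def segment_with_rcfile(text, rcfile):
    """Parse text with the options from rcfile; requires natto-py and MeCab."""
    from natto import MeCab
    with MeCab(rcfile_option(rcfile)) as nm:
        return nm.parse(text)
```

Keeping one such file per language means the Python code only has to choose which file to pass, while the dictionary path and output format live in the configuration.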
@buruzaemon Great!! Thanks a lot. That's very informative!
I'm using MeCab to process both Japanese and Korean texts, and I have two dictionary files. How do I specify a particular dictionary when instantiating MeCab in a function? Thank you!