SamuraiT / mecab-python3

:snake: mecab-python. you can find original version here:http://taku910.github.io/mecab/
https://pypi.python.org/pypi/mecab-python3

How to include space in the path to dictionary? #43

Closed yuyuxing-wu closed 4 years ago

yuyuxing-wu commented 4 years ago

I want to include a space in the dictionary path and have tried the following.

import MeCab
parser=MeCab.Tagger('-d /opt/test1 test2/dictionary')
parser=MeCab.Tagger('-d /opt/test1\ test2/dictionary')
parser=MeCab.Tagger('-d "/opt/test1 test2/dictionary"')

All three of them failed with a RuntimeError.

Is there any way to include a space in the path?

polm commented 4 years ago

This is not possible right now because the argument handling doesn't do full shell quoting; see #25 for details.

I haven't thought about this problem in a long time, but I'll take another look at whether there's a good way to handle it. It may be possible to just preparse the arguments with Python's shlex.
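
To illustrate the shlex idea, here is a minimal sketch of how `shlex.split` would tokenize the three option strings from the report. It only demonstrates the standard library's behavior; `split_options` is a hypothetical helper name, and nothing here is confirmed to be how the binding ends up handling its arguments.

```python
import shlex

# The three option strings tried in the original report.
attempts = [
    "-d /opt/test1 test2/dictionary",    # bare space
    "-d /opt/test1\\ test2/dictionary",  # backslash-escaped space
    '-d "/opt/test1 test2/dictionary"',  # double-quoted path
]


def split_options(raw):
    """Hypothetical helper: split an option string using POSIX shell rules."""
    return shlex.split(raw)


for raw in attempts:
    print(split_options(raw))
```

With shell-style splitting, the bare space still breaks the path into two tokens, while the escaped and quoted forms each yield the path as a single token.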

polm commented 4 years ago

So I took a look at it and this issue should be fixed in master. Unfortunately that broke Node-based parsing for some reason. Should be fixable at least, though can't say how long it'll take.

polm commented 4 years ago

OK, fixed the issues, so this should be resolved in the latest release candidate.

Could you please install the latest version as shown below and tell me if it fixes your issue?

pip install mecab-python3==0.996.6rc2

yuyuxing-wu commented 4 years ago

Thanks!

I have tried the two patterns below and both of them work!

import MeCab
parser=MeCab.Tagger('-d /opt/test1\ test2 -u /opt/test1\ test2/user.dic')
parser=MeCab.Tagger('-d "/opt/test1 test2" -u "/opt/test1 test2/user.dic"')

polm commented 4 years ago

OK, thanks for the confirmation. I'll release a new update with this change soon.

yuyuxing-wu commented 4 years ago

FYI: After this update, the escape sequences in the output format option also need to be changed, as shown below.

Before:

'-F%M\\t%H\\n'

After:

'-F%M\t%H\n'
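
One possible reason the old form stopped working, assuming the option string is now split with `shlex.split` in its default POSIX mode (the thread suggests shlex but does not confirm the details): an unquoted backslash is treated as an escape character and removed. The sketch below shows only the standard library's behavior, not what MeCab does with the result.

```python
import shlex

# The Python literal '-F%M\\t%H\\n' is the text -F%M\t%H\n,
# i.e. a backslash followed by a letter, not a real tab or newline.
old_form = '-F%M\\t%H\\n'

# In POSIX mode, a backslash outside quotes escapes the next character
# and is itself dropped, so the markers lose their backslashes.
tokens = shlex.split(old_form)
print(tokens)  # ['-F%Mt%Hn']
```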

polm commented 4 years ago

Fix for this is included in the 1.0 release today.