SamuraiT / mecab-python3

:snake: mecab-python. You can find the original version here: http://taku910.github.io/mecab/
https://pypi.python.org/pypi/mecab-python3

looks for configuration in the wrong place on Ubuntu 20.04 #53

Closed rspeer closed 4 years ago

rspeer commented 4 years ago

I'm trying to automate an installation on Ubuntu 20.04 that depends on this Python package. The automated installation starts from a fresh Ubuntu 20.04 image, then uses Puppet to install apt packages such as mecab, libmecab2, libmecab-dev, and mecab-ipadic-utf8, then sets up a Python 3.8 virtual environment and installs a package into it that depends on MeCab.

When I try to instantiate a MeCab.Tagger() on that system, it fails with a RuntimeError and this message under "ERROR DETAILS":

arguments:
error message: [ifs] no such file or directory: /usr/local/etc/mecabrc

The configuration file installed by apt is /etc/mecabrc, just like it is on 18.04, not /usr/local/etc/mecabrc.

This configuration problem seems specific to the Python module. Running mecab-config --sysconfdir shows /etc. I can run mecab from the command line and type some text and it will analyze it.

I can add a symlink and make the module work, but why is the Python code looking in /usr/local/etc?

polm commented 4 years ago

Sorry for the trouble, thanks for opening an issue.

What's happening here is that if you download the MeCab source and build it, it looks for its configuration in /usr/local/etc/. Debian, and therefore Ubuntu, has patched the source or changed the configure flags to use /etc instead (apparently CentOS does the same). I am not entirely sure why they did this; I think they have a policy about standardized paths. The mecab-config in any given install has the paths set by configure baked into it, so it is just reporting those.
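If you want to see which path a given system install was configured with, here is a rough sketch that asks mecab-config and compares the answer to the upstream default. The helper name `system_mecabrc` and the constant are made up for illustration. It only assumes the `mecab-config --sysconfdir` command mentioned above, and returns None when that command is not on the PATH.

```python
import shutil
import subprocess
from pathlib import Path

# Default location used by an unpatched upstream source build.
UPSTREAM_MECABRC = Path("/usr/local/etc") / "mecabrc"


def system_mecabrc():
    """Return the mecabrc path implied by `mecab-config --sysconfdir`.

    Returns None if mecab-config is not installed or reports nothing.
    """
    exe = shutil.which("mecab-config")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--sysconfdir"], capture_output=True, text=True, check=False
    )
    sysconfdir = result.stdout.strip()
    if result.returncode != 0 or not sysconfdir:
        return None
    return Path(sysconfdir) / "mecabrc"


system_rc = system_mecabrc()
if system_rc is None:
    print("mecab-config not found; no system MeCab to compare against")
elif system_rc != UPSTREAM_MECABRC:
    print(f"system MeCab uses {system_rc}, upstream default is {UPSTREAM_MECABRC}")
else:
    print(f"system MeCab uses the upstream default {UPSTREAM_MECABRC}")
```

On a stock Ubuntu install with the apt packages, this should report a mismatch, which is the situation described in this issue.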

The binary copy of MeCab this library includes uses the unmodified upstream source, so it looks in the default path.
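If you would rather not add a symlink, one possible workaround is to point the Tagger at a config file explicitly with MeCab's `-r` (rcfile) option. The following is only a sketch: `rcfile_args` and `make_tagger` are hypothetical helpers, the candidate list is an assumption, and whether the bundled binary can actually load the apt-installed dictionary named in /etc/mecabrc is not guaranteed.

```python
import os

# Candidate config locations: upstream default first, then the Debian/Ubuntu path.
CANDIDATE_RCFILES = ("/usr/local/etc/mecabrc", "/etc/mecabrc")


def rcfile_args(candidates=CANDIDATE_RCFILES):
    """Return '-r <path>' for the first candidate that exists, else ''.

    Note: the path is not quoted, so paths containing spaces are not handled.
    """
    for path in candidates:
        if os.path.isfile(path):
            return f"-r {path}"
    return ""


def make_tagger():
    """Build a Tagger using whichever config file was found.

    MeCab is imported here so the helper above works without it installed.
    """
    import MeCab  # provided by mecab-python3

    return MeCab.Tagger(rcfile_args())
```

Calling `make_tagger()` still raises a RuntimeError if no usable dictionary is found, so this only addresses the missing mecabrc, not the dictionary itself.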

Note that ever since version 0.996.1 of this library was released in late 2018, it has shipped wheels for Linux that include binary copies of MeCab. Until 1.0, all wheels also included a bundled copy of ipadic. So for all versions starting with 0.996, your apt packages were not used at all.

With 1.0 the dictionary was unbundled. You can install unidic or unidic-lite with pip and it'll be configured automatically. Alternatively, if you want to keep using the same dictionary, you can install ipadic with pip and pass flags to the Tagger. I would recommend switching to unidic if possible, though, since IPAdic has been abandoned since 2007.
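To make the two options concrete, here is a minimal sketch. It assumes the pip `ipadic` package exposes a `MECAB_ARGS` string, as its README describes, and that an empty argument string lets an installed unidic or unidic-lite be picked up automatically. `dictionary_args` and `parse_sample` are illustrative names, not part of this library.

```python
import importlib


def dictionary_args():
    """Return Tagger flags for the pip-installed ipadic, or '' if it is absent.

    With '' the Tagger falls back to its automatic configuration, which is
    expected to find unidic or unidic-lite when one of them is installed.
    """
    try:
        ipadic = importlib.import_module("ipadic")
    except ImportError:
        return ""
    # MECAB_ARGS is assumed from the ipadic package's README.
    return getattr(ipadic, "MECAB_ARGS", "")


def parse_sample(text):
    """Parse text with whichever pip-installed dictionary is available."""
    import MeCab  # provided by mecab-python3

    tagger = MeCab.Tagger(dictionary_args())
    return tagger.parse(text)
```

With `pip install mecab-python3 unidic-lite` in a fresh virtual environment, `parse_sample` should work without any apt packages installed.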

In your case I would recommend not installing any of the apt packages and using a pip-installable dictionary instead.

Again, sorry for the trouble, let me know if that clears things up.

polm commented 4 years ago

Closing this for lack of activity, but let me know if you have more questions.