giggls / osml10n

Localization functions for Openstreetmap
https://tile.openstreetmap.de
GNU General Public License v3.0

Server cannot utilize tltk library #19

Closed itscz-org closed 1 year ago

itscz-org commented 1 year ago

Just installed on Debian Bullseye 11.5.

$ /usr/bin/geo-transcript-srv.py -v -g /usr/share/osml10n/boundaries
Loading osml10n transcription server:

ERROR: unable to load required python modules, please install them as follows:
pip install pykakasi
pip install tltk
pip install pinyin_jyutping_sentence

From the comments in the script I figured out that this is caused by the tltk library, but it is installed:

$ pip install tltk
Requirement already satisfied: tltk in ./python3.9/dist-packages (1.6)
Requirement already satisfied: sklearn in ./python3.9/dist-packages (from tltk) (0.0.post1)
Requirement already satisfied: gensim in ./python3.9/dist-packages (from tltk) (4.2.0)
Requirement already satisfied: nltk in ./python3.9/dist-packages (from tltk) (3.7)
Requirement already satisfied: sklearn-crfsuite in ./python3.9/dist-packages (from tltk) (0.3.6)
Requirement already satisfied: scipy>=0.18.1 in ./python3.9/dist-packages (from gensim->tltk) (1.9.3)
Requirement already satisfied: numpy>=1.17.0 in /usr/lib/python3/dist-packages (from gensim->tltk) (1.19.5)
Requirement already satisfied: smart-open>=1.8.1 in ./python3.9/dist-packages (from gensim->tltk) (6.2.0)
Requirement already satisfied: click in ./python3.9/dist-packages (from nltk->tltk) (8.1.3)
Requirement already satisfied: tqdm in ./python3.9/dist-packages (from nltk->tltk) (4.64.1)
Requirement already satisfied: regex>=2021.8.3 in ./python3.9/dist-packages (from nltk->tltk) (2022.10.31)
Requirement already satisfied: joblib in ./python3.9/dist-packages (from nltk->tltk) (1.2.0)
Requirement already satisfied: tabulate in ./python3.9/dist-packages (from sklearn-crfsuite->tltk) (0.9.0)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from sklearn-crfsuite->tltk) (1.16.0)
Requirement already satisfied: python-crfsuite>=0.8.3 in ./python3.9/dist-packages (from sklearn-crfsuite->tltk) (0.9.8)

Any suggestions?

giggls commented 1 year ago

The sole purpose of this error message is to make it clear that you need all three dependencies, not just tltk. No digging into the source code should be required, just reading the (hopefully) fine manual.

Did you install them as explained in INSTALL.md?

itscz-org commented 1 year ago

Sure I did.

$ pip install pykakasi
Requirement already satisfied: pykakasi in /usr/local/lib/python3.9/dist-packages (2.2.1)
Requirement already satisfied: deprecated in /usr/local/lib/python3.9/dist-packages (from pykakasi) (1.2.13)
Requirement already satisfied: jaconv in /usr/local/lib/python3.9/dist-packages (from pykakasi) (0.3)
Requirement already satisfied: wrapt<2,>=1.10 in /usr/local/lib/python3.9/dist-packages (from deprecated->pykakasi) (1.14.1)
$ pip install pinyin_jyutping_sentence
Requirement already satisfied: pinyin_jyutping_sentence in /usr/local/lib/python3.9/dist-packages (1.3)
Requirement already satisfied: jieba in /usr/local/lib/python3.9/dist-packages (from pinyin_jyutping_sentence) (0.42.1)

I dug into the code and found that it is the tltk import that triggers the exception.
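
Instead of reading the script, the failing import can also be found by importing each of the three modules separately and printing the real exception. A minimal sketch, where `probe_imports` is a hypothetical helper and not part of osml10n:

```python
import importlib


def probe_imports(names):
    """Try to import each module and report the real error, if any.

    Hypothetical helper for illustration; it is not part of osml10n.
    Returns a dict mapping module name to None (import worked) or to a
    string with the exception type and message.
    """
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, ValueError from C extensions, ...
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # The three modules named in the server's error message.
    for module, error in probe_imports(
        ["pykakasi", "tltk", "pinyin_jyutping_sentence"]
    ).items():
        print(f"{module}: {error or 'ok'}")
```

Run with the same interpreter and user that the server uses, since a different environment may see different packages.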

giggls commented 1 year ago

OK, then neither tltk nor the other two are likely to be the issue.

Why are you trying to run geo-transcript-srv.py manually anyway? It should be started automatically after installing the Debian package.

See systemctl status osml10n

The above command does work fine here after running systemctl stop osml10n:

/usr/bin/geo-transcript-srv.py -s -g /usr/share/osml10n/boundaries
Loading osml10n transcription server: ready.

itscz-org commented 1 year ago

I started investigating because it failed to start automatically:

● osml10n.service - OSM l10n transcription server
     Loaded: loaded (/lib/systemd/system/osml10n.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Tue 2022-11-15 13:52:16 CET; 51min ago
    Process: 228367 ExecStart=/usr/bin/geo-transcript-srv.py -s -g /usr/share/osml10n/boundaries (code=exited, status=1/FAILURE)
   Main PID: 228367 (code=exited, status=1/FAILURE)
        CPU: 1.126s

Nov 15 13:52:16 xxx systemd[1]: osml10n.service: Scheduled restart job, restart counter is at 5.
Nov 15 13:52:16 xxx systemd[1]: Stopped OSM l10n transcription server.
Nov 15 13:52:16 xxx systemd[1]: osml10n.service: Consumed 1.126s CPU time.
Nov 15 13:52:16 xxx systemd[1]: osml10n.service: Start request repeated too quickly.
Nov 15 13:52:16 xxx systemd[1]: osml10n.service: Failed with result 'exit-code'.
Nov 15 13:52:16 xxx systemd[1]: Failed to start OSM l10n transcription server.

giggls commented 1 year ago

OK, so your Python modules seem to differ :(

Does this look different for you?


 osml10n/ (master) > python
Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tltk
>>> import pykakasi
>>> import pinyin_jyutping_sentence
Building prefix dict from /usr/local/lib/python3.9/dist-packages/pinyin_jyutping_sentence/dict.txt.big ...
Loading model from cache /tmp/jieba.udae52a0cdc3624438ee23d21e0736dec.cache
Dumping model to file cache /tmp/jieba.udae52a0cdc3624438ee23d21e0736dec.cache
Dump cache file failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/jieba/__init__.py", line 154, in initialize
    _replace_file(fpath, cache_file)
PermissionError: [Errno 1] Operation not permitted: '/tmp/tmpbm3_5frg' -> '/tmp/jieba.udae52a0cdc3624438ee23d21e0736dec.cache'
Loading model cost 2.333 seconds.
Prefix dict has been built successfully.
>>> 

itscz-org commented 1 year ago

Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tltk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/dist-packages/tltk/__init__.py", line 1, in <module>
    from tltk import nlp
  File "/usr/local/lib/python3.9/dist-packages/tltk/nlp.py", line 31, in <module>
    from sklearn.ensemble import RandomForestClassifier
ModuleNotFoundError: No module named 'sklearn'
>>> import pykakasi
>>> import pinyin_jyutping_sentence

giggls commented 1 year ago

Hm, this is what it looks like here:

~/ # pip install tltk
Requirement already satisfied: tltk in /usr/local/lib/python3.9/dist-packages (1.6)
Requirement already satisfied: sklearn-crfsuite in /usr/local/lib/python3.9/dist-packages (from tltk) (0.3.6)
Requirement already satisfied: sklearn in /usr/local/lib/python3.9/dist-packages (from tltk) (0.0.post1)
Requirement already satisfied: nltk in /usr/local/lib/python3.9/dist-packages (from tltk) (3.7)
Requirement already satisfied: gensim in /usr/local/lib/python3.9/dist-packages (from tltk) (4.1.2)
Requirement already satisfied: scipy>=0.18.1 in /usr/lib/python3/dist-packages (from gensim->tltk) (1.6.0)
Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.9/dist-packages (from gensim->tltk) (5.2.1)
Requirement already satisfied: numpy>=1.17.0 in /usr/lib/python3/dist-packages (from gensim->tltk) (1.19.5)
Requirement already satisfied: tqdm in /usr/local/lib/python3.9/dist-packages (from nltk->tltk) (4.62.3)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.9/dist-packages (from nltk->tltk) (2022.1.18)
Requirement already satisfied: click in /usr/lib/python3/dist-packages (from nltk->tltk) (7.1.2)
Requirement already satisfied: joblib in /usr/local/lib/python3.9/dist-packages (from nltk->tltk) (1.1.0)
Requirement already satisfied: tabulate in /usr/local/lib/python3.9/dist-packages (from sklearn-crfsuite->tltk) (0.8.9)
Requirement already satisfied: python-crfsuite>=0.8.3 in /usr/local/lib/python3.9/dist-packages (from sklearn-crfsuite->tltk) (0.9.7)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from sklearn-crfsuite->tltk) (1.16.0)

itscz-org commented 1 year ago

This one fixed it:

python3 -m pip install scikit-learn

Server is now running. Thanks anyway.

giggls commented 1 year ago

Hm, looks like a broken dependency in tltk then. Maybe we should report it.
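
The pip output above lists a distribution named `sklearn` (0.0.post1) while `import sklearn` fails, so the distribution name and the importable module do not line up. A minimal sketch for checking both sides using only the standard library; the helper names are hypothetical:

```python
import importlib.util
from importlib import metadata


def dist_version(dist_name):
    """Return the installed version of a pip distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name):
    """Return True if a top-level module can be found on sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for dist in ("sklearn", "scikit-learn"):
        print(f"distribution {dist}: {dist_version(dist)}")
    print(f"module sklearn importable: {module_available('sklearn')}")
```

If `sklearn` shows a version but the module is not importable, installing `scikit-learn` directly, as done above, is the fix reported in this thread.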

MalteHillmann commented 1 year ago

Additional info on this: on Ubuntu 22.04.1 LTS you get the same error message:

Loading osml10n transcription server:
ERROR: unable to load required python modules, please install them as follows:
pip install pykakasi
pip install tltk
pip install pinyin_jyutping_sentence

Trying to import tltk in python3 gives:

Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tltk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/tltk/__init__.py", line 2, in <module>
    from tltk import corpus
  File "/usr/local/lib/python3.10/dist-packages/tltk/corpus.py", line 23, in <module>
    import gensim
  File "/usr/local/lib/python3.10/dist-packages/gensim/__init__.py", line 11, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, utils  # noqa:F401
  File "/usr/local/lib/python3.10/dist-packages/gensim/corpora/__init__.py", line 6, in <module>
    from .indexedcorpus import IndexedCorpus  # noqa:F401 must appear before the other classes
  File "/usr/local/lib/python3.10/dist-packages/gensim/corpora/indexedcorpus.py", line 14, in <module>
    from gensim import interfaces, utils
  File "/usr/local/lib/python3.10/dist-packages/gensim/interfaces.py", line 19, in <module>
    from gensim import utils, matutils
  File "/usr/local/lib/python3.10/dist-packages/gensim/matutils.py", line 1031, in <module>
    from gensim._matutils import logsumexp, mean_absolute_difference, dirichlet_expectation
  File "gensim/_matutils.pyx", line 1, in init gensim._matutils
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

This can be fixed by running the following after installing tltk: pip install --upgrade numpy

Resulting in: 87 tests passed, 0 tests failed.

Maybe worth a note in INSTALL.md?
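
The two failures reported in this thread can be told apart by the exception that `import tltk` raises. A minimal sketch of that mapping, assuming only the two cases seen above; `suggest_fix` is a hypothetical helper, and the suggested commands are the ones reported to work here:

```python
def suggest_fix(exc):
    """Map an exception from `import tltk` to the fix reported in this thread.

    Hypothetical helper; covers only the two cases seen above and
    returns None for anything else.
    """
    if isinstance(exc, ModuleNotFoundError) and exc.name == "sklearn":
        # Debian Bullseye case: the sklearn module is missing.
        return "python3 -m pip install scikit-learn"
    if isinstance(exc, ValueError) and "numpy.ndarray size changed" in str(exc):
        # Ubuntu 22.04 case: binary mismatch between gensim and numpy.
        return "pip install --upgrade numpy"
    return None


if __name__ == "__main__":
    try:
        import tltk  # noqa: F401
        print("tltk imported fine")
    except Exception as exc:
        print(suggest_fix(exc) or f"unrecognized error: {exc!r}")
```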

giggls commented 1 year ago

I hope this bug will not persist. I will reopen this issue and keep it open until the tltk package has been fixed.

giggls commented 1 year ago

Added a comment about the Python libraries.