giggls / osml10n

Localization functions for Openstreetmap
https://tile.openstreetmap.de
GNU General Public License v3.0

Some tests fail: 85 tests passed, 2 tests failed #22

Closed. hafu closed this issue 1 year ago.

hafu commented 1 year ago

I fired up a fresh Debian VM to investigate the failing tests:

…
calling osml10n.geo_transcript("42", "thai ถนนข้าวสาร 100", { 100, 14, 101, 15 }):
[ERROR] (expected thai thanon khaosan 100, got thai thanon khaosara 100)
calling osml10n.geo_transcript("42", "อนุสาวรีย์พระยารัษฎาณุประดิษฐ์", { 100, 14, 101, 15 }):
[ERROR] (expected anusawari phraya ratsa da nu pradit, got anusawari phraya rat da nu pradit)
…
85 tests passed, 2 tests failed.
make: *** [Makefile:31: test] Error 1

I tried older versions of python-pinyin-jyutping-sentence, because from version 1.3 onward it causes tracebacks (see below). With the older versions the same tests still fail, so I think the traceback pointed me in the wrong direction.

Any ideas?

With python-pinyin-jyutping-sentence==1.3 the transcription service logs this traceback (repeated 4 times):

Feb 14 12:44:53 debian11 systemd[1]: Starting OSM l10n transcription server...
Feb 14 12:44:57 debian11 systemd[1]: Started OSM l10n transcription server.
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]: Loading osml10n transcription server: --- Logging error ---
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]: Traceback (most recent call last):
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/lib/python3.9/logging/__init__.py", line 1082, in emit
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     stream.write(msg + self.terminator)
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]: ValueError: I/O operation on closed file.
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]: Call stack:
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/bin/geo-transcript-srv.py", line 275, in <module>
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     asyncio.run(main())
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     return loop.run_until_complete(main)
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/lib/python3.9/asyncio/base_events.py", line 629, in run_until_complete
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     self.run_forever()
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/lib/python3.9/asyncio/base_events.py", line 596, in run_forever
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     self._run_once()
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/lib/python3.9/asyncio/base_events.py", line 1890, in _run_once
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     handle._run()
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/lib/python3.9/asyncio/events.py", line 80, in _run
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     self._context.run(self._callback, *self._args)
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/bin/geo-transcript-srv.py", line 248, in handle_connection
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     reply = tc.transcript(id,cc,name)
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/bin/geo-transcript-srv.py", line 149, in transcript
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     return(cantonese_transcript(unistr))
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/bin/geo-transcript-srv.py", line 93, in cantonese_transcript
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     transcript=pinyin_jyutping_sentence.jyutping(st, spaces=True)
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/local/lib/python3.9/dist-packages/pinyin_jyutping_sentence/__init__.py", line 390, in process_sentence_jyutping
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     return self.process_sentence(sentence, self.conversion_data.jyutping_word_map, self.conversion_data.jyutping_char_map, self.decode_jyutping, tone_numbers, spaces, remove_tones)
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/local/lib/python3.9/dist-packages/pinyin_jyutping_sentence/__init__.py", line 379, in process_sentence
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     word_list = list(seg_list)
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/local/lib/python3.9/dist-packages/jieba/__init__.py", line 325, in cut
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     for word in cut_block(blk):
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/local/lib/python3.9/dist-packages/jieba/__init__.py", line 250, in __cut_DAG
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     DAG = self.get_DAG(sentence)
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/local/lib/python3.9/dist-packages/jieba/__init__.py", line 181, in get_DAG
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     self.check_initialized()
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/local/lib/python3.9/dist-packages/jieba/__init__.py", line 170, in check_initialized
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     self.initialize()
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:   File "/usr/local/lib/python3.9/dist-packages/jieba/__init__.py", line 113, in initialize
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]:     default_logger.debug("Building prefix dict from %s ..." % (abs_path or 'the default dictionary'))
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]: Message: 'Building prefix dict from the default dictionary ...'
Feb 14 12:45:18 debian11 geo-transcript-srv.py[11765]: Arguments: ()
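The `--- Logging error ---` block above is the report printed by the standard library's `logging.Handler.handleError`: jieba's debug message reached a `StreamHandler` whose stream was already closed, `write()` raised `ValueError`, and the handler printed the error instead of propagating it. Note also that the call stack runs through `cantonese_transcript`, not the Thai code path. The minimal sketch below reproduces the mechanism with an in-memory stream; it assumes (the thread does not confirm this) that the server's output stream is closed by the time jieba first logs, and the logger name is hypothetical. Because the exception never reaches the caller, the transcription call itself carries on.

```python
import contextlib
import io
import logging


def reproduce_closed_stream_logging() -> str:
    """Log through a handler whose stream is closed and return the report
    that logging writes to stderr. The logging call itself does not raise."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    # Hypothetical logger name standing in for jieba's default_logger.
    logger = logging.getLogger("closed-stream-demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)

    # Assumption: the server's output stream is already closed at this point.
    stream.close()

    captured = io.StringIO()
    try:
        # handleError() writes its report to sys.stderr, so capture it here.
        with contextlib.redirect_stderr(captured):
            logger.debug("Building prefix dict from %s ...", "the default dictionary")
    finally:
        logger.removeHandler(handler)
        handler.close()
    return captured.getvalue()


if __name__ == "__main__":
    report = reproduce_closed_stream_logging()
    print(report.splitlines()[0])  # prints "--- Logging error ---"
```

The report has the same shape as the journal output above (`Traceback`, `ValueError: I/O operation on closed file.`, `Call stack:`, `Message:`, `Arguments:`), which fits the conclusion that this traceback is unrelated to the two failing Thai tests.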

My test setup:

root@debian11:~# apt-get update && apt-get -y upgrade
root@debian11:~# apt-get install -y dpkg-dev debhelper git libunac1-dev \
    luarocks lua5.3 libpcre3-dev liblua5.3-dev python3-icu python3-shapely \
    python3-pip python3-sdnotify python3-requests
root@debian11:~# luarocks install lrexlib-pcre
root@debian11:~# pip install pykakasi tltk pinyin_jyutping_sentence scikit-learn
…
Successfully built pinyin-jyutping-sentence tltk jaconv jieba fst-pso miniful sklearn
Installing collected packages: numpy, scipy, miniful, simpful, pytz, python-dateutil, fst-pso, threadpoolctl, pyfume, pandas, joblib, wrapt, tqdm, tabulate, smart-open, scikit-learn, regex, python-crfsuite, FuzzyTM, click, sklearn-crfsuite, sklearn, nltk, jieba, jaconv, gensim, deprecated, tltk, pykakasi, pinyin-jyutping-sentence
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Not uninstalling numpy at /usr/lib/python3/dist-packages, outside environment /usr
    Can't uninstall 'numpy'. No files were found to uninstall.
Successfully installed FuzzyTM-2.0.5 click-8.1.3 deprecated-1.2.13 fst-pso-1.8.1 gensim-4.3.0 jaconv-0.3.3 jieba-0.42.1 joblib-1.2.0 miniful-0.0.6 nltk-3.8.1 numpy-1.24.2 pandas-1.5.3 pinyin-jyutping-sentence-1.3 pyfume-0.2.25 pykakasi-2.2.1 python-crfsuite-0.9.9 python-dateutil-2.8.2 pytz-2022.7.1 regex-2022.10.31 scikit-learn-1.2.1 scipy-1.10.0 simpful-2.9.0 sklearn-0.0 sklearn-crfsuite-0.3.6 smart-open-6.3.0 tabulate-0.9.0 threadpoolctl-3.1.0 tltk-1.6.3 tqdm-4.64.1 wrapt-1.14.1
user@debian11:~$ git clone https://github.com/giggls/osml10n.git
user@debian11:~$ cd osml10n/lua_unac
user@debian11:~/osml10n/lua_unac$ make deb
user@debian11:~/osml10n/lua_unac$ cd ..
user@debian11:~/osml10n$ make deb
root@debian11:~# dpkg -i /home/user/osml10n/lua-unaccent_1.8-1_amd64.deb \
    /home/user/osml10n_1.0_all.deb
root@debian11:~# systemctl status osml10n.service
● osml10n.service - OSM l10n transcription server
     Loaded: loaded (/lib/systemd/system/osml10n.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-02-14 12:15:04 CET; 23s ago
   Main PID: 11240 (geo-transcript-)
      Tasks: 4 (limit: 2337)
     Memory: 530.6M
        CPU: 4.197s
     CGroup: /system.slice/osml10n.service
             └─11240 /usr/bin/python3 /usr/bin/geo-transcript-srv.py -s -g /usr/share/osml10n/boundaries

Feb 14 12:15:00 debian11 systemd[1]: Starting OSM l10n transcription server...
Feb 14 12:15:04 debian11 systemd[1]: Started OSM l10n transcription server.

giggls commented 1 year ago

Thai transcription is not an exact science, so I would not consider these tests to be failing: the output is close enough to the expected one.
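If one wanted a test to tolerate this kind of variation instead of updating the expected strings, a comparison against a similarity threshold is one option. The sketch below uses Python's `difflib.SequenceMatcher`; the helper name and the 0.9 threshold are hypothetical choices for illustration, not something osml10n does, and the string pairs are the two cases reported as failing in this issue.

```python
from difflib import SequenceMatcher


def transcription_close_enough(expected: str, got: str, threshold: float = 0.9) -> bool:
    """Return True if two transcriptions are similar enough.

    Hypothetical helper: the 0.9 threshold is an illustrative assumption.
    """
    ratio = SequenceMatcher(None, expected.lower(), got.lower()).ratio()
    return ratio >= threshold


# The two (expected, got) pairs reported as failing in this issue.
cases = [
    ("thai thanon khaosan 100", "thai thanon khaosara 100"),
    ("anusawari phraya ratsa da nu pradit", "anusawari phraya rat da nu pradit"),
]

for expected, got in cases:
    print(f"{expected!r} vs {got!r}: {transcription_close_enough(expected, got)}")
```

A character-level ratio is crude for transliteration, since a short word can change completely without moving the ratio much, so a threshold like this would need tuning per script.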

It looks like a newer version of the tltk library produces slightly different output.

I will change the installation instructions to pin tltk to version 1.6.x and fix the tests to match the newer output.
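With pip, pinning to the 1.6.x series can be written as a PEP 440 wildcard specifier, e.g. `pip install "tltk==1.6.*"`. Because even a patch release can change the transcription, a check of the installed version can help explain test differences. The sketch below is hypothetical (the function name and the choice to compare only major.minor are assumptions); it uses `importlib.metadata`, which is in the standard library from Python 3.8 and so covers the Python 3.9 of the Debian 11 setup above.

```python
from importlib import metadata
from typing import Optional, Tuple


def tltk_version_in_series(version: Optional[str] = None,
                           series: Tuple[int, int] = (1, 6)) -> bool:
    """Return True if the tltk version belongs to the given major.minor series.

    Hypothetical helper; pass `version` explicitly to check a string
    without tltk being installed. Assumes a plain numeric "X.Y[.Z]" version.
    """
    if version is None:
        # Raises metadata.PackageNotFoundError if tltk is not installed.
        version = metadata.version("tltk")
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) == series


if __name__ == "__main__":
    try:
        print("tltk 1.6.x installed:", tltk_version_in_series())
    except metadata.PackageNotFoundError:
        print("tltk is not installed")
```

Note that 1.6.2 and 1.6.3 both pass this check, so it confirms the pin but cannot by itself tell which of the two outputs the tests should expect.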

hafu commented 1 year ago

Thank you. FYI: the output changed between tltk 1.6.2 and 1.6.3. Next time I can provide a PR.

giggls commented 1 year ago

This is not the first issue with tltk. The last one was a missing dependency (Issue #19).