Helsinki-NLP / opus-fast-mosestokenizer

c++ mosestokenizer (OPUS fork)
GNU Lesser General Public License v2.1

Installation woes #6

Closed thfrkielikone closed 1 month ago

thfrkielikone commented 2 months ago

If I run the following to try to install opus-fast-mosestokenizer, I get various errors.

FROM fedora:40
RUN python3 -m ensurepip
RUN dnf install cmake g++ make boost-devel re2-devel python3-devel glib2-devel -y
RUN pip3 install opus-fast-mosestokenizer

The pybind11 python package doesn't get correctly pulled in. I get this error:

Collecting opus-fast-mosestokenizer
  Downloading opus-fast-mosestokenizer-0.0.8.5.tar.gz (86 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.8/86.8 kB 345.3 kB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: opus-fast-mosestokenizer
  Building wheel for opus-fast-mosestokenizer (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for opus-fast-mosestokenizer (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [140 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-312
      creating build/lib.linux-x86_64-cpython-312/mosestokenizer
      copying bindings/python/mosestokenizer/__init__.py -> build/lib.linux-x86_64-cpython-312/mosestokenizer
      creating build/lib.linux-x86_64-cpython-312/mosestokenizer/share
      copying bindings/python/mosestokenizer/share/README.txt -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share
      creating build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.ro -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.gu -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.ta -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.it -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.mni -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.ru -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.ca -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.hu -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.ga -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.ml -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.hi -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.pa -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.fi -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.or -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.kn -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.el -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.te -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.nl -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.as -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.fr -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.yue -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.en -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.sk -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.sl -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.pt -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.lt -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.mr -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/README.txt -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.cs -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.et -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.de -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.sv -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.pl -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.is -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.bn -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.lv -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.es -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      copying bindings/python/mosestokenizer/share/nonbreaking_prefixes/nonbreaking_prefix.zh -> build/lib.linux-x86_64-cpython-312/mosestokenizer/share/nonbreaking_prefixes
      running build_ext
      CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
        Compatibility with CMake < 3.5 will be removed from a future version of
        CMake.

        Update the VERSION argument <min> value or use a ...<max> suffix to tell
        CMake that the project does not need compatibility with older versions.

      -- The CXX compiler identification is GNU 14.1.1
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found Boost: /usr/lib64/cmake/Boost-1.83.0/BoostConfig.cmake (found version "1.83.0") found components: program_options thread
      -- Found Intl: built in to C library
      -- Found PkgConfig: /usr/bin/pkg-config (found version "2.1.0")
      -- Checking for one of the modules 'glib-2.0'
      -- Checking for one of the modules 're2'
      CMake Error at CMakeLists.txt:154 (find_package):
        By not providing "Findpybind11.cmake" in CMAKE_MODULE_PATH this project has
        asked CMake to find a package configuration file provided by "pybind11",
        but CMake did not find one.

        Could not find a package configuration file provided by "pybind11" with any
        of the following names:

          pybind11Config.cmake
          pybind11-config.cmake

        Add the installation prefix of "pybind11" to CMAKE_PREFIX_PATH or set
        "pybind11_DIR" to a directory containing one of the above files.  If
        "pybind11" provides a separate development package or SDK, be sure it has
        been installed.

      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):
        File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 410, in build_wheel
          return self._build_with_temp_dir(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 395, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 487, in run_setup
          super().run_setup(setup_script=setup_script)
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 107, in <module>
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/__init__.py", line 103, in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 184, in setup
          return run_commands(dist)
                 ^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/dist.py", line 968, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/wheel/bdist_wheel.py", line 368, in run
          self.run_command("build")
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/dist.py", line 968, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/_distutils/command/build.py", line 132, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/dist.py", line 968, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-g_nm51o0/overlay/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "<string>", line 53, in run
        File "<string>", line 91, in build_extension
        File "/usr/lib64/python3.12/subprocess.py", line 413, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-install-mtqybpen/opus-fast-mosestokenizer_310af3f88e304136a9af5fca69c0ab3b', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/tmp/pip-install-mtqybpen/opus-fast-mosestokenizer_310af3f88e304136a9af5fca69c0ab3b/build/lib.linux-x86_64-cpython-312/mosestokenizer/lib/', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-DBUILD_CLI:BOOL=OFF', '-DBUILD_PYTHON:BOOL=ON', '-DBUILD_SHARED_LIBS:BOOL=ON', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for opus-fast-mosestokenizer
Failed to build opus-fast-mosestokenizer
ERROR: Could not build wheels for opus-fast-mosestokenizer, which is required to install pyproject.toml-based projects

[notice] A new release of pip is available: 23.3.2 -> 24.0
[notice] To update, run: pip install --upgrade pip

After installing pybind11 with pip3 install pybind11, I still get the same error. I can get around it by telling CMake where to look for pybind11, but it feels really hacky:

CMAKE_PREFIX_PATH=/usr/local/lib/python3.12/site-packages/pybind11/share/cmake/pybind11/ pip3 install opus-fast-mosestokenizer
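The hard-coded path above can also be obtained from pybind11 itself. A minimal sketch, assuming pybind11 2.6 or newer is installed (it provides pybind11.get_cmake_dir()); the helper names pybind11_cmake_dir and env_with_cmake_prefix are hypothetical, and the actual pip call is left commented out:

```python
import os


def pybind11_cmake_dir():
    """Return the directory holding pybind11Config.cmake, or None if pybind11 is missing."""
    try:
        import pybind11
    except ImportError:
        return None
    return pybind11.get_cmake_dir()


def env_with_cmake_prefix(prefix, base_env=None):
    """Copy the environment and put `prefix` at the front of CMAKE_PREFIX_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("CMAKE_PREFIX_PATH")
    # CMake reads this environment variable as a list split on the
    # platform's path separator (":" on Linux).
    env["CMAKE_PREFIX_PATH"] = prefix if not existing else prefix + os.pathsep + existing
    return env


if __name__ == "__main__":
    cmake_dir = pybind11_cmake_dir()
    if cmake_dir is None:
        print("pybind11 is not installed; run: pip3 install pybind11")
    else:
        env = env_with_cmake_prefix(cmake_dir)
        print("CMAKE_PREFIX_PATH=" + env["CMAKE_PREFIX_PATH"])
        # To run the source install with this environment:
        # import subprocess, sys
        # subprocess.check_call(
        #     [sys.executable, "-m", "pip", "install", "opus-fast-mosestokenizer"],
        #     env=env,
        # )
```

This only removes the hard-coded site-packages path; it is still the same workaround of pointing CMake at the pip-installed pybind11.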

At this point, opus-fast-mosestokenizer installs. Running it is still problematic:

python3 -c 'import mosestokenizer'
Traceback (most recent call last):
  File "/usr/local/lib64/python3.12/site-packages/mosestokenizer/__init__.py", line 16, in <module>
    from mosestokenizer.lib import _mosestokenizer
ImportError: libmosestokenizer-dev.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib64/python3.12/site-packages/mosestokenizer/__init__.py", line 20, in <module>
    raise RuntimeError(_msg)
RuntimeError: Failed to import mosestokenizer c++ library
Full error log: ImportError('libmosestokenizer-dev.so: cannot open shared object file: No such file or directory')

If I add it to LD_LIBRARY_PATH, it runs:

[root@57a03ed18e28 /]# LD_LIBRARY_PATH=/usr/local/lib64/python3.12/site-packages/mosestokenizer/lib python3 -c 'import mosestokenizer'
[root@57a03ed18e28 /]#

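My issues here are then that pybind11 is not pulled in automatically when building from source, and that at import time opus-fast-mosestokenizer fails to find its dll / so files. This may be a failure caused by my hacky way of compiling the package.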

And in a wider sense: would it be possible to have a pre-built wheel for this package to make installation even easier?

svirpioj commented 2 months ago

This does not look straightforward. Possibly the issue with M-series Macs is similar? I tried for some time to get something working for that, but did not succeed, so any help is gladly accepted.

And in a wider sense: would it be possible to have a pre-built wheel for this package to make installation even easier?

There are a number of wheels available on PyPI, but I guess they do not work on your platform?

svirpioj commented 2 months ago

at import time opus-fast-mosestokenizer fails to find its dll / so files. This may be a failure caused by my hacky way of compiling the package.

A quick comment on this: I found that the problem seems to occur on all systems when the package is installed from source (pip install --no-binary :all: opus-fast-mosestokenizer). When it is installed from a wheel, the library is found.

So far I have not found a solution for this (other than setting LD_LIBRARY_PATH).

thfrkielikone commented 2 months ago

My platform is a Mac, but it's running an x86_64 Fedora image and the hardware isn't ARM-based either, so I don't think the missing wheel is related to that. The version of Python it is running is 3.12, for which there is no wheel (at a glance): is that the reason it decides to build from source? The error does look very similar to the one in the issue you linked.

svirpioj commented 2 months ago

The version of Python it is running is 3.12, for which there is no wheel (at a glance): is that the reason it decides to build from source?

Yes, there were indeed no Python 3.12 wheels before, which forced installing from source. I have now added them in a new version, 0.0.8.6. I think it's working now?
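Whether pip can use a prebuilt wheel depends on the tags at the end of the wheel's filename. A simplified sketch of that check, using the wheel name from this thread; wheel_tags and matches_running_cpython are hypothetical helpers, and pip's real selection also compares the ABI and platform tags:

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename layout: " + filename)
    # The last three dash-separated fields are the compatibility tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def matches_running_cpython(filename):
    """Compare only the interpreter tag with the running Python's major.minor version."""
    python_tag, _, _ = wheel_tags(filename)
    current = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # A compressed tag such as "cp38.cp39" lists several interpreters.
    return current in python_tag.split(".")


name = "opus_fast_mosestokenizer-0.0.8.6-cp312-cp312-manylinux1_x86_64.whl"
print(wheel_tags(name))
print(matches_running_cpython(name))
```

When no published wheel carries a tag matching the interpreter, pip falls back to the source distribution, which is what happened with Python 3.12 and version 0.0.8.5.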

thfrkielikone commented 2 months ago

This looks like it's working now, thank you very much for your time.

docker run -it fedora:40
[root@57390d4cfe46 /]# python3 -m ensurepip
Looking in links: /tmp/tmpw85dicm7
Processing /tmp/tmpw85dicm7/pip-23.3.2-py3-none-any.whl
Installing collected packages: pip
Successfully installed pip-23.3.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[root@57390d4cfe46 /]# pip3 install opus-fast-mosestokenizer
Collecting opus-fast-mosestokenizer
  Downloading opus_fast_mosestokenizer-0.0.8.6-cp312-cp312-manylinux1_x86_64.whl.metadata (3.6 kB)
Downloading opus_fast_mosestokenizer-0.0.8.6-cp312-cp312-manylinux1_x86_64.whl (838 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 838.3/838.3 kB 2.2 MB/s eta 0:00:00
Installing collected packages: opus-fast-mosestokenizer
Successfully installed opus-fast-mosestokenizer-0.0.8.6
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 23.3.2 -> 24.1.1
[notice] To update, run: python3 -m pip install --upgrade pip
[root@57390d4cfe46 /]# python3
Python 3.12.2 (main, Feb 21 2024, 00:00:00) [GCC 14.0.1 20240217 (Red Hat 14.0.1-0)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mosestokenizer
>>> m = mosestokenizer.MosesTokenizer()  
>>> m.tokenize("Kissa kävelee puussa")
['Kissa', 'kävelee', 'puussa']
>>>

(Building from source is still similarly broken: no surprises there.)

svirpioj commented 1 month ago

I found a hack to make source installs work as well. As far as I was able to understand, the problem was that a source install uses shared libraries (which makes sense), and the main lib (libmosestokenizer-dev.so) was then not found by the Python interface lib (_mosestokenizer.<xyz>.so) unless its directory was in LD_LIBRARY_PATH. Loading the main library with ctypes.cdll.LoadLibrary beforehand did the trick. Closing this now.
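The preloading approach described above can be sketched as follows. This is an illustration, not the package's actual code: the lib directory and the library file name are taken from the paths and error messages in this thread, and find_shared_library and preload_shared_library are hypothetical helpers.

```python
import ctypes
import glob
import os


def find_shared_library(lib_dir, stem):
    """Return the first file in lib_dir whose name starts with `stem`, or None."""
    pattern = os.path.join(glob.escape(lib_dir), stem + "*")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


def preload_shared_library(lib_dir, stem):
    """Load the library by explicit path; return the handle, or None if no file matches."""
    path = find_shared_library(lib_dir, stem)
    if path is None:
        return None
    # Assumption: once the library is loaded into the process, the dynamic
    # linker can resolve the extension module's dependency on it without
    # consulting LD_LIBRARY_PATH.
    return ctypes.cdll.LoadLibrary(path)


# Hypothetical usage near the top of a package's __init__.py:
# _here = os.path.dirname(os.path.abspath(__file__))
# preload_shared_library(os.path.join(_here, "lib"), "libmosestokenizer-dev")
# from mosestokenizer.lib import _mosestokenizer
```

Loading by absolute path avoids changing the process environment, which is why it can replace the LD_LIBRARY_PATH workaround shown earlier in the thread.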