coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
https://coqui.ai
Mozilla Public License 2.0

Bug: stt complains libbz2.so.1.0 not found #2341


alanorth commented 1 year ago

Describe the bug: Python stt 1.4.0 from pip crashes due to a missing libbz2.so.1.0 on CentOS Stream 8, and probably on all RPM-based distros:

$ stt
Traceback (most recent call last):
  File "/tmp/venv-test/bin/stt", line 5, in <module>
    from stt.client import main
  File "/tmp/venv-test/lib/python3.9/site-packages/stt/__init__.py", line 23, in <module>
    from stt.impl import Version as version
  File "/tmp/venv-test/lib/python3.9/site-packages/stt/impl.py", line 13, in <module>
    from . import _impl
ImportError: libbz2.so.1.0: cannot open shared object file: No such file or directory

But bzip2 libs are installed:

$ dnf list installed bzip2\*
Installed Packages
bzip2.x86_64                                         1.0.6-26.el8                         @baseos
bzip2-devel.x86_64                                   1.0.6-26.el8                         @baseos
bzip2-libs.x86_64                                    1.0.6-26.el8                         @baseos

And libbz2.so is available, albeit as libbz2.so.1:

$ ldconfig -p | grep libbz2
        libbz2.so.1 (libc6,x86-64) => /lib64/libbz2.so.1
        libbz2.so.1 (libc6) => /lib/libbz2.so.1
        libbz2.so (libc6,x86-64) => /lib64/libbz2.so

To Reproduce: steps to reproduce the behavior:

  1. Run pip install stt in a virtual environment
  2. Run stt
  3. See the error

Expected behavior: stt should not crash.

Environment (please complete the following information):

Additional context

I can make a symlink to fix this, but it feels dirty. I'm wondering which systems this works on. Could stt check for libbz2.so.1 instead, or after checking for libbz2.so.1.0?
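For reference, the symlink workaround could look like the sketch below (paths are taken from the ldconfig output above and may differ on other systems):

```shell
# System-wide alias (needs root): point libbz2.so.1.0 at the existing libbz2.so.1.
# sudo ln -s /lib64/libbz2.so.1 /lib64/libbz2.so.1.0 && sudo ldconfig

# Non-root alternative: create the alias in a private directory and let the
# dynamic linker find it through LD_LIBRARY_PATH.
mkdir -p "$HOME/.local/lib"
ln -sf /lib64/libbz2.so.1 "$HOME/.local/lib/libbz2.so.1.0"
export LD_LIBRARY_PATH="$HOME/.local/lib:${LD_LIBRARY_PATH:-}"
```

The non-root variant only affects processes started from that shell, which makes it easy to undo.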

wasertech commented 1 year ago

I can make a symlink to fix this, but it feels dirty. I'm wondering which systems this works on.

I guess that works too. The CI pipeline uses a Debian base (Ubuntu), but it works on a lot of distros, including Arch. I guess your distro has its own way of packaging libs.

Could stt check for libbz2.so.1 instead, or after checking for libbz2.so.1.0

I don't know the implementation details, but I think it is reasonable to fall back to libbz2.so.1 when libbz2.so.1.0 is not found, instead of spitting out ImportError("libbz2.so.1.0: cannot open shared object file: No such file or directory")...
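To illustrate the fallback idea only (this is a hypothetical sketch, not stt's code: the real dependency is resolved by the dynamic linker when the native _impl module is imported, so a real fix would live in the build/link step, not in Python):

```python
import ctypes

def load_first(sonames):
    """Return the first soname that loads successfully, or None.
    (load_first is a hypothetical helper, not part of stt.)"""
    for name in sonames:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None

# Prefer the Debian-style soname, then fall back to the RPM-style one.
bz2 = load_first(["libbz2.so.1.0", "libbz2.so.1"])
```

Trying the sonames in order would make the import succeed on both Debian- and RPM-style layouts without any symlinks.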

For what it's worth, I think it's KenLM that uses this lib, so it probably means you have not built and installed KenLM properly on your $PATH according to your distro's standards.

alanorth commented 1 year ago

Hi @wasertech I installed stt in a virtual environment from pip so I haven't installed KenLM directly. And I'm on CentOS Stream 8. Here's how I installed stt:

$ python3.9 -V
Python 3.9.14
$ python -m venv /tmp/venv-test
$ source /tmp/venv-test/bin/activate
$ pip install --upgrade pip setuptools wheel
$ pip install stt

I don't fully understand the Python packaging ecosystem, but stt seems to bring its own KenLM via a wheel? I have this in my virtual env:

$ ldd /tmp/venv-test/lib64/python3.9/site-packages/stt/lib/libkenlm.so
        linux-vdso.so.1 (0x00007ffd5551d000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f8427170000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f8426dee000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8426bd6000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f8426811000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f8427505000)
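
To see whether any of the wheel's bundled libraries have dependencies the dynamic linker cannot resolve, each .so can be scanned with ldd; a sketch (the path is the virtualenv from above):

```shell
# Path to the stt wheel's bundled native libraries (virtualenv from above).
STT_LIBDIR="/tmp/venv-test/lib64/python3.9/site-packages/stt/lib"

for lib in "$STT_LIBDIR"/*.so; do
    # Guard against an empty glob when the directory does not exist.
    [ -e "$lib" ] || { echo "no libraries found in $STT_LIBDIR"; break; }
    echo "== $lib"
    # Lines marked "not found" are sonames the linker cannot resolve here.
    ldd "$lib" | grep "not found" || echo "   all dependencies resolved"
done
```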

Perhaps this issue belongs on the KenLM issue tracker then, and we can close this one.

wasertech commented 1 year ago

Perhaps this issue belongs on the KenLM issue tracker then, and we can close this one.

It is not a KenLM issue. You are installing a package (STT) that is built on a Debian-based distro.

But since you are not using a Debian-based system, you should probably build your binaries according to your distribution's conventions.

In other words, you should be able to install STT on the system with rpm, not pip.

For example, on Arch, I wrote a package to build STT the way Arch wants it:

https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=stt

It should be the same for CentOS, but following Red Hat's packaging guidelines.

So when you build KenLM yourself, you can pass the correct paths to the libs and it should work.

wasertech commented 1 year ago

To be clear, when you install STT from pip, you are just grabbing the pre-built Python bindings. These are built on Ubuntu from libkenlm.so and libstt.so. So if either one depends on a lib that was present (at a particular location) at build time but is missing or has moved on your system, it will complain that the lib is missing.

alanorth commented 1 year ago

@wasertech I use Arch too BTW (and I also maintain packages on the AUR... but that's irrelevant here).

Anyway, what is the point of providing a pip-installable STT if it is only expected to work on Debian-based distros? If there's a notice to this effect in the README and I missed it, then my apologies. Otherwise, many institutes in scientific computing use RHEL / CentOS.

wasertech commented 1 year ago

Coqui uses a GitHub CI pipeline to build all prebuilt binaries, including the Python bindings. For Linux, it uses a stable release of Ubuntu, which is based on stable Debian. It's also a pretty standard Linux distribution.

I didn't choose it, I don't use Ubuntu, and I don't even work at Coqui. I'm just telling you that your distribution has not followed the same rules as Debian when providing libs. On Arch we use aliases, like you did, to loosely follow Debian's way of sharing libs. On Arch you can pip install stt and it works like a charm. In your case, since your distro doesn't make the effort to create those aliases, you should make them yourself, or follow its way of shipping software through rpm and tell KenLM where to find its libs when you build it.