abdeladim-s / pywhispercpp

Python bindings for whisper.cpp
https://abdeladim-s.github.io/pywhispercpp/
MIT License
180 stars 26 forks source link

Use repairwheel to fix the .dylib linkage. #75

Closed PiotrCzapla closed 1 month ago

PiotrCzapla commented 1 month ago

I'm still unsure if this is the best approach but I wanted pip install . to work, normally there is an option in cibuildwheel that let us run repairwheel, but this leaves the pip install broken.

It is quite difficult to find information how to do it properly, so I'm still unsure that it is right solution as we are still ending up with libs in two places in the wheel, but at least they reference them selfs correctly so which ever is picked first the rest is loaded.

I've tested this only on my mac so I'm not sure that it is working correctly in windows and linux but it should as repairwheel is multiplatform.

After repair the libs are in pywhispercpp-1.2.0/.dylib and in pywhispercpp-1.2.0/pywhispercpp/.dylibs/.dylib, but they reference each other have a look at the output of otool -L

otool -L pywhispercpp-1.2.0/libwhisper.1.7.1.dylib
pywhispercpp-1.2.0/libwhisper.1.7.1.dylib:
        @rpath/libwhisper.1.dylib (compatibility version 1.0.0, current version 1.7.1)
        @loader_path/pywhispercpp/.dylibs/libggml.dylib (compatibility version 0.0.0, current version 0.0.0)
        @loader_path/pywhispercpp/.dylibs/libwhisper.coreml.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1800.101.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1351.0.0)
otool -L pywhispercpp-1.2.0/pywhispercpp/.dylibs/libwhisper.1.7.1.dylib
pywhispercpp-1.2.0/pywhispercpp/.dylibs/libwhisper.1.7.1.dylib:
        /DLC/pywhispercpp/.dylibs/libwhisper.1.7.1.dylib (compatibility version 1.0.0, current version 1.7.1)
        @loader_path/libggml.dylib (compatibility version 0.0.0, current version 0.0.0)
        @loader_path/libwhisper.coreml.dylib (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1800.101.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1351.0.0)

This pr is based on #74, as I'm assuming it won't be merged before it, as it needs test. Only the last commit matters. I've tested this on macos, I'm not sure if it will work on other platforms though. Could you check on yours?

PiotrCzapla commented 1 month ago

I've asked guys at repairwheel for advice if this approach is ok or not. https://github.com/jvolkman/repairwheel/issues/42

PiotrCzapla commented 1 month ago

Windows build is complaining about wrong version of runtime libraries but I hope it is only an issue if we would redistribute the wheels.

fixed wheel written to C:\Users\RUNNER~1\AppData\Local\Temp\repairwheelhb83b115\pywhispercpp-1.2.0-cp38-cp38-win_amd64.whl
  C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-72a_zsse\overlay\Lib\site-packages\delvewheel\_dll_utils.py:460: UserWarning: _pywhispercpp.cp38-win_amd64.pyd was built with a newer platform toolset (14.41) than the discovered msvcp140.dll (14.15). This may cause compatibility issues.
    warnings.warn(f'{os.path.basename(lib_path)} was built with a newer platform toolset ({linker_version}) than the discovered {os.path.basename(dll_info[0])} ({vc_redist_linker_version}). This may cause compatibility issues.')
  C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-72a_zsse\overlay\Lib\site-packages\delvewheel\_dll_utils.py:460: UserWarning: _pywhispercpp.cp38-win_amd64.pyd was built with a newer platform toolset (14.41) than the discovered vcruntime140_1.dll (14.28). This may cause compatibility issues.
    warnings.warn(f'{os.path.basename(lib_path)} was built with a newer platform toolset ({linker_version}) than the discovered {os.path.basename(dll_info[0])} ({vc_redist_linker_version}). This may cause compatibility issues.')
  C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-72a_zsse\overlay\Lib\site-packages\delvewheel\_dll_utils.py:460: UserWarning: whisper.dll was built with a newer platform toolset (14.41) than the discovered msvcp140.dll (14.15). This may cause compatibility issues.
    warnings.warn(f'{os.path.basename(lib_path)} was built with a newer platform toolset ({linker_version}) than the discovered {os.path.basename(dll_info[0])} ({vc_redist_linker_version}). This may cause compatibility issues.')
  C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-72a_zsse\overlay\Lib\site-packages\delvewheel\_dll_utils.py:460: UserWarning: whisper.dll was built with a newer platform toolset (14.41) than the discovered vcruntime140_1.dll (14.28). This may cause compatibility issues.
    warnings.warn(f'{os.path.basename(lib_path)} was built with a newer platform toolset ({linker_version}) than the discovered {os.path.basename(dll_info[0])} ({vc_redist_linker_version}). This may cause compatibility issues.')
  C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-72a_zsse\overlay\Lib\site-packages\delvewheel\_dll_utils.py:460: UserWarning: ggml.dll was built with a newer platform toolset (14.41) than the discovered msvcp140.dll (14.15). This may cause compatibility issues.
    warnings.warn(f'{os.path.basename(lib_path)} was built with a newer platform toolset ({linker_version}) than the discovered {os.path.basename(dll_info[0])} ({vc_redist_linker_version}). This may cause compatibility issues.')
  C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-72a_zsse\overlay\Lib\site-packages\delvewheel\_dll_utils.py:[460](https://github.com/abdeladim-s/pywhispercpp/actions/runs/11386200589/job/31677837180?pr=75#step:6:461): UserWarning: ggml.dll was built with a newer platform toolset (14.41) than the discovered vcruntime140_1.dll (14.28). This may cause compatibility issues.
    warnings.warn(f'{os.path.basename(lib_path)} was built with a newer platform toolset ({linker_version}) than the discovered {os.path.basename(dll_info[0])} ({vc_redist_linker_version}). This may cause compatibility issues.')
  Wrote C:\Users\runneradmin\AppData\Local\Temp\repaired_wheel_h7c34kmw\pywhispercpp-1.2.0-cp38-cp38-win_amd64.whl

Can you check if the test are working on windows?

abdeladim-s commented 1 month ago

Thanks @PiotrCzapla for the time you put into this. It took me a substantial amount of time as well to find a solution to this issue, but unfortunately I couldn't find any documentation how to do it properly, also as you noticed I use cbuildhweel to repair the wheel in CI but didn't work unfortunately. I decided at the end to load the libs manually as many Python packages are doing already.

I don't have Windows to give this a test but I will try to test on my Linux machine, even though I can see that CI failed on Linux.

But I have a question first, what's the issue with loading the libs manually during the import ? Did you find any issues with that ? pip install . should work because the libs are moved to the libs folder and are loaded during runtime, even if you delete the build folder it should still find the libs!

PiotrCzapla commented 1 month ago

Hey @abdeladim-s! Thanks for running the build on cbuildwheel. It looks like there’s a little hiccup with repairwheel since it’s using the wrong platform value for auditwheel. We can fix this in cibuild by providing a custom command for this specific platform. Do you have a way to run the CI locally? I’d love to experiment with it.

This isn’t the first time I’ve run into rpath issues on Mac, and this time I decided to dig deeper and tackle it in a more standard way instead of just adding paths to DYLD_LIBRARY_PATH. It’s been quite a journey, but I hope this proper fix will help others facing similar problems down the line. While the approach might not necessarily be better, manually loading libraries can make the code fragile (like when whisper.cpp adds another library, such as coreml), and it can be confusing for those who aren’t library maintainers but understand how dynamic libraries work.

I genuinely hope that once we properly test this pr and resolve the issues, everything will run smoothly without needing to revisit it in the future.

To give you an idea of what it’s like trying to build the library for the first time:

In the PR, I’m trying to make it either work or break with clear compilation failure like you have on CI in Linux, which should be easier to fix with a quick Google search.

After you decide on whether to repair the wheels, I’d like to add a couple of things:

PiotrCzapla commented 1 month ago

It looks like the build is failing because the CI is attempting to create a wheel for the musllinux platform, like Alpine, while using Ubuntu, which only supports glibc builds, specifically manylinux platforms.

Have a look at this message: --plat has an invalid choice: 'musllinux_1_1_x86_64' from the environment variable 'AUDITWHEEL_PLAT'. The valid options are 'linux_x86_64', 'manylinux_2_5_x86_64', 'manylinux_2_12_x86_64', 'manylinux_2_17_x86_64', 'manylinux_2_24_x86_64', 'manylinux_2_27_x86_64', 'manylinux_2_28_x86_64', 'manylinux_2_31_x86_64', 'manylinux_2_34_x86_64', 'manylinux_2_35_x86_64', 'manylinux1_x86_64', 'manylinux2010_x86_64', and 'manylinux2014_x86_64'.

Auditwheel only supports musllinux when it is run on alpine linux have a look on the output of this command on python:3.11-alpine on mac M1 (aarch64): $ AUDITWHEEL_PLAT=trigger_error auditwheel repair something ... argparse.ArgumentError: argument --plat: invalid choice: 'trigger_error' from environment variable 'AUDITWHEEL_PLAT' (choose from 'linux_aarch64', 'musllinux_1_1_aarch64')

It appears that pywhisper didn't work on Alpine Linux before, even though it seemed like it did. Releaed version of pywhispercpp-1.2.0 doesn't support Alpine Linux as attempting to install it there leads to a build and then an error. The build fails because whisper.cpp source code are not attached.

Here's how it looks:

  ✗ docker run -it -v "$(pwd)":/project -w /project python:3.11-alpine /bin/sh                                   
/project # apk add --no-cache gcc g++ cmake make ninja
/project # pip install pywhispercpp
Collecting pywhispercpp
  Downloading pywhispercpp-1.2.0.tar.gz (234 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 234.3/234.3 kB 4.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting numpy (from pywhispercpp)
  Downloading numpy-2.1.2-cp311-cp311-musllinux_1_2_aarch64.whl.metadata (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.0/62.0 kB 8.8 MB/s eta 0:00:00
Collecting pydub (from pywhispercpp)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting requests (from pywhispercpp)
  Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting tqdm (from pywhispercpp)
  Downloading tqdm-4.66.5-py3-none-any.whl.metadata (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.6/57.6 kB 18.8 MB/s eta 0:00:00
Collecting platformdirs (from pywhispercpp)
  Downloading platformdirs-4.3.6-py3-none-any.whl.metadata (11 kB)
Collecting charset-normalizer<4,>=2 (from requests->pywhispercpp)
  Downloading charset_normalizer-3.4.0-cp311-cp311-musllinux_1_2_aarch64.whl.metadata (34 kB)
Collecting idna<4,>=2.5 (from requests->pywhispercpp)
  Downloading idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests->pywhispercpp)
  Downloading urllib3-2.2.3-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests->pywhispercpp)
  Downloading certifi-2024.8.30-py3-none-any.whl.metadata (2.2 kB)
Downloading numpy-2.1.2-cp311-cp311-musllinux_1_2_aarch64.whl (14.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.4/14.4 MB 58.7 MB/s eta 0:00:00
Downloading platformdirs-4.3.6-py3-none-any.whl (18 kB)
Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB 20.3 MB/s eta 0:00:00
Downloading tqdm-4.66.5-py3-none-any.whl (78 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.4/78.4 kB 25.2 MB/s eta 0:00:00
Downloading certifi-2024.8.30-py3-none-any.whl (167 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 167.3/167.3 kB 44.8 MB/s eta 0:00:00
Downloading charset_normalizer-3.4.0-cp311-cp311-musllinux_1_2_aarch64.whl (139 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 139.1/139.1 kB 37.4 MB/s eta 0:00:00
Downloading idna-3.10-py3-none-any.whl (70 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 70.4/70.4 kB 22.0 MB/s eta 0:00:00
Downloading urllib3-2.2.3-py3-none-any.whl (126 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.3/126.3 kB 39.2 MB/s eta 0:00:00
Building wheels for collected packages: pywhispercpp
  Building wheel for pywhispercpp (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for pywhispercpp (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [97 lines of output]
      running bdist_wheel

it fails because it can't find ggml.h in the wheel source file.

Can we instruct the cibuild to skip musl platform all together for the time being? It will speed up CI. Building on alpine fails due to some other issues, but whisper.cpp compiles fine though.

PiotrCzapla commented 1 month ago

I managed to build it on alpine, the issue was lack of build tools, or mess in the build directory from the previous builds (i'm sharing the same volume between dockers).

It builds on python:3.11-alpine, but fails on quay.io/pypa/musllinux_1_1_aarch64:2024 :(.

PiotrCzapla commented 1 month ago

I’ve upgraded to the latest cibuild, and I hope that it will fix the issue since the newest cibuildwheel is now targeting musllinux 1_2 instead of the outdated 1_1. I also updated wheel.yml to use the latest version of the artifact commands, as the old ones will soon be phased out.

PiotrCzapla commented 1 month ago

I've run it on my CI and found the same issue,but I had a closer look what docker images are being used and how to run them and I managed to isolate the issue.

Repairwheel ships with it's own version of auditwheel, that have hardcoded values for the AUDITWHEEL_PLAT variable, hence it fails on musllinux. I've reported that back to the repairwheel https://github.com/jvolkman/repairwheel/issues/43

I think the way forward is to run auditool directly on muslinux via cibuildwheel, and keep using repairwheel only during development when pip is being run without cibuildwheel.

PiotrCzapla commented 1 month ago

It is working now, the cibuildwheel is taking care of repair step, for linux and macos. Wheelrepair is fixing dependencies on windows, and during regular build. I've learned the hard way that you can't repair wheel twice on linux as auditwheel repair changes the names of the libs.

I've found out how to get in to the failing build of cibuildwheel while it is running. The easy way is to add sleep to CIBW_BEFORE_BUILD, and then exec bash in to the container to inspect and run the build manually.

The test on linux are passing after cibuildwheel.

I haven't changed the gh actions/upload-artifact to version 4 as it fails complaining that we overwrite existing files during upload. So maybe it would make sense to add some code to change the version from 1.2 to a 1.3- ?

Let me know what you think. I will try this on windows later on.

abdeladim-s commented 1 month ago

@PiotrCzapla, I agree .. If we can make this work seamlessly across all platforms, it would be great. However, it's quite challenging to find the proper way to do it cross-platform.


I think we need to add a step to run the test suite after the build to ensure the builds are working properly.


And of course, once we finish this, feel free to submit any other PRs. Your contributions are always welcome :)

abdeladim-s commented 1 month ago

I've added tests to the workflow after the build, Please check out this branch.

the macos and linux builds seem great, though it still failing locally on my linux machine (My bad, I just noticed that the wheel for linux_x86_64 was actually generated, ) So maybe something wrong with my installation.

The only issue now is windows wheels! Take a look at the workflow results.

abdeladim-s commented 1 month ago

The Windows wheels were failing because of the manual loading mechanism. I deleted it and all tests are working now. Nevertheless, If you have access to a Windows machine please give it a try as well, If everything goes well I'll go ahead and merge this one.

Really brilliant contribution @PiotrCzapla :rocket:

PiotrCzapla commented 1 month ago

Hi thank you for adding the tests and fixing the remaining issue, I have an app that uses pywhispercpp and should work on windows, I wasn't able to do so last week but I'm going to give it a try soon and report back. Thank you for providing the build files.

Btw. What would you say to bumping the version? If we include the commit check sum we will be able to change upload action to v4.

abdeladim-s commented 1 month ago

I think all builds are good .. I'll just go ahead and merge this one. I have one final question though, do you have any idea why the shared library names are good on macos wheels but not on Linux and windows ?


Btw. What would you say to bumping the version? If we include the commit check sum we will be able to change upload action to v4.

Are you talking about bumping the version of the upload Github action, right ? I don't think it has any value now since everything is working as expected, do you think it's worth it ?

PiotrCzapla commented 1 month ago

@abdeladim-s thank you for merging. It is absolute hell using windows and python, but I managed to get it working on windows, after pip install . both test run fine, and I'm able to use my script too.

one remaining issue on windows is that it does not build from source when using pip install 'git+http ...' see: #78

I have one final question though, do you have any idea why the shared library names are good on macos wheels but not on Linux and windows ?

The values at the end of libs names looks like checksums, so I guess it is to ensure that a correct lib is being loaded, each time and not a lib from linux env, I wonder though why mac os does not have it though.

But I'm pretty sure it isn't an error as Torch has the same:

unzip -t torch-2.5.0-cp312-cp312-manylinux2014_aarch64.whl |grep -v -E '\.py|\.h|\.cmake|\.cpp\|.cuh' | grep -v '/  '
Archive:  torch-2.5.0-cp312-cp312-manylinux2014_aarch64.whl
    testing: torch.libs/libgfortran-daac5196.so.5.0.0   OK
    testing: torch.libs/libopenblasp-r0-2e4c6a40.3.25.so   OK
    testing: torch.libs/libgomp-ee2ec4d2.so.1.0.0   OK
    testing: torch.libs/libarm_compute_graph-742b7619.so   OK
    testing: torch.libs/libarm_compute-32c725fd.so   OK
    testing: torch-2.5.0.dist-info/METADATA   OK
PiotrCzapla commented 1 month ago

Are you talking about bumping the version of the upload Github action, right ? I don't think it has any value now since everything is working as expected, do you think it's worth it ?

I was referring to both: if we upgrade the version of pywhisercpp and include the commit ID, we can raise the update-artifacts to v4. This version prevents file overwriting during the build process, likely across different builds, so having different wheels names each build should solve the issue.

abdeladim-s commented 1 month ago

@abdeladim-s thank you for merging. It is absolute hell using windows and python, but I managed to get it working on windows, after pip install . both test run fine, and I'm able to use my script too. one remaining issue on windows is that it does not build from source when using pip install 'git+http ...' see: #78

Yes, windows is absolute hell, especially if you wanted to build a python C extension .. glad you made it through and it's working :) Thanks!

I have one final question though, do you have any idea why the shared library names are good on macos wheels but not on Linux and windows ?

The values at the end of libs names looks like checksums, so I guess it is to ensure that a correct lib is being loaded, each time and not a lib from linux env, I wonder though why mac os does not have it though.

But I'm pretty sure it isn't an error as Torch has the same:

unzip -t torch-2.5.0-cp312-cp312-manylinux2014_aarch64.whl |grep -v -E '\.py|\.h|\.cmake|\.cpp\|.cuh' | grep -v '/  '
Archive:  torch-2.5.0-cp312-cp312-manylinux2014_aarch64.whl
    testing: torch.libs/libgfortran-daac5196.so.5.0.0   OK
    testing: torch.libs/libopenblasp-r0-2e4c6a40.3.25.so   OK
    testing: torch.libs/libgomp-ee2ec4d2.so.1.0.0   OK
    testing: torch.libs/libarm_compute_graph-742b7619.so   OK
    testing: torch.libs/libarm_compute-32c725fd.so   OK
    testing: torch-2.5.0.dist-info/METADATA   OK

Oh, so it's the checksum. That makes sense. Thanks for the clarification.

abdeladim-s commented 1 month ago

Are you talking about bumping the version of the upload Github action, right ? I don't think it has any value now since everything is working as expected, do you think it's worth it ?

I was referring to both: if we upgrade the version of pywhisercpp and include the commit ID, we can raise the update-artifacts to v4. This version prevents file overwriting during the build process, likely across different builds, so having different wheels names each build should solve the issue.

Yes, for pywhisercpp I was planning to bump a new version and push it to PYPI, especially the wheels are working now. I am just still trying to build wheels for other architectures like arm and aarch64, the build process takes so long and always fail with weird errors.