ahupp / python-magic

A python wrapper for libmagic

Discussion: support bundling libmagic #233

Closed pombredanne closed 1 year ago

pombredanne commented 3 years ago

I have maintained a fork of this fine code for a long while at https://github.com/nexB/typecode/blob/8e926684f260ce1cf7ffed74b2da99db97210f13/src/typecode/magic2.py

One of the key changes is that I can either provide a bundled pre-built binary or use a system-provided binary for libmagic and the magic db, which is not possible here. For instance: https://github.com/nexB/scancode-plugins/tree/develop/builtins/typecode_libmagic-linux and https://github.com/nexB/scancode-plugins/tree/develop/builtins/typecode_libmagic_system_provided

I would much prefer to fold that code back in here at some point. Would you be open to having a way to provide a libmagic and db path rather than always using the same heuristics code?
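A minimal sketch of the kind of override being asked about, for illustration only: the lib_path argument and the MAGIC_LIB_PATH environment variable are hypothetical hooks, not existing python-magic options. An explicit path wins, then the environment override, and only then the usual ctypes.util.find_library search.

```python
import ctypes
import ctypes.util
import os
from typing import Callable, List, Mapping, Optional


def libmagic_candidates(
    lib_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    find: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> List[str]:
    """Return libmagic locations to try, most specific first.

    lib_path and MAGIC_LIB_PATH are hypothetical override hooks used for
    illustration; they are not python-magic options.
    """
    env = os.environ if env is None else env
    candidates = []
    if lib_path:
        candidates.append(lib_path)  # an explicit, caller-supplied path wins
    if env.get("MAGIC_LIB_PATH"):
        candidates.append(env["MAGIC_LIB_PATH"])  # then an environment override
    found = find("magic")
    if found:
        candidates.append(found)  # finally, the normal system library search
    return candidates


def load_libmagic(lib_path: Optional[str] = None) -> ctypes.CDLL:
    """Load the first candidate that ctypes can actually open."""
    errors = []
    for candidate in libmagic_candidates(lib_path):
        try:
            return ctypes.CDLL(candidate)
        except OSError as exc:
            errors.append(f"{candidate}: {exc}")
    if not errors:
        raise ImportError("libmagic not found: no candidate paths")
    raise ImportError("could not load libmagic; tried: " + "; ".join(errors))
```

This only covers the shared library; the database path would be handed separately to libmagic's magic_load() call.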

ahupp commented 3 years ago

Thanks for starting this discussion. I think a nice way to handle this is to publish a separate package that exposes the shared lib and data file via package_data, and then make that an optional dependency of python-magic with extras_require. The extras_require route is nice because we can bump the versions together, though that's not strictly necessary. What do you think?

pombredanne commented 3 years ago

package_data and extras_require are the way to go indeed. That can then be used either through a conditional try/except import or through a setuptools entry point plugin.

This is more or less what we do now:

Separately, I have a build loop in https://github.com/nexB/scancode-plugins/blob/develop/etc/scripts/fetch-plugins.sh that does the actual legwork of assembling pre-built binaries for all OSes.

So in recap, we can adapt, steal, reuse or not any of the code above for make benefit of the great libmagic!

kratsg commented 2 years ago

In the meantime, as this discussion didn't seem to go anywhere, I went ahead and created a pylibmagic package that should provide the appropriate libraries for most macOS/Linux distros (I need help getting Windows supported).

https://pypi.org/project/pylibmagic/

You just need to install it and import it before importing magic; no change is needed in python-magic. All that's really needed is a patch that overrides the hardcoded libmagic.so.1 name python-magic uses on Linux, since this is related to a minor bug in the core Python code.

ahupp commented 1 year ago

Merging into https://github.com/ahupp/python-magic/issues/293

kratsg commented 1 year ago

Merging into #293

Is this merging appropriate? The merged issue is specifically about Windows, but this issue is not OS-specific.

ahupp commented 1 year ago

@kratsg Basically 100% of the issues with libmagic are on Windows, so my intent was to just solve it there. macOS and Linux both have good solutions for this. Of course, in principle, once this is set up for Windows the other platforms are straightforward, but given that Python doesn't have awesome tooling for building and shipping binaries, I'd rather keep it limited.

kratsg commented 1 year ago

@ahupp that's fair. I've solved it for macOS and Linux via https://github.com/kratsg/pylibmagic/ for now. The solution there is to ship a pre-built binary of the file utility with the package, so if import magic doesn't work, then

import pylibmagic  # imported first so the bundled libmagic can be found
import magic

will. It does require some monkeypatching of utilities that python-magic depends on, but does so in order to make sure the shared libs are findable.

ddelange commented 1 year ago

The whole idea of python-magic uploading binary (wheel) distributions would be to package the libmagic binary into the wheel (a zip file).

From the packaging point of view, there should:

Full example: https://github.com/MagicStack/asyncpg/blob/v0.28.0/.github/workflows/release.yml#L73-L130

ahupp commented 1 year ago

It's certainly possible to ship the binaries for every platform, but it's not obviously a good cost/benefit tradeoff. It does have the nice benefit of avoiding version skew between the Python and native parts. Do you feel like using outside packages from Debian, Homebrew, etc. is a problem?


ddelange commented 1 year ago

I only know of one other package that serves binary distributions (.whl), and still requires the user to additionally install an external binary (which will be dynamically linked / searched for at runtime): https://pypi.org/project/mxnet/

But that's only because of the licensing of that one binary; otherwise they would include it in the binary distribution. Linux wheels (under the manylinux policy) are only allowed to dynamically link against a small set of core system libraries such as glibc; anything else needs to be included in the wheel.

Generally, you would:

Do you feel like using outside packages from Debian, homebrew etc is a problem?

So to answer your question:

ahupp commented 1 year ago

@ddelange I did a random sample of some top-250 packages that are (afaik) source-only, and they all distribute a *-py3-none-any.whl:

https://pypi.org/project/typing-extensions/#files https://pypi.org/project/requests/#files https://pypi.org/project/wheel/#files

I thought wheel files were used because they are the product of any "build" step (setup.py etc) so don't need to execute any code to install?

ddelange commented 1 year ago

py3-none-any.whl wheels (a wheel is just a zip file with a .whl extension) can run on any Python 3.5+ distribution, on Windows, macOS, and *nix, regardless of CPU architecture (aarch64, x86_64, etc.), because they contain only Python files and no compiled binaries. These are pure-Python libraries. If the code runs on both py2.7 and py3.5+, you can run python setup.py bdist_wheel --universal and you'll get a py2.py3-none-any.whl.
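The tags in a wheel's file name carry exactly this information. A small parser for the PEP 427 file-name layout (name-version[-build]-python-abi-platform.whl) makes the distinction concrete; the packaging library ships a more thorough packaging.utils.parse_wheel_filename, and the file names in the checks are only examples.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str

    @property
    def is_pure(self) -> bool:
        # Pure-Python wheels carry no ABI and no platform constraint.
        return self.abi == "none" and self.platform == "any"


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into the fields defined by PEP 427."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts  # optional build tag present
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelTags(name, version, build, py, abi, plat)
```

A platform-specific wheel such as one tagged cp311-cp311-manylinux_2_17_x86_64 reports is_pure == False, which is the case that would apply to a python-magic wheel bundling libmagic.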

Any project that needs compiled binaries (Cython extensions, Rust binaries, a C++ backend, etc.) will publish wheels for many combinations of Python version (the ABI is specific to the minor version), OS, and CPU architecture, containing pre-compiled binaries that will execute on the target system. See for instance this list of popular Python libraries: https://github.com/catboost/catboost/issues/2481#issuecomment-1693130848

I thought wheel files were used because they are the product of any "build" step (setup.py etc) so don't need to execute any code to install?

That is correct, when wheels are available on PyPI, pip does not need to execute setup.py, but can copy the python (and binary) files from the wheel straight into site-packages. But as explained above, wheels hosted on PyPI should be self-contained.
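Because a wheel is a plain zip archive, a rough first check of whether it bundles its native code is to list the compiled libraries inside it. This is a heuristic sketch based on file suffixes only; it does not inspect what those binaries link against, which is what tools such as auditwheel (Linux) and delocate (macOS) are for.

```python
import re
import zipfile
from typing import List

# Shared-library and extension-module suffixes on Linux, macOS and Windows,
# including versioned sonames such as libmagic.so.1.
_NATIVE = re.compile(r"\.(so(\.\d+)*|dylib|dll|pyd)$")


def native_files_in_wheel(wheel_path: str) -> List[str]:
    """Return archive members of a wheel that look like compiled binaries."""
    with zipfile.ZipFile(wheel_path) as whl:
        return sorted(n for n in whl.namelist() if _NATIVE.search(n))
```

An empty result for a package that needs libmagic means the wheel is not self-contained in the sense described above.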

So in case of libmagic, strictly speaking you should host an sdist on PyPI, which will detect a missing libmagic on install time by assertion in setup.py (or by copy attempt). If you choose to additionally host bdist (wheels) on PyPI, they should be self-contained, system specific wheels containing precompiled libmagic binaries.
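The install-time check for the sdist case might look roughly like this, as a sketch for a hypothetical setup.py; the function name and error text are illustrative.

```python
import ctypes.util
from typing import Callable, Optional


def require_libmagic(find: Callable[[str], Optional[str]] = ctypes.util.find_library) -> str:
    """Fail early, with an actionable message, if libmagic cannot be found.

    Meant to be called from a (hypothetical) setup.py of an sdist-only release.
    """
    found = find("magic")
    if not found:
        raise RuntimeError(
            "libmagic was not found; install it with your system package "
            "manager (e.g. the Debian/Ubuntu libmagic1 package or the Homebrew "
            "libmagic formula) and retry"
        )
    return found
```

One caveat: pip caches the wheel it builds from an sdist, so a check like this runs when that wheel is built, not necessarily on every later install.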

Does that make sense?

ahupp commented 1 year ago

That makes sense, thanks for the explanation. But surely this isn't the only package with an external dependency on some installed library; there are plenty of cases where you prefer to, or must, rely on something outside.

Regardless, it is clearly a source of regular issues for users and seems worth fixing. I'll take a look at your PR soon and go from there.
