Closed — pombredanne closed this issue 1 year ago
Thanks for starting this discussion. I think a nice way to handle this is publishing a separate package that exposes the shared lib/data file with `package_data`, and then making that an optional dependency of python-magic with `extras_require`. The `extras_require` route is nice because we can bump the versions together, though that's not strictly necessary. What do you think?
`package_data` and `extras_require` are the way to go indeed. And that can then be used either through a conditional try/except import or a setuptools "entry point" plugin.
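As a sketch of the conditional try/except import mentioned above (the optional package name `python_magic_data` and its `lib_dir()` helper are hypothetical, not an existing API):

```python
# Hypothetical optional-dependency pattern: a separate package ships the
# shared lib/data file via package_data, and python-magic picks it up if
# the user installed the extra.
try:
    import python_magic_data  # hypothetical package exposing a bundled libmagic
    LIBMAGIC_DIR = python_magic_data.lib_dir()
except ImportError:
    LIBMAGIC_DIR = None  # fall back to whatever libmagic the system provides
```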
This is more or less what we do now:
And I have a build loop otherwise in https://github.com/nexB/scancode-plugins/blob/develop/etc/scripts/fetch-plugins.sh to do the actual footwork of assembling pre-built binaries for all OSes.
So to recap: we can adapt, steal, reuse (or not) any of the code above, for make benefit of the great libmagic!
In the meantime, as this discussion didn't seem to go anywhere, I went and created a [pylibmagic](https://pypi.org/project/pylibmagic/) package that should provide the appropriate libraries as needed for most mac/linux distros (I need help getting windows supported). You just need to install and import it before importing `magic`, and there's no change needed in python-magic. All that's really needed is a patch to override the hardcoded `libmagic.so.1` that python-magic uses for linux, since this is related to a minor bug in the core python code.
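The stdlib can search the loader path instead of hardcoding a soname; a minimal sketch of that approach (not python-magic's actual code):

```python
import ctypes.util

# Ask the platform's loader machinery for libmagic instead of hardcoding
# "libmagic.so.1"; returns e.g. "libmagic.so.1" on Linux, a .dylib path on
# macOS, or None when no libmagic can be found.
found = ctypes.util.find_library("magic")
```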
Merging into https://github.com/ahupp/python-magic/issues/293
Is this merging appropriate? The merged issue is specifically about Windows, but this issue is not OS-specific.
@kratsg Basically 100% of the issues with libmagic are on Windows, so my intent was to just solve it there. OSX and linux all have good solutions for this. Of course, in principle, once this is set up for windows, other platforms are straightforward, but given Python doesn't have awesome tooling for building+shipping binaries, I'd rather keep it limited.
@ahupp that's fair. I've solved it for MacOSX and Linux via https://github.com/kratsg/pylibmagic/ right now. The solution there is that it ships a pre-built binary of `file` with the package, so if `import magic` doesn't work, then

```python
import pylibmagic
import magic
```

will. It does require some monkeypatching of utilities that python-magic depends on, but does so in order to make sure the shared libs are findable.
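Roughly, what such a shim can do is prepend its bundled lib directory to the search path consulted when locating shared libraries. This is a sketch of the idea, not pylibmagic's actual code, and the directory path is illustrative:

```python
import os

def prepend_lib_path(lib_dir, env_var="LD_LIBRARY_PATH"):
    """Prepend a bundled lib directory to the loader search path.

    LD_LIBRARY_PATH is the Linux convention; macOS would use
    DYLD_LIBRARY_PATH. This only influences lookups performed after
    this point (e.g. by tools ctypes.util.find_library shells out to).
    """
    current = os.environ.get(env_var)
    os.environ[env_var] = lib_dir if not current else lib_dir + os.pathsep + current

prepend_lib_path("/opt/pylibmagic/lib")  # illustrative path
```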
the whole idea of python-magic uploading a binary (wheel) distribution would be to package the libmagic binary into the wheel (zip).

from the packaging point of view, you should:

- only host a source distribution (which will fail to install if libmagic is not available on the system)
- upload linux/win/mac platform-dependent wheels that include libmagic, e.g. using [cibuildwheel](https://github.com/pypa/cibuildwheel) in github actions with a platform-aware [`CIBW_BEFORE_ALL=./install_libmagic.sh`](https://cibuildwheel.readthedocs.io/en/stable/options/#before-all)

full example: https://github.com/MagicStack/asyncpg/blob/v0.28.0/.github/workflows/release.yml#L73-L130
It's certainly possible to ship the binaries for every platform, but it's not obviously a good cost/benefit tradeoff. It does have the nice benefit of avoiding version skew between the Python and native parts. Do you feel like using outside packages from Debian, homebrew etc is a problem?
I only know of one other package that serves binary distributions (.whl), and still requires the user to additionally install an external binary (which will be dynamically linked / searched for at runtime): https://pypi.org/project/mxnet/
But that's only because of licensing of that one binary, which they would otherwise include in the binary distribution. Wheels are officially only allowed to dynamically link against glibc on the system; anything else needs to be included in the wheel.
Generally, you would:

- build `manylinux_2_28` wheels: `manylinux_2_28` in the wheel filename stands for glibc >= 2.28 (mostly all 2020+ linux distributions like debian 10 buster, ubuntu 20.04 focal, almalinux/rhel 8, ...). When building a wheel under this assumption (example PR), pip will only install it on compatible (new enough) systems.
- or build a `manylinux2014` wheel, which goes back as far as debian 6 or something.

> Do you feel like using outside packages from Debian, homebrew etc is a problem?

So to answer your question:
@ddelange I did a random sample of some top-250 packages that are (afaik) source-only and they all distribute a `py3-none-any.whl`:
- https://pypi.org/project/typing-extensions/#files
- https://pypi.org/project/requests/#files
- https://pypi.org/project/wheel/#files
I thought wheel files were used because they are the product of any "build" step (setup.py etc) so don't need to execute any code to install?
`py3-none-any.whl` wheels (a wheel is just a zip file with a `.whl` extension) can run on any python 3.5+ distribution, on win, mac, and nix, regardless of cpu architecture (aarch, x86_64, etc), because they only contain python files and no compiled binaries. These are pure-python libraries. If the code will run on both py2.7 and py3.5+, you can `python setup.py bdist_wheel --universal` and you'll get a `py2.py3-none-any.whl`.
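To illustrate the "a wheel is just a zip" point, here's a minimal in-memory archive with the layout a pure-python wheel would have (names are illustrative, and real wheels also carry a `RECORD` file):

```python
import io
import zipfile

# Build a zip with the minimal shape of a py3-none-any wheel: the package's
# python files plus a .dist-info directory with METADATA and WHEEL.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as whl:
    whl.writestr("demo/__init__.py", "VERSION = '1.0'\n")
    whl.writestr("demo-1.0.dist-info/METADATA",
                 "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n")
    whl.writestr("demo-1.0.dist-info/WHEEL",
                 "Wheel-Version: 1.0\nRoot-Is-Purelib: true\nTag: py3-none-any\n")

# Reading it back needs nothing wheel-specific -- any zip tool works.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as whl:
    names = whl.namelist()
```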
Any project that needs compiled binaries (cythonized, rust binaries, c++ backend etc) will publish wheels for a wealth of combinations of python version (minor-version-specific ABI), OS and CPU architecture, containing pre-compiled binaries that will execute on the target system. See for instance this list of popular python libraries: https://github.com/catboost/catboost/issues/2481#issuecomment-1693130848
> I thought wheel files were used because they are the product of any "build" step (setup.py etc) so don't need to execute any code to install?
That is correct: when wheels are available on PyPI, pip does not need to execute `setup.py`, but can copy the python (and binary) files from the wheel straight into site-packages. But as explained above, wheels hosted on PyPI should be self-contained.
So in the case of libmagic, strictly speaking you should host an sdist on PyPI, which will detect a missing libmagic at install time by an assertion in setup.py (or by a copy attempt). If you choose to additionally host bdists (wheels) on PyPI, they should be self-contained, system-specific wheels containing precompiled libmagic binaries.
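The install-time assertion could be as simple as checking whether the loader can see libmagic at all. A sketch of what might live in `setup.py` (not python-magic's actual code):

```python
import ctypes.util

def libmagic_available():
    """True when the platform loader can locate a libmagic shared library."""
    return ctypes.util.find_library("magic") is not None

def assert_libmagic():
    """What an sdist's setup.py could call to fail the install early."""
    if not libmagic_available():
        raise SystemExit(
            "libmagic not found; install it first "
            "(e.g. `apt install libmagic1` or `brew install libmagic`)"
        )
```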
Does that make sense?
That makes sense, thanks for the explanation. But surely this isn't the only package that has an external dependency on some installed library; there are plenty of cases where you prefer (or must) rely on something outside.
Regardless, it is clearly a source of regular issues for users and seems worth fixing. I'll take a look at your PR soon and go from there.
I have maintained a fork of this fine code for a long while at https://github.com/nexB/typecode/blob/8e926684f260ce1cf7ffed74b2da99db97210f13/src/typecode/magic2.py
One of the key changes is that I can provide a bundled pre-built binary or use a system-provided binary for libmagic and the magic db, which is not possible here. For instance: https://github.com/nexB/scancode-plugins/tree/develop/builtins/typecode_libmagic-linux and https://github.com/nexB/scancode-plugins/tree/develop/builtins/typecode_libmagic_system_provided
I would much prefer to fold that code back here at some point. Would you be open to having a way to provide a libmagic and db path, rather than always using the same heuristics code?
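For reference, the kind of API being asked for might look like this (a sketch with illustrative names, not an existing python-magic interface):

```python
import ctypes.util
import os

def resolve_libmagic(lib_path=None):
    """Return an explicitly provided libmagic path when the caller supplies
    one, falling back to the usual loader heuristics otherwise."""
    if lib_path is not None:
        if not os.path.exists(lib_path):
            raise FileNotFoundError("no libmagic at %s" % lib_path)
        return lib_path
    return ctypes.util.find_library("magic")  # may be None
```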