ahupp / python-magic

A python wrapper for libmagic

Build platform-specific wheels containing libmagic #294

Open ddelange opened 12 months ago

ddelange commented 12 months ago

Hi @ahupp 👋

This PR builds self-contained wheels as discussed in #233. For Windows users, this renders python-magic-bin from @julian-r obsolete.

pip install these wheels

pip can install from GitHub Release assets from my fork:

pip install python-magic --force-reinstall --find-links https://github.com/ddelange/python-magic/releases/expanded_assets/0.4.28.post7
- python-magic-0.4.27.tar.gz
+ python-magic-0.4.28.tar.gz
- python_magic-0.4.27-py2.py3-none-any.whl
+ python_magic-0.4.28-py2.py3-none-macosx_10_9_x86_64.whl
+ python_magic-0.4.28-py2.py3-none-macosx_11_0_arm64.whl
+ python_magic-0.4.28-py2.py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
+ python_magic-0.4.28-py2.py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl
+ python_magic-0.4.28-py2.py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl
+ python_magic-0.4.28-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+ python_magic-0.4.28-py2.py3-none-musllinux_1_1_aarch64.whl
+ python_magic-0.4.28-py2.py3-none-musllinux_1_1_ppc64le.whl
+ python_magic-0.4.28-py2.py3-none-musllinux_1_1_s390x.whl
+ python_magic-0.4.28-py2.py3-none-musllinux_1_1_x86_64.whl
+ python_magic-0.4.28-py2.py3-none-win32.whl
+ python_magic-0.4.28-py2.py3-none-win_amd64.whl

The wheels:

CI/CD

dists build with official cibuildwheel on GitHub Actions, and they build in parallel.

fix #137, fix #288, fix #225, fix #276, fix #248, fix #87, fix #139, fix #233, fix #73, fix #60, fix #34, fix #293, fix #233, fix #278, fix #262, fix #248, fix #238, fix #145, fix #61, fix #12, fix #295, fix #311, fix #312, fix #313, fix #321, fix #332, fix #249

apirogov commented 11 months ago

This is nice! Hope this will be merged soon!

Just ran into issues with my library being not usable by Mac and Windows users because I rely on python-magic. If there are wheels, I don't need to find a workaround or replace the library :)

python-magic-bin did not work for some of them, by the way.

ahupp commented 11 months ago

This is huge, thank you! Apologies for the delay; I thought I'd commented earlier but guess not. I'll look this over soon; I didn't quite understand how bad the binary dep situation was, especially on Windows.

jean-humann commented 9 months ago

@ahupp @stumpylog could we have this one merged (and released) by the end of the year, please?

stumpylog commented 9 months ago

I don't have anything to do with the project, so I can't help. I just dropped by to look at something else and happened to notice some things

patacca commented 7 months ago

One month has passed since the last activity. I think this PR is badly needed to fix the issues with Windows. Would it be possible to move things forward or at least give an estimated schedule?

ddelange commented 7 months ago

Wheels now get installed and tested inside the cibuildwheel docker environment. I also tested the latest wheels on google colab:

# pip install python-magic --force-reinstall --find-links https://github.com/ddelange/python-magic/releases/expanded_assets/0.4.28
>>> import magic
>>> magic.Magic(mime=True).from_buffer(b'\x00\x00\x00\x1cftypisom\x00\x00\x02\x00isomiso2mp41\x00')
'video/mp4'

@ahupp one last TODO for you before/after merging ^

martin-liu commented 6 months ago

@ahupp Any update?

jmoraleda commented 5 months ago

Thank you @ddelange

I confirm that pip install python-magic --force-reinstall --find-links https://github.com/ddelange/python-magic/releases/expanded_assets/0.4.28 works for me on MSW.

On Linux, I feel that libmagic is easy enough to install from your package manager (if it is not already installed in your base installation) so I am not certain we should include a copy of the binary libraries by default in all cases.

Perhaps to get the best of both worlds we should create an optional dependency on binaries for libmagic and then clearly specify how to install:

I do not have enough experience on macOS to know which to recommend as the default.

What do you think @ddelange and @ahupp ?

ddelange commented 5 months ago

hi @jmoraleda :wave: please see the discussion here

if a package has reasons to refrain from binary distributions (bdist_wheel, .whl) e.g. for linux, it's as simple as not publishing wheels for linux. then, on linux systems, pip will fall back to the source distribution (sdist, tar.gz). i personally dont see any reason to refrain from publishing linux wheels.
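
The fallback described above can be illustrated with a rough sketch. This is not pip's actual resolver (pip ranks candidates with much richer tag logic); `platform_tag` and `pick_distribution` are hypothetical helpers that only look at the platform tag at the end of a wheel filename, and the set of supported tags is passed in by hand as an assumption.

```python
from typing import Iterable, Optional, Set


def platform_tag(filename: str) -> Optional[str]:
    """Return the platform tag of a wheel filename, or None for non-wheels."""
    if not filename.endswith(".whl"):
        return None
    # Wheel names end with {python tag}-{abi tag}-{platform tag}.whl
    return filename[: -len(".whl")].split("-")[-1]


def pick_distribution(files: Iterable[str], supported: Set[str]) -> Optional[str]:
    """Prefer a wheel with a supported platform tag, else fall back to the sdist."""
    sdist = None
    for name in files:
        tag = platform_tag(name)
        if tag is None:
            if name.endswith(".tar.gz"):
                sdist = name  # remember the source distribution as a fallback
            continue
        # Compressed tags such as "manylinux_2_17_x86_64.manylinux2014_x86_64"
        # list several platforms separated by dots.
        if tag == "any" or supported.intersection(tag.split(".")):
            return name
    return sdist


files = [
    "python-magic-0.4.28.tar.gz",
    "python_magic-0.4.28-py2.py3-none-win_amd64.whl",
    "python_magic-0.4.28-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
]
print(pick_distribution(files, {"manylinux2014_x86_64"}))  # the manylinux wheel
print(pick_distribution(files, {"freebsd_13_x86_64"}))     # the .tar.gz sdist
```

In this simplified model, simply not publishing a wheel for a platform means the sdist is chosen there, which is the behaviour the comment relies on.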

ddelange commented 5 months ago

if you want to circumvent the binary distribution because you want to use your own libmagic, you can add --no-binary python-magic to your pip install command to force a fallback to source distribution

jmoraleda commented 5 months ago

Thank you again @ddelange. Having MSW wheels available is definitely a huge plus. With respect to Linux, as long as there is a mechanism for installing against the system libraries and it is prominently mentioned on the installation page, I agree there is no reason not to publish Linux wheels as well.

jmoraleda commented 5 months ago

Also, in Debian-like distributions, python-magic is packaged by the system maintainers to refer to the installed libraries. If one installs it from there with the distribution package manager, duplicate binaries are already avoided, which is even more of a reason to include wheels for everything in the PyPI distribution.

ddelange commented 4 months ago

found out that the manylinux2014 images are pretty old and hence were getting a pretty old libmagic:

Package file-libs-5.11-37.el7.x86_64 already installed and latest version

upgraded now to manylinux_2_28 (rhel/almalinux 8):

Package file-libs-5.33-25.el8.x86_64 is already installed.

ddelange commented 4 months ago

the musllinux wheels are getting libmagic 5.45 ref https://pkgs.alpinelinux.org/package/edge/main/x86/libmagic
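
To compare versions like the ones above on a given machine, a minimal sketch using only `ctypes` can query the libmagic found on the system search path. It assumes the library exports `magic_version()` (older builds may not, in which case the helper returns None) and that the returned integer encodes the release as major * 100 + minor. `system_libmagic_version` is a hypothetical helper name, and it inspects the system library, not a copy bundled in a wheel.

```python
import ctypes
import ctypes.util
from typing import Optional


def system_libmagic_version() -> Optional[str]:
    """Return the version of the system libmagic as a string, or None if unknown."""
    path = ctypes.util.find_library("magic")
    if path is None:
        return None  # no libmagic on the system search path
    try:
        lib = ctypes.CDLL(path)
        magic_version = lib.magic_version  # may be missing in older libmagic builds
    except (OSError, AttributeError):
        return None
    magic_version.restype = ctypes.c_int
    number = magic_version()  # assumed encoding: 545 means 5.45
    return f"{number // 100}.{number % 100:02d}"


print(system_libmagic_version())
```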

ddelange commented 4 months ago

could consider building from source in the manylinux images, to get 5.45 instead of 5.33...

ddelange commented 4 months ago

@ahupp building from source done in fe62a26

one last TODO for you before/after merging ^

AbdealiLoKo commented 3 months ago

Was looking for exactly something to simplify the binary installations for python-magic and came across this PR.

Looks like it is very close to the finish line ! Anything I could do to help ? If there is a place I can fetch the wheels and test, I am happy to try them out :)

ddelange commented 3 months ago

Hi @AbdealiLoKo :wave: to fetch the wheels, see the pip install command in the PR description. From my side this PR is ready for review.

AbdealiLoKo commented 3 months ago

So, I tried this out:

$ venv/bin/pip install python-magic
Successfully installed python-magic-0.4.27

$ venv/bin/ipython
In [1]: import magic
In [2]: magic.detect_from_content(open('./README.md', 'rb').read(2048))
Out[2]: FileMagic(mime_type='text/plain', encoding='us-ascii', name='ASCII text')

$ venv/bin/pip uninstall python-magic
Successfully uninstalled python-magic-0.4.27

$ venv/bin/pip install python-magic --force-reinstall --find-links https://github.com/ddelange/python-magic/releases/expanded_assets/0.4.28.post6
Successfully installed python-magic-0.4.28

$ venv/bin/ipython
In [2]: magic.detect_from_content(open('./README.md', 'rb').read(2048))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 magic.detect_from_content(open('./README.md', 'rb').read(2048))

File ~/corridor-platforms/venv/lib/python3.10/site-packages/magic/__init__.py:479, in _add_compat.<locals>.deprecation_wrapper.<locals>._(*args, **kwargs)
    473 def _(*args, **kwargs):
    474     warnings.warn(
    475         "Using compatibility mode with libmagic's python binding. "
    476         "See https://github.com/ahupp/python-magic/blob/master/COMPAT.md for details.",
    477         PendingDeprecationWarning)
--> 479     return fn(*args, **kwargs)

File ~/corridor-platforms/venv/lib/python3.10/site-packages/magic/compat.py:286, in detect_from_content(byte_content)
    280 def detect_from_content(byte_content):
    281     '''Detect mime type, encoding and file type from bytes
    282
    283     Returns a `FileMagic` namedtuple.
    284     '''
--> 286     return _create_filemagic(mime_magic.buffer(byte_content),
    287                              none_magic.buffer(byte_content))

File ~/corridor-platforms/venv/lib/python3.10/site-packages/magic/compat.py:248, in _create_filemagic(mime_detected, type_detected)
    247 def _create_filemagic(mime_detected, type_detected):
--> 248     splat = mime_detected.split('; ')
    249     mime_type = splat[0]
    250     if len(splat) == 2:

AttributeError: 'NoneType' object has no attribute 'split'

This error comes for all the files I have tried so far:

  1. JAR file
  2. YAML file
  3. Excel file
  4. MP4 file

ddelange commented 3 months ago

can you try from_buffer instead of detect_from_content?

see also https://github.com/ahupp/python-magic/blob/0.4.27/magic/__init__.py#L427-L430 and https://github.com/ahupp/python-magic/blob/0.4.27/COMPAT.md?plain=1#L13

ddelange commented 3 months ago

@AbdealiLoKo I've released 0.4.28.post7 containing eba05b6, and verified that the compat module now correctly loads the magic.mgc file bundled in the wheel.

In compat.py, mime_magic.load() and none_magic.load() were silently returning a -1, because load_lib() now prefers the (recent) libmagic bundled in the wheel over the (older) system libmagic, which then caused magic.mgc not to be found in the standard paths. The fix now points to the bundled magic.mgc in this case.
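
A hedged sketch of the idea behind that fix, not the code from eba05b6: when the shared library comes from the wheel, look for a `magic.mgc` next to it, and treat a non-zero return code from a load call as an error instead of ignoring it. The helper names and the assumption that the database sits in the same directory as the library are illustrative only.

```python
import os
import tempfile
from typing import Optional


def bundled_magic_file(lib_path: str) -> Optional[str]:
    """Return the magic.mgc next to lib_path if it exists, else None."""
    lib_dir = os.path.dirname(os.path.abspath(lib_path))
    candidate = os.path.join(lib_dir, "magic.mgc")
    return candidate if os.path.isfile(candidate) else None


def check_load(return_code: int, magic_file: Optional[str]) -> None:
    """Raise instead of silently continuing when a load call reports failure."""
    if return_code != 0:
        raise RuntimeError(f"loading magic database failed (file={magic_file!r})")


# Demonstration with a throwaway directory standing in for the wheel contents.
with tempfile.TemporaryDirectory() as tmp:
    lib = os.path.join(tmp, "libmagic.so.1")
    open(lib, "wb").close()
    print(bundled_magic_file(lib))  # None: no database next to the library
    open(os.path.join(tmp, "magic.mgc"), "wb").close()
    print(bundled_magic_file(lib))  # path ending in magic.mgc
```

Returning None when nothing is bundled leaves room for the caller to fall back to libmagic's default search paths, which matches the system-library case.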

On Google Colab:

# pip install python-magic --force-reinstall --find-links https://github.com/ddelange/python-magic/releases/expanded_assets/0.4.28.post7
>>> import magic
>>> magic.detect_from_content(b'\x00\x00\x00\x1cftypiso5\x00\x00\x00\x01isomiso5hlsf\x00\x00')
FileMagic(mime_type='video/mp4', encoding='binary', name='ISO Media, MP4 Base Media v5 ')
>>> magic.from_buffer(b'\x00\x00\x00\x1cftypiso5\x00\x00\x00\x01isomiso5hlsf\x00\x00')
'ISO Media, MP4 Base Media v5 '

Thanks for catching this!

cclauss commented 3 months ago

What are the remaining TODOs on this pull request?

ddelange commented 3 months ago

AbdealiLoKo commented 3 months ago

The compat part (which is what I generally use to avoid confusion with the non-PyPI magic) works with post7 :) Thanks! This works well for me.

+1 to merging and releasing it !

cclauss commented 3 months ago

Many people have commented on this pull request but there are very few reviews.

When you think a pull request is useful and is ready to be merged, please consider giving it a positive review.

Every check mark ✔️ at the top right of this page gives project maintainers confidence that the proposed changes have been read through and deemed both useful and safe to merge into the codebase. Lots of 👍 and "what is the ETA?" comments are easier for maintainers to ignore than ✔️✔️✔️✔️✔️ from several different reviewers.

Anyone can review a pull request on GitHub. To do so here:

  1. Scroll to the top of this page.
  2. Click the Files changed tab and read through each file carefully looking for potential issues.
  3. Click the Review changes button.
  4. Click Approve (or one of the other options) and add comments only if they have not already been stated in the PR.
  5. Click Submit review so that your ✔️ will be added to the list.

ddelange commented 3 months ago

thanks for the reviews!

ddelange commented 3 months ago

Hi :wave: I'll be AFK until end of June. @ahupp feel free to take over my branch, or merge as is! https://http.cat/301

cclauss commented 2 months ago

There is wonderful work in this pull request and it has four positive reviews. Unfortunately, it has been open for ~9 months without landing. Perhaps it would be useful to break it into three separate PRs that are easier to review and land.

One PR that deals only with macOS and another that deals only with Windows might be easier to land. Once that is done then this PR could be rebased to deal only with Linux and friends. I know it is extra work but I sense that new momentum is needed.

estarfoo commented 2 months ago

Hi! Great PR, is there any real contention about it beyond the scope of OS/distro support in add_libmagic.sh and README.md?

Most users will only ever want the wheels from this repo. In particular, this looks like it will solidly cover usage in Docker images. Anyone who wants to use the source version and provide libmagic themselves probably knows best how to do the latter in their environment, given info on where this package will look for the library. (Those who package python-magic for their Linux distro of choice will already have a preferred way of ensuring libmagic presence. This will probably not exactly match anything suggested in python-magic docs.)

Even for those particularly invested in having a range of setup instructions, the PR in its current state should look like a clear improvement on master, and further improvements in that area will come more easily as separate PRs (because they won't be tied to CI scripts).

So: how about merging this without completing libmagic setup instructions for every possible platform? Seems like it already does what the PR title says.

Privat33r-dev commented 2 months ago

So: how about merging this without completing libmagic setup instructions for every possible platform? Seems like it already does what the PR title says.

Totally for it. My suggestions just show the way to improve it, but I would merge it "as is" since it already provides a huge value. "Perfect is the enemy of good".

@ahupp hopefully you can find some time to review this most discussed PR in the python-magic's history :)

ddelange commented 2 months ago

@ahupp this also fixes failing CI on master (looks like the github actions linux runner image no longer ships libmagic by default)

ashtonpaul commented 1 month ago

@ahupp @ddelange Any update on this PR and the release? 👀 Is there anything that needs to be tested or are we waiting on anything other than a review?