ahupp / python-magic

A python wrapper for libmagic

Inconsistent mimetype returned #255

Closed tgalante-wavefin closed 2 years ago

tgalante-wavefin commented 2 years ago

I have a bit of a strange issue, which I can only guess is caused by a libmagic update (we recently updated our base image), but perhaps you can shed some light on it. We check MIME types to ensure an image file is supported, but one of our tests is failing on a specific BMP image (other images seem to be fine). from_file works as expected, but from_buffer does not.

In [1]: import magic

In [2]: test_file = '/test_files/supported/test.bmp'

In [3]: mimetype = magic.from_file(test_file, mime=True)

In [4]: mimetype
Out[4]: 'image/bmp'

In [5]: file = open(test_file, mode="rb")

In [6]: mimetype2 = magic.from_buffer(file.read(2048), mime=True)

In [7]: mimetype2
Out[7]: 'application/octet-stream'

Any idea as to why this might be happening?

ahupp commented 2 years ago

This is a behavior of the underlying libmagic library, and tbh I'm not really sure why it happens.

Can you share the BMP file?

tgalante-wavefin commented 2 years ago

Sure, it's here.

ahupp commented 2 years ago

Looks like it works if you read the whole file:

>>> b = open("/home/adam/home/Downloads/test.bmp", "rb").read()
>>> magic.from_buffer(b, mime=True)
'image/bmp'

I'm not sure why libmagic requires that.
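
For anyone hitting the same thing, one possible workaround is to try a short prefix first and re-run detection on the whole file only when the result is the generic type. This is a minimal sketch, not something libmagic or python-magic prescribes: the helper name `sniff_mime`, its `detect` parameter, and the 2048-byte starting size are illustrative assumptions, and the only python-magic call it relies on is `magic.from_buffer(data, mime=True)` as used above.

```python
GENERIC = "application/octet-stream"


def sniff_mime(path, detect=None, start=2048):
    """Guess a MIME type from a prefix, falling back to the full file.

    `sniff_mime`, `detect`, and `start` are hypothetical names for this
    sketch; they are not part of python-magic.
    """
    if detect is None:
        # Imported lazily so a custom detector can be injected
        # without libmagic being installed.
        import magic

        def detect(data):
            return magic.from_buffer(data, mime=True)

    with open(path, "rb") as f:
        data = f.read(start)
        mime = detect(data)
        if mime != GENERIC:
            return mime
        # The prefix was inconclusive: read the remainder and retry once.
        rest = f.read()
        if not rest:
            # The file was no longer than `start`; nothing more to try.
            return mime
    return detect(data + rest)
```

Calling `sniff_mime(test_file)` would then behave like the 2048-byte `from_buffer` call in the common case and like the whole-file read shown above when the prefix alone is not enough. The cost is a second read for files that really are unrecognized, so it may be worth capping the fallback size for very large inputs.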

tgalante-wavefin commented 2 years ago

Interesting, I wonder why the implementation changed. Regardless, thanks for looking into it!