ahupp / python-magic

A python wrapper for libmagic

Magic can't get a proper MIME type from an MP3 file #307

Closed kskadart closed 10 months ago

kskadart commented 10 months ago

Hey folks,

I'm trying to get the MIME type of an audio file:

import magic
magic.from_file('test_audio.mp3', mime=True)

output:

'application/octet-stream'

The ffprobe output suggests that the file contains corrupted metadata:

ffprobe test_audio.mp3
...
[mp3 @ 0x7fcbb6904300] Skipping 48 bytes of junk at 70.

However, Python's built-in mimetypes library returns the expected MIME type:

from mimetypes import MimeTypes
mime = MimeTypes()
mime.guess_type('test_audio.mp3')

output:

('audio/mpeg', None)
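One likely reason the two results differ: the standard-library documentation describes `mimetypes.guess_type` as guessing from the file name, path, or URL rather than from the file's contents. A minimal sketch of that behaviour, using a made-up file name that does not exist on disk:

```python
from mimetypes import guess_type

# The file name below is hypothetical; no such file needs to exist,
# because guess_type only inspects the name, never the contents.
kind, encoding = guess_type("no_such_file_anywhere.mp3")
print(kind, encoding)
```

If this prints an audio type even though the file is missing, the result for test_audio.mp3 says nothing about whether its contents are valid.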

I assume magic and mimetypes use different logic, but could you suggest a robust way to get the MIME type with the magic library? I've attached the audio file with its extension changed from MP3 to MP4, because GitHub doesn't accept MP3 uploads: https://github.com/ahupp/python-magic/assets/120260513/b614b76c-5651-47fb-a8a3-0001a60e5bfa

Thanks in advance

ahupp commented 10 months ago

I'm guessing mimetypes just looks at the extension, but I haven't looked closely at it. Interestingly, libmagic reports the correct type when not asking for MIME:

❯ file ~/home/Downloads/276412087-b614b76c-5651-47fb-a8a3-0001a60e5bfa.mp4
/home/adam/home/Downloads/276412087-b614b76c-5651-47fb-a8a3-0001a60e5bfa.mp4: Audio file with ID3 version 2.3.0

framework in ~
❯ file --mime ~/home/Downloads/276412087-b614b76c-5651-47fb-a8a3-0001a60e5bfa.mp4
/home/adam/home/Downloads/276412087-b614b76c-5651-47fb-a8a3-0001a60e5bfa.mp4: application/octet-stream; charset=binary
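The "ID3 version 2.3.0" description can be cross-checked by hand. An ID3v2 tag starts with a 10-byte header: the ASCII bytes `ID3`, a major version byte, a revision byte, a flags byte, and a 4-byte size. The sketch below only reports whether such a header is present; the helper name `id3v2_version` is made up for illustration, and this is not how libmagic itself decides. An ID3 tag alone also does not prove the payload is MPEG audio.

```python
def id3v2_version(header: bytes):
    """Return (major, revision) if the bytes start with an ID3v2 header, else None."""
    # The ID3v2 header is 10 bytes: "ID3", major, revision, flags, 4 size bytes.
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    major, revision = header[3], header[4]
    # 0xFF is never a valid version byte in an ID3v2 header.
    if major == 0xFF or revision == 0xFF:
        return None
    return major, revision


# Usage with an in-memory example header (ID3v2.3.0, no flags, size 0):
sample = b"ID3\x03\x00\x00\x00\x00\x00\x00"
print(id3v2_version(sample))
```

For a real file, pass the first 10 bytes read in binary mode, for example `open(path, "rb").read(10)`.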

Unfortunately this is a libmagic bug outside of python-magic, so I'm not sure how to resolve it; maybe you can dig through the definition files and see how the MIME and non-MIME paths differ? (I'm surprised they can differ, though.)
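Until the libmagic side is sorted out, one possible workaround is to trust the content-based result unless it is the generic `application/octet-stream`, and only then fall back to the extension. This is a rough sketch rather than anything python-magic provides: `guess_mime` and its `sniff` parameter are hypothetical names, and the extension fallback is only as trustworthy as the file name.

```python
import mimetypes

GENERIC = "application/octet-stream"


def guess_mime(path, sniff=None):
    """Content-based MIME detection with an extension-based fallback."""
    if sniff is None:
        # Default to python-magic; imported lazily so that a different
        # detector can be passed in (for example in tests).
        import magic
        sniff = lambda p: magic.from_file(p, mime=True)

    detected = sniff(path)
    if detected and detected != GENERIC:
        return detected

    # libmagic gave no useful answer, so fall back to the file extension.
    by_extension, _encoding = mimetypes.guess_type(path)
    return by_extension or GENERIC
```

Calling `guess_mime('test_audio.mp3')` would use `magic.from_file` first and consult `mimetypes` only if the result is `application/octet-stream`.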

kskadart commented 10 months ago

Hey Adam, thanks for the fast answer! I'll try opening a ticket directly with libmagic.