simonw opened 5 hours ago
Tried this:
python -c 'import puremagic, pprint, sys; pprint.pprint(puremagic.magic_stream(open(sys.argv[-1], "rb")))' output.wav
And got:
[PureMagicWithConfidence(byte_match=b'RIFFH\xe0\x02\x00WAVE', offset=8, extension='.wav', mime_type='audio/wave', name='Waveform Audio File Format', confidence=0.8),
PureMagicWithConfidence(byte_match=b'WAVEfmt ', offset=8, extension='.wav', mime_type='audio/x-wav', name='Windows audio file ', confidence=0.8),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.4xm', mime_type='', name='4X Movie video', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.cdr', mime_type='', name='CorelDraw document', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.avi', mime_type='video/avi', name='Resource Interchange File Format', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.cda', mime_type='', name='Resource Interchange File Format', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.qcp', mime_type='audio/vnd.qcelp', name='Resource Interchange File Format', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.rmi', mime_type='audio/mid', name='Resource Interchange File Format', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.wav', mime_type='audio/wav', name='Resource Interchange File Format', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.ds4', mime_type='', name='Micrografx Designer graphic', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.ani', mime_type='application/x-navi-animation', name='Windows animated cursor', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.dat', mime_type='video/mpeg', name='Video CD MPEG movie', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.cmx', mime_type='', name='Corel Presentation Exchange metadata', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.webp', mime_type='image/webp', name='RIFF WebP', confidence=0.4),
PureMagicWithConfidence(byte_match=b'WAVE', offset=8, extension='.wav', mime_type='audio/x-wav', name='WAV audio', confidence=0.4)]
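The ambiguity is already visible in that output: the two matches tied at the highest confidence (0.8) disagree on the MIME type. The minimal sketch below collects the distinct MIME types in the top confidence tier. It uses a hypothetical Match namedtuple that only mirrors the printed fields (it is not puremagic's own class) and copies three entries from the output above.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the fields printed in the output above;
# this is NOT puremagic's class, just a local tuple for illustration.
Match = namedtuple(
    "Match", "byte_match offset extension mime_type name confidence"
)

# First three results copied from the output above.
results = [
    Match(b"RIFFH\xe0\x02\x00WAVE", 8, ".wav", "audio/wave",
          "Waveform Audio File Format", 0.8),
    Match(b"WAVEfmt ", 8, ".wav", "audio/x-wav", "Windows audio file ", 0.8),
    Match(b"RIFF", 0, ".4xm", "", "4X Movie video", 0.4),
]


def top_mime_types(matches):
    """Return the distinct non-empty MIME types sharing the highest confidence."""
    if not matches:
        return []
    best = max(m.confidence for m in matches)
    seen = []
    for m in matches:
        # Keep only top-tier matches that actually carry a MIME type.
        if m.confidence == best and m.mime_type and m.mime_type not in seen:
            seen.append(m.mime_type)
    return seen


print(top_mime_types(results))  # ['audio/wave', 'audio/x-wav']
```

Two different answers at the same confidence is exactly the question this issue raises.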
I had a similar issue on llm-gemini, where puremagic was returning audio/mpeg for MP3 files but the Gemini AI wanted audio/mp3.
It turned out in that case puremagic was correct and Gemini was wrong: the official MIME type for MP3 is indeed audio/mpeg.
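Python's standard library agrees with puremagic on MP3. Here is a small check using only the stdlib mimetypes module; nothing from puremagic or llm-gemini is assumed.

```python
import mimetypes

# A fresh MimeTypes() instance is populated from Python's built-in table,
# not from system files such as /etc/mime.types, so the answer is stable.
table = mimetypes.MimeTypes()

mime_type, encoding = table.guess_type("song.mp3")
print(mime_type)  # audio/mpeg
```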
As far as I can tell, the "correct" type to return for a .wav file (with 52 49 46 46 xx xx xx xx 57 41 56 45 66 6d 74 20) is audio/wav, but this library returns audio/wave.
I got very confused looking through the code because I came across these two lines:
https://github.com/cdgriffith/puremagic/blob/763349ec4d02ba930fb1142c6eb684afdf06c6ab/puremagic/magic_data.json#L103
https://github.com/cdgriffith/puremagic/blob/763349ec4d02ba930fb1142c6eb684afdf06c6ab/puremagic/magic_data.json#L1118
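The byte signature quoted above decodes as "RIFF", a 4-byte little-endian chunk size, "WAVE", then "fmt ". The minimal sketch below checks a header against it with the stdlib struct module. sniff_wav is a hypothetical helper, and returning audio/wav is an assumption that follows the type this issue argues for.

```python
import struct


def sniff_wav(header: bytes):
    """Hypothetical helper: return (riff_chunk_size, mime_type) or None.

    Matches 52 49 46 46 ("RIFF"), a 4-byte size, 57 41 56 45 ("WAVE"),
    66 6d 74 20 ("fmt ").
    """
    if len(header) < 16:
        return None
    # "<" = little-endian, no padding; 4s = 4 raw bytes, I = unsigned 32-bit.
    riff, size, form, chunk = struct.unpack("<4sI4s4s", header[:16])
    if riff == b"RIFF" and form == b"WAVE" and chunk == b"fmt ":
        # "audio/wav" is a choice made here (the MDN spelling), not a
        # registered standard.
        return size, "audio/wav"
    return None


# First 12 bytes are the byte_match from the output above, plus b"fmt ".
print(sniff_wav(b"RIFFH\xe0\x02\x00WAVEfmt "))  # (188488, 'audio/wav')
```

Requiring "fmt " directly after "WAVE" mirrors the signature in the issue, but it is stricter than the RIFF layout requires. Some WAV files place other chunks before "fmt ", and this sketch would reject them.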
I've found it hard to research the correct resolution, though, as both audio/wav and audio/wave are entirely missing from what I thought was the official IANA registry for these! https://www.iana.org/assignments/media-types/media-types.xhtml#audio
MDN lists audio/wav: https://developer.mozilla.org/en-US/docs/Web/HTTP/MIME_types/Common_types
I'm not sure there is a correct answer to this question.
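Until there is a settled answer, a caller can fold the variants together itself. The minimal sketch below does that with a hypothetical normalize_mime helper. It assumes audio/wav (the spelling MDN lists) is the preferred form and only covers the three WAV spellings that appear in the output above.

```python
# Hypothetical caller-side workaround, not part of puremagic.
# Assumption: MDN's "audio/wav" is the preferred spelling.
WAV_ALIASES = {"audio/wav", "audio/wave", "audio/x-wav"}
PREFERRED_WAV = "audio/wav"


def normalize_mime(mime_type: str) -> str:
    """Map known WAV aliases to PREFERRED_WAV; return other types unchanged."""
    # MIME type and subtype names are case-insensitive, so compare lowercased.
    if mime_type.strip().lower() in WAV_ALIASES:
        return PREFERRED_WAV
    return mime_type


for t in ["audio/wave", "audio/x-wav", "audio/mpeg"]:
    print(t, "->", normalize_mime(t))
```

Keeping the alias set explicit makes the choice easy to flip if audio/wave turns out to be the better answer.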