cdgriffith / puremagic

Pure python implementation of identifying files based off their magic numbers
MIT License

Weird issue with non-compliant AIFF files #86

Closed NebularNerd closed 3 weeks ago

NebularNerd commented 2 months ago

Just starting a PR based on #85 and came across a weird issue. It appears that certain malformed AIFF files cannot be read under certain conditions. If we use this example in Python:

import puremagic
filename = r"r:\aiff\Fnonull.aif"  # raw string so "\a" is not read as an escape
puremagic.magic_file(filename)

We get the following:

[PureMagicWithConfidence(byte_match=b'FORM', offset=0, extension='.aif', mime_type='audio/x-aiff', name='Audio Interchange File', confidence=0.9),
 PureMagicWithConfidence(byte_match=b'FORM\x00\x00\x00\\AIFC', offset=8, extension='.aifc', mime_type='audio/x-aiff', name='Audio Interchange File Format (Compressed)', confidence=0.8),
 PureMagicWithConfidence(byte_match=b'FORM', offset=0, extension='.aiff', mime_type='audio/aiff', name='Audio Interchange File', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'FORM', offset=0, extension='.djv', mime_type='image/vnd.djvu', name='DjVu image', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'FORM', offset=0, extension='.djv', mime_type='image/vnd.djvu+multipage', name='DjVu document', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'FORM', offset=0, extension='', mime_type='application/x-iff', name='IFF file', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'FORM', offset=0, extension='.sc2', mime_type='', name='SimCity 2000 Map File', confidence=0.4),
 PureMagicWithConfidence(byte_match=b'AIFC', offset=8, extension='.aiffc', mime_type='audio/x-aifc', name='AIFC audio', confidence=0.4)]

However, if we do the following in a .py file:

import puremagic
with open(r"r:\aiff\Fnonull.aif", "rb") as file:
    print(puremagic.magic_stream(file))

We get:

Traceback (most recent call last):
  File "R:\WinUAE\pm.py", line 3, in <module>
    print(puremagic.magic_stream(file))
  File "C:\Users\Andy\AppData\Local\Programs\Python\Python310\lib\site-packages\puremagic\main.py", line 351, in magic_stream
    head, foot = _stream_details(stream)
  File "C:\Users\Andy\AppData\Local\Programs\Python\Python310\lib\site-packages\puremagic\main.py", line 229, in _stream_details
    stream.seek(-max_foot, os.SEEK_END)
OSError: [Errno 22] Invalid argument

From reading around it appears to have something to do with malformed files and seek errors, but I can't quite see how Puremagic can read it one way and not the other.
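
One possibility worth checking (an assumption, not something confirmed in this thread): if a file is shorter than the number of bytes being read from the end, a relative seek like the one in the traceback would land before the start of the file, and on a regular file object that raises OSError. The sketch below reproduces that with a hypothetical 12-byte stand-in file and shows a clamped alternative. `read_foot` and `tiny.aif` are made-up names for illustration, not part of puremagic.

```python
import os
import tempfile


def read_foot(stream, max_foot):
    """Read up to max_foot bytes from the end of a seekable binary stream."""
    stream.seek(0, os.SEEK_END)
    size = stream.tell()
    # Clamp so we never seek to a position before the start of the file.
    stream.seek(-min(max_foot, size), os.SEEK_END)
    return stream.read()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical tiny file standing in for a very short AIFF sample.
    path = os.path.join(tmp, "tiny.aif")
    with open(path, "wb") as f:
        f.write(b"FORM\x00\x00\x00\x04AIFF")  # 12 bytes in total

    with open(path, "rb") as f:
        try:
            f.seek(-64, os.SEEK_END)  # further back than the file is long
            naive_ok = True
        except OSError as exc:
            naive_ok = False
            print("naive seek failed:", exc)

        foot = read_foot(f, 64)
        print(foot)
```

If the problem files turn out to be larger than the footer size, this is not the cause and something else about the stream is tripping the seek.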

Any thoughts on this?

Test files.

aiff.zip: the files causing trouble are the ones labelled "Perverse Files" on the Samples page.

NebularNerd commented 3 weeks ago

OK, I think I have this partially solved. Rather than duplicate it all here, I'll continue this in #96, which is the same issue.

cdgriffith commented 3 weeks ago

Thanks for the fixes, addressed in 1.27!