cdgriffith / puremagic

Pure python implementation of identifying files based off their magic numbers
MIT License
161 stars, 34 forks

Integrate advanced file scanning techniques #3

Open cdgriffith opened 10 years ago

cdgriffith commented 10 years ago

Better identify common files, for example by opening .docx/.pptx/.xlsx files and inspecting the XML inside to figure out exactly which type each one is.
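As a rough illustration of that idea, here is a minimal sketch using only the standard library. It is not puremagic's implementation: the function name `guess_ooxml_type` and the `OOXML_MARKERS` table are hypothetical, and it assumes the usual Office Open XML layout, where a package has a `[Content_Types].xml` part at its root plus a main part such as `word/document.xml`.

```python
import io
import zipfile

# Hypothetical lookup table: a well-known main part -> the extension it implies.
OOXML_MARKERS = {
    "word/document.xml": ".docx",
    "ppt/presentation.xml": ".pptx",
    "xl/workbook.xml": ".xlsx",
}


def guess_ooxml_type(data: bytes) -> str:
    """Return .docx/.pptx/.xlsx for an OOXML-looking archive, else .zip."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())

    # Assumption: OOXML packages carry [Content_Types].xml at the root,
    # so an archive without it is treated as a plain zip.
    if "[Content_Types].xml" not in names:
        return ".zip"

    for marker, extension in OOXML_MARKERS.items():
        if marker in names:
            return extension
    return ".zip"
```

Requiring both the content-types part and the main document part is stricter than matching a single path, so a plain archive that merely contains a `word/` folder would still be reported as `.zip`. Data that is not a zip archive at all raises `zipfile.BadZipFile`, which a caller would need to handle.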

eight04 commented 1 year ago

Is it normal that a zip file was detected as docx?

eight04 commented 1 year ago

Using from_string.

cdgriffith commented 1 year ago

@eight04 Wouldn't surprise me, as docx is actually a zip file. It probably means that I have too broad a match for the docx type.
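To show why a header-only match cannot separate the two, here is a small sketch (standard library only, not puremagic's actual matching logic). The helper `build_zip` and the member names are made up for the demonstration. Both archives start with the same zip local-file-header signature.

```python
import io
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"  # zip local file header signature


def build_zip(member_names):
    """Build an in-memory zip with placeholder content (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in member_names:
            archive.writestr(name, "placeholder")
    return buffer.getvalue()


plain_zip = build_zip(["notes.txt"])
docx_like = build_zip(["[Content_Types].xml", "word/document.xml"])

# Both byte strings begin with the same four bytes, so the leading
# magic number alone says "zip" for either of them.
same_header = plain_zip[:4] == docx_like[:4] == ZIP_SIGNATURE
print(same_header)
```

Any rule that tells them apart has to look past the first bytes, for instance at the member names stored in the archive.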

Perchance do you know what program generated that zip file?

eight04 commented 1 year ago

Nope, but I can upload the zip file: 沙花叉word-20220523T041928Z-001.zip. There are more cases; I just uploaded one of them.

Windows 10, Python 3.10.8, puremagic 1.14