ahupp / python-magic

A python wrapper for libmagic

Bug in from_buffer function #314

Closed mihkelraba closed 4 months ago

mihkelraba commented 4 months ago

Detecting the file type with the from_file and from_buffer functions gives different results for the same file. The from_buffer function identifies an ELF binary as a shared object.

OS: Ubuntu 22.04

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import magic
>>> magic.from_buffer(open("/bin/bash", "rb").read(2048))
'ELF 64-bit LSB shared object, x86-64, version 1 (SYSV)'
>>> magic.from_file('/bin/bash')
'ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7a6408ba82a2d86dd98f1f75ac8edcb695f6fd60, for GNU/Linux 3.2.0, stripped'
ahupp commented 4 months ago

Do you get the same result if the buffer contains the entire file? In general you can't guarantee that all the metadata used by the ELF parser is in the first 2k. Note that this is libmagic functionality, not something specific to python-magic.
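A quick way to run that check is to compare all three cases side by side. The following is only a minimal sketch: the helper name `compare_detection` and its parameters are hypothetical, and the only python-magic calls it relies on are `magic.from_file` and `magic.from_buffer`, both shown in the report above. The detectors can be injected so the helper can also be exercised without libmagic installed.

```python
from pathlib import Path


def compare_detection(path, from_file=None, from_buffer=None, head_bytes=2048):
    """Describe one file three ways: by path, by its first bytes, by its full contents.

    `compare_detection` is a hypothetical helper, not part of python-magic.
    """
    if from_file is None or from_buffer is None:
        import magic  # python-magic; imported lazily, only when no detector is injected

        from_file = from_file or magic.from_file
        from_buffer = from_buffer or magic.from_buffer

    data = Path(path).read_bytes()
    return {
        "from_file": from_file(str(path)),
        "from_buffer_head": from_buffer(data[:head_bytes]),
        "from_buffer_full": from_buffer(data),
    }
```

Calling `compare_detection("/bin/bash")` with python-magic installed returns the three descriptions in one dict, which makes it easy to see whether the full buffer and the 2k prefix differ from each other or only from the path-based result.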

mihkelraba commented 4 months ago

Yes, I get the same result with the full file, and with other files as well. The file command on the CLI gives the correct result.

It's not the underlying library; it's the from_buffer function. As the example shows, from_file returns the correct result.

ahupp commented 4 months ago

Sometimes libmagic just behaves differently in from_buffer and from_file, even if the contents are identical. I think there are some features that only trigger if it gets a file descriptor.
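If that is the cause, one possible workaround when only bytes are available is to write them to a temporary file and detect by path. This is an untested sketch under that assumption, not a confirmed fix: the helper name `describe_bytes_via_file` is hypothetical, and the only python-magic call assumed is `magic.from_file`. The temporary file does not carry the original file's permissions or name, which libmagic may also take into account, so the output may still differ from running from_file on the original.

```python
import os
import tempfile


def describe_bytes_via_file(data, detect_file=None):
    """Write `data` to a temporary file and run a path-based detector on it.

    `describe_bytes_via_file` is a hypothetical helper, not part of python-magic.
    """
    if detect_file is None:
        import magic  # python-magic; imported lazily, only when no detector is injected

        detect_file = magic.from_file

    fd, path = tempfile.mkstemp()
    try:
        # Close the file before detection so all bytes are flushed to disk.
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return detect_file(path)
    finally:
        # Always remove the temporary file, even if detection raises.
        os.unlink(path)
```

With python-magic installed, `describe_bytes_via_file(open("/bin/bash", "rb").read())` can be compared against the from_buffer and from_file results from the report.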