ahupp / python-magic

A python wrapper for libmagic

Magic.__init__: add kwargs to enable/disable different types of magic detection #303

Closed risicle closed 10 months ago

risicle commented 10 months ago

libmagic consists of a "soft" detector that uses a generic matching engine, plus a number of special-purpose detectors written in custom C. These custom detectors have a mixed security record, so users who have no need to detect, e.g., ELF files may want to disable them.

Most of the flags for this were already defined in the module, but there was no sensible way to use them, so here I add kwargs controlling most of them. They default to True, which keeps every detector enabled, so this is not a breaking change.
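For readers following along, here is a minimal sketch of how boolean kwargs that default to True could be folded into libmagic's NO_CHECK flags. It is not the code from this PR: the kwarg names (check_tar, check_soft, check_elf, check_json) and the build_flags helper are assumptions made for illustration, and the flag values are copied from libmagic's magic.h and should be checked against your installed header.

```python
# Flag values as defined in libmagic's magic.h (verify against your header).
MAGIC_NONE = 0x000000
MAGIC_NO_CHECK_TAR = 0x002000
MAGIC_NO_CHECK_SOFT = 0x004000
MAGIC_NO_CHECK_ELF = 0x010000
MAGIC_NO_CHECK_JSON = 0x400000

# Hypothetical kwarg names, each mapped to the flag that disables a detector.
_NO_CHECK_FLAGS = {
    "check_tar": MAGIC_NO_CHECK_TAR,
    "check_soft": MAGIC_NO_CHECK_SOFT,
    "check_elf": MAGIC_NO_CHECK_ELF,
    "check_json": MAGIC_NO_CHECK_JSON,
}


def build_flags(base=MAGIC_NONE, **checks):
    """Return `base` OR'd with a NO_CHECK flag for every kwarg passed as False."""
    unknown = set(checks) - set(_NO_CHECK_FLAGS)
    if unknown:
        raise TypeError("unknown detector kwargs: %s" % sorted(unknown))
    flags = base
    for name, no_check_flag in _NO_CHECK_FLAGS.items():
        # Every detector defaults to enabled, so omitting a kwarg changes nothing.
        if not checks.get(name, True):
            flags |= no_check_flag
    return flags
```

With this sketch, build_flags() returns 0 (nothing disabled) and build_flags(check_elf=False) returns 0x010000, so existing callers that pass no kwargs see no change in behaviour.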

A couple of the flags are deprecated and don't do anything anymore, so I omitted them. Reading the source, MAGIC_NO_CHECK_COMPRESS seems to do basically the same thing as (unsetting) MAGIC_COMPRESS these days, so I omitted that too to avoid confusion with the existing uncompress kwarg.

Then I added tests covering a couple of these flags, which included adding some new sample files. I didn't do this for every format, as that would have meant sourcing and including a sample file for each one, and some of them are quite obscure and fiddly. For instance, I couldn't get the csv detector to positively detect anything.

ahupp commented 10 months ago

This is a great change (both the functionality, and the explanation/testing). Thanks!

I think there's a risk/reward tradeoff in these detectors, where e.g. the added ELF details or support for SIMH tapes (!) are probably not materially useful compared to their risk, while JSON is pretty useful. I'll merge this as-is, but I think that before the next release I'll make it a trinary option like "None | All | Some(specific detectors)".
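To make the trinary idea concrete, here is one possible shape for it, sketched under assumptions: the ALL sentinel, the detector names, and the detector_flags helper are all hypothetical and do not describe a released python-magic API. The flag values are copied from libmagic's magic.h and should be checked against your installed header.

```python
# Flag values as defined in libmagic's magic.h (verify against your header).
MAGIC_NO_CHECK_TAR = 0x002000
MAGIC_NO_CHECK_ELF = 0x010000
MAGIC_NO_CHECK_JSON = 0x400000
MAGIC_NO_CHECK_SIMH = 0x800000

# Hypothetical detector names, each mapped to the flag that disables it.
_DETECTORS = {
    "tar": MAGIC_NO_CHECK_TAR,
    "elf": MAGIC_NO_CHECK_ELF,
    "json": MAGIC_NO_CHECK_JSON,
    "simh": MAGIC_NO_CHECK_SIMH,
}

ALL = "all"  # hypothetical sentinel meaning "leave every detector enabled"


def detector_flags(detectors=ALL):
    """Translate None | ALL | iterable of names into libmagic NO_CHECK flags."""
    if isinstance(detectors, str) and detectors == ALL:
        enabled = set(_DETECTORS)
    elif detectors is None:
        enabled = set()
    else:
        enabled = set(detectors)
        unknown = enabled - set(_DETECTORS)
        if unknown:
            raise ValueError("unknown detectors: %s" % sorted(unknown))
    flags = 0
    for name, no_check_flag in _DETECTORS.items():
        # Anything not explicitly enabled gets its NO_CHECK flag set.
        if name not in enabled:
            flags |= no_check_flag
    return flags
```

Under this sketch, detector_flags({"json"}) keeps only the JSON detector and disables tar, ELF and SIMH, which matches the "JSON is useful, the rest is risk" case described above. An allow-list like this also means a detector added to libmagic later stays off until someone opts in.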