ahupp / python-magic

A python wrapper for libmagic

MimeTypes as Array #203

Closed: sr-verde closed this 1 year ago

sr-verde commented 4 years ago

I want to get all MIME types of a file using the MAGIC_CONTINUE flag. I would prefer an array instead of a string that has to be parsed first. Is it possible to retrieve all MIME types as an array?

If not, would it be within the scope of this project to implement something that fulfills my requirements? I would be willing to spend some time implementing it.

ahupp commented 4 years ago

I'm not familiar with that libmagic feature. Can you give an example of how it's used?

sr-verde commented 4 years ago

Of course. Let's look at an example with different types of files: first, two files without MAGIC_CONTINUE, then the same two files with MAGIC_CONTINUE (keep_going=True in python-magic).

Python 3.7.6 (default, Jan  6 2020, 15:14:19) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from magic import Magic
>>> 
>>> m = Magic(mime=True)
>>> m.from_file('/tmp/archive.tar.gz')
'application/gzip'
>>> m.from_file('/tmp/launchKVMJava.do')
'text/html'
>>> 
>>> m = Magic(mime=True, keep_going=True)
>>> m.from_file('/tmp/archive.tar.gz')
'application/gzip\\012- application/octet-stream'
>>> m.from_file('/tmp/launchKVMJava.do')
'text/html\\012- text/plain'

With the MAGIC_CONTINUE flag active, you get all matches, not just the first. \012 is an octal-escaped line break (the repr above doubles the backslash), and every MIME type after the first is prefixed with a - (indicating a list?).
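Until something like this exists in the library, the string can be split on the client side. A minimal sketch, assuming (from the transcript above) that matches are always joined by a line break followed by "- ", and that the line break arrives either escaped as the literal characters \012 or, if raw output is used, as a real newline. The function name split_mime_types is hypothetical, not part of python-magic.

```python
import re

# Assumed separator between keep_going matches: a line break plus "- ".
# The line break may be the literal four characters "\012" (escaped output,
# as in the transcript) or a real "\n" (raw output).
_SEPARATOR = re.compile(r"(?:\\012|\n)- ")


def split_mime_types(raw: str) -> list:
    """Split e.g. 'text/html\\012- text/plain' into ['text/html', 'text/plain']."""
    return [part.strip() for part in _SEPARATOR.split(raw) if part.strip()]


print(split_mime_types('application/gzip\\012- application/octet-stream'))
# → ['application/gzip', 'application/octet-stream']
print(split_mime_types('text/html'))
# → ['text/html']
```

A result without MAGIC_CONTINUE simply comes back as a one-element list, so callers can treat both cases the same way.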

One option would be to return an array instead of this awkward string.

ahupp commented 4 years ago

From an API perspective, changing the return type of from_file/from_buffer depending on the CONTINUE flag would be confusing.
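One way to avoid a flag-dependent return type is to leave from_file returning a str and expose the list through a separate function. The sketch below is hypothetical: mime_types_from_file and its detector parameter are illustrative names, not python-magic API. By default it calls python-magic's Magic(mime=True, keep_going=True).from_file, as in the transcript; the detector argument lets any str-returning callable be substituted (e.g. in tests). The "\012- " separator is assumed from the transcript output.

```python
import re
from typing import Callable, List, Optional

# Assumed separator between keep_going matches (escaped or raw line break + "- ").
_SEPARATOR = re.compile(r"(?:\\012|\n)- ")


def mime_types_from_file(
    path: str, detector: Optional[Callable[[str], str]] = None
) -> List[str]:
    """Return every MIME type matched for `path`, first match first.

    Hypothetical helper: Magic.from_file() keeps returning a plain str,
    and the list form lives in its own function with a fixed return type.
    """
    if detector is None:
        import magic  # python-magic; requires libmagic to be installed

        detector = magic.Magic(mime=True, keep_going=True).from_file
    raw = detector(path)
    return [part.strip() for part in _SEPARATOR.split(raw) if part.strip()]
```

Going by the transcript, mime_types_from_file('/tmp/archive.tar.gz') would be expected to return ['application/gzip', 'application/octet-stream'].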

One thing I've considered is moving to a structured return value, similar to how libmagic's own wrapper does it, which could more easily be extended to include the character encoding, MIME type, text description, etc., all at the same time. This sort of "and other types" feature could then be tacked on as an optional list.
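A structured result of this kind might look roughly like the sketch below. Everything here is hypothetical: FileType, its field names and from_strings are illustrative, not python-magic's API. It takes the description, MIME and encoding strings that libmagic produces separately and splits a keep_going MIME string on the "\012- " separator assumed from the transcript. The extra matches go into an optional tuple that is simply empty when MAGIC_CONTINUE is off, so the result's shape never depends on the flag.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Assumed separator between keep_going matches (escaped or raw line break + "- ").
_SEPARATOR = re.compile(r"(?:\\012|\n)- ")


@dataclass(frozen=True)
class FileType:
    """Hypothetical structured result; not part of python-magic."""

    description: str
    mime_type: str
    encoding: str
    # Additional MIME types, only populated when keep_going/MAGIC_CONTINUE is set.
    other_mime_types: Tuple[str, ...] = ()

    @classmethod
    def from_strings(cls, description: str, mime: str, encoding: str) -> "FileType":
        """Build a result from the separate description/MIME/encoding strings."""
        mimes = [p.strip() for p in _SEPARATOR.split(mime) if p.strip()]
        first = mimes[0] if mimes else ""
        return cls(description, first, encoding, tuple(mimes[1:]))
```

Callers that only want the primary type read mime_type and never see the list; callers interested in the other matches read other_mime_types.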

ahupp commented 1 year ago

This would be a pretty substantial API change for a niche case, so I'd prefer not to do it.