XDRAGON2002 closed this 1 year ago.
@sushain97 Updated Pipfile, setup.py and README.md, and added python-magic as a dependency.
Great, thanks for the review, @sushain97!
Using libmagic instead of shelling out to `file` is nice, but security-wise it's not a huge improvement. As I wrote in #47, we only need a few types (the current list has 10). `python-magic` is a thin wrapper around the libmagic C library, which has a huge range of file detection capabilities and a correspondingly large potential for security bugs.
> We're not sure whether any distribution has packages that rely on server-side use of libmagic, or whether it's common to have long-running processes that use libmagic with untrusted input.

https://seclists.org/oss-sec/2014/q1/504 Hey, they're talking about us :)
I originally thought it might be easy to just do the checks manually (our HFST binaries start with `HFST`; how hard can it be to recognize just 10 types?), but e.g. RTF files can start with `{\rtf` or `{\pard` and who knows what else, and there are probably even worse annoyances with DOCX etc. So maybe that's not the way to go.
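To make the annoyance concrete, here is a minimal sketch of the manual approach using only the standard library. The `SIGNATURES` table, the `sniff` function and its type labels are hypothetical. The `HFST`, `{\rtf` and `{\pard` prefixes come from the comment above. `%PDF-` and the ZIP local-file header `PK\x03\x04` are the standard PDF and ZIP signatures. Checking for a `word/document.xml` member is a simplification of how a DOCX package is laid out, not a full validator.

```python
import io
import zipfile
from typing import Optional

# Hypothetical signature table covering only a few types, for illustration.
SIGNATURES = [
    (b"HFST", "hfst"),    # our HFST binaries start with "HFST"
    (b"{\\rtf", "rtf"),   # RTF usually starts with {\rtf ...
    (b"{\\pard", "rtf"),  # ... but, per the thread, {\pard shows up too
    (b"%PDF-", "pdf"),
]

ZIP_LOCAL_HEADER = b"PK\x03\x04"


def sniff(data: bytes) -> Optional[str]:
    """Guess a type label from the leading bytes; return None if unknown."""
    for prefix, label in SIGNATURES:
        if data.startswith(prefix):
            return label
    if data.startswith(ZIP_LOCAL_HEADER):
        # DOCX (like XLSX, ODT, ...) is a ZIP container, so the prefix alone
        # can't tell them apart; we have to parse the archive's member list.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            return None
        return "docx" if "word/document.xml" in names else "zip"
    return None
```

Even this toy version has to open ZIP archives to spot DOCX, and every new type or variant prefix means another special case, which is the maintenance burden the comment is worried about.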
A better alternative, I think, would be to just restrict the magic database. It seems the `python-magic` lib supports loading a user-specified magic database. You can compile a `.mgc` with `file -C -m ./dir-with-only-our-few-types/` (`apt-get source libmagic-dev` gives you the relevant pattern files in `magic/Magdir/msooxml` etc.), then test that it doesn't detect other things with `file -m .mgc somefile.pdf`.
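A minimal sketch of how the restricted database could be wired up, assuming python-magic's `magic.Magic(mime=True, magic_file=...)` constructor and its `from_buffer` method. The default `.mgc` path mirrors the `file -m .mgc` test command above; point it at wherever your compiled database actually lives. `ALLOWED_MIME_TYPES`, `make_detector` and `check_upload` are hypothetical names, and the three MIME strings are only examples of what libmagic may report, not the project's real list of 10 types.

```python
from typing import Callable

# Hypothetical allowlist: example MIME strings only, not the real list.
ALLOWED_MIME_TYPES = frozenset({
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "text/rtf",
})

Detector = Callable[[bytes], str]


def make_detector(magic_file: str = ".mgc") -> Detector:
    """Build a MIME detector that loads only the restricted database."""
    # Imported lazily so the rest of this module works without libmagic.
    import magic  # python-magic

    # magic_file makes libmagic load our compiled subset instead of the
    # system-wide database.
    return magic.Magic(mime=True, magic_file=magic_file).from_buffer


def check_upload(data: bytes, detect: Detector) -> str:
    """Return the detected MIME type, or raise if it is not allowlisted."""
    mime = detect(data)
    # libmagic may still fall back to generic answers (e.g. text/plain or
    # application/octet-stream), so the result is checked as well.
    if mime not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported file type: {mime}")
    return mime
```

Passing the detector into `check_upload` keeps the allowlist logic testable without libmagic; in the service itself you would call something like `check_upload(data, make_detector())`.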
Fixes #47. Previously, the `mimetype`, `xdg-mime` and `file` subprocesses were used to detect MIME types, which could lead to security issues, so they have now been replaced with `python-magic`, a library that detects MIME types from within Python.