apertium / apertium-apy

📦 Apertium HTTP Server in Python
https://wiki.apertium.org/wiki/Apertium-apy
GNU General Public License v3.0

MIME identification using python #190

Closed: XDRAGON2002 closed this 1 year ago

XDRAGON2002 commented 2 years ago

Fixes #47. Previously, the mimetype, xdg-mime and file commands were run as subprocesses to detect MIME types, which could lead to security issues. They have now been replaced with python-magic, a library for detecting MIME types from within Python.
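A minimal sketch of what an in-process python-magic lookup looks like in place of a subprocess call. The function name detect_mime is hypothetical and not taken from the PR's diff; it assumes python-magic (and libmagic) are installed.

```python
def detect_mime(data: bytes) -> str:
    """Guess the MIME type of an in-memory buffer via libmagic, without spawning a process."""
    # python-magic: ctypes bindings to the libmagic C library (assumed installed).
    import magic

    # mime=True asks for a MIME string such as "application/pdf" rather than
    # a human-readable description of the file.
    return magic.from_buffer(data, mime=True)
```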

XDRAGON2002 commented 2 years ago

@sushain97 Updated Pipfile, setup.py and README.md and added python-magic as a dependency.

XDRAGON2002 commented 2 years ago

Great, thanks for the review, @sushain97!

unhammer commented 2 years ago

Using libmagic instead of shelling out to file is nice, but security-wise it's not a huge improvement. As I wrote in #47, we only need a few types (the current list has 10). python-magic is a thin wrapper around the libmagic C library, which has a huge range of file-detection capabilities and, along with them, a lot of potential for security bugs.

"We're not sure whether any distribution has packages that rely on server-side use of libmagic, or whether it's common to have long-running processes that use libmagic with untrusted input."

https://seclists.org/oss-sec/2014/q1/504 Hey, they're talking about us :)

I was originally thinking it might be easy to just do the checks manually (our HFST binaries start with HFST; how hard can it be to parse just 10 types?), but, e.g., RTF files can start with {\rtf or {\pard and who knows what else, and there are probably even worse annoyances with DOCX etc. So maybe that's not the way to go.
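To illustrate why hand-rolled checks get fiddly, here is a minimal sketch of a manual sniffer. The signature table, the function name sniff_mime and the MIME label application/x-hfst are hypothetical, not apertium-apy's actual list. RTF already needs two prefixes, and DOCX/PPTX/XLSX are all ZIP archives, so the archive has to be opened to tell them apart.

```python
import io
import zipfile
from typing import Optional

# Hypothetical leading-byte signatures, for illustration only.
PREFIX_SIGNATURES = [
    (b"HFST", "application/x-hfst"),  # hypothetical MIME label for HFST binaries
    (b"{\\rtf", "text/rtf"),
    (b"{\\pard", "text/rtf"),  # RTF can start with more than one control word
]

# OOXML formats share the ZIP signature; we guess from a characteristic part name.
# Real files declare their main part in _rels/.rels, so this shortcut can miss
# valid documents that use non-default part names.
OOXML_PARTS = {
    "word/document.xml": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "ppt/presentation.xml": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xl/workbook.xml": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}

ZIP_MAGIC = b"PK\x03\x04"


def sniff_mime(data: bytes) -> Optional[str]:
    """Return a MIME type for a handful of known formats, or None if unrecognised."""
    for prefix, mime in PREFIX_SIGNATURES:
        if data.startswith(prefix):
            return mime
    if data.startswith(ZIP_MAGIC):
        try:
            # Opening the archive reads its central directory; members are not extracted.
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            return None
        for part, mime in OOXML_PARTS.items():
            if part in names:
                return mime
    return None
```

Every format added means another special case like the ZIP branch, which is the maintenance burden described above.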

A better alternative, I think, would be to just restrict the magic database. It seems the python-magic library supports loading a user-specified magic database. You can compile a .mgc file with file -C -m ./dir-with-only-our-few-types/ (apt-get source libmagic-dev gives you the relevant pattern files in magic/Magdir/msooxml etc.) and then test that it doesn't detect other things with file -m .mgc somefile.pdf.
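A minimal sketch of loading such a restricted database from Python, assuming the .mgc has already been compiled with file -C as described above. It uses python-magic's Magic class with its magic_file parameter; the function name make_restricted_detector, the allowlist contents and the path are hypothetical. The allowlist is an extra guard on top of the restricted database, not a replacement for it.

```python
from typing import Callable, Optional

# Hypothetical allowlist; apertium-apy's real list (about 10 types) may differ.
ALLOWED_MIMES = frozenset({
    "text/plain",
    "text/html",
    "text/rtf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
})


def make_restricted_detector(mgc_path: str) -> Callable[[bytes], Optional[str]]:
    """Build a detector that consults only the compiled database at mgc_path.

    mgc_path is assumed to point at a .mgc compiled beforehand from a
    directory containing only the pattern files for the types we need.
    """
    import magic  # python-magic (assumed installed)

    # magic_file makes libmagic load our small database instead of the system-wide one.
    detector = magic.Magic(mime=True, magic_file=mgc_path)

    def detect(data: bytes) -> Optional[str]:
        mime = detector.from_buffer(data)
        # Anything outside the allowlist is reported as unsupported (None).
        return mime if mime in ALLOWED_MIMES else None

    return detect
```

Returning None for anything unexpected lets the caller reject the upload outright instead of passing an unknown type further down the pipeline.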