f3-factory / fatfree-core

Fat-Free Framework core library
GNU General Public License v3.0
207 stars 88 forks source link

Unsafe MIME Type detection in Web->mime($file) #270

Closed Jiab77 closed 5 years ago

Jiab77 commented 5 years ago

Issue opened over fatfree project: https://github.com/bcosca/fatfree/issues/1138

Jiab77 commented 5 years ago

I might also improve the detection if the $file given is a gd image.

See line: https://github.com/bcosca/fatfree-core/pull/270/files#diff-bab800cb64eba82163ab93482194adc8R65

ikkez commented 5 years ago

I checked it.. it works on windows, yes.

One thing though: the new way requires to have a file physically available somewhere,.. before we just used the file suffix from its name or path, that should be kept in mind.

Nevertheless, that loading thing hurt in my eyes a bit, so I tried to "optimized" the proposal a bit more:

function mime($file) {
    if (is_file($file) && is_readable($file)) {
        // physical files
        if (extension_loaded('fileinfo'))
            $mime=mime_content_type($file);
        elseif(!preg_match('/^win/i',PHP_OS))
            $mime=trim(exec('file -bi '.escapeshellarg($file)));
    }
    else {
        // remote and stream files
        if (ini_get('allow_url_fopen') && ($fhandle=fopen($file,'rb'))) {
            // only get head bytes instead of whole file
            $bytes=fread($fhandle,20);
            fclose($fhandle);
        } elseif (($response=$this->request($file,['method' => 'HEAD']))
            && preg_grep('/HTTP\/\d\.\d 200/',$response['headers'])
            && ($type = preg_grep('/^Content-Type:/i',$response['headers']))) {
            // get mime type directly from response header
            return preg_replace('/^Content-Type:\s*/i','',array_pop($type));
        } else {
            // load complete file
            $bytes=file_get_contents($file);
        }
        if (extension_loaded('fileinfo')) {
            // get mime from fileinfo
            $finfo=finfo_open(FILEINFO_MIME_TYPE);
            $mime=finfo_buffer($finfo,$bytes);
        } elseif($bytes) {
            // magic number header fallback
            $map=[
                '\x64\x6E\x73\x2E'=>'audio/basic',
                '\x52\x49\x46\x46.{4}\x41\x56\x49\x20\x4C\x49\x53\x54'=>'video/avi',
                '\x42\x4d'=>'image/bmp',
                '\x42\x5A\x68'=>'application/x-bzip2',
                '[ -~]+$'=>'text/plain',
                '\x07\x64\x74\x32\x64\x64\x74\x64'=>'application/xml-dtd',
                '\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'=>'application/msword',
                '\x50\x4B\x03\x04\x14\x00\x06\x00'=>'application/msword',
                '\x0D\x44\x4F\x43'=>'application/msword',
                'GIF\d+a'=>'image/gif',
                '\x1F\x8B'=>'application/x-gzip',
                '\xff\xd8\xff'=>'image/jpeg',
                '\x49\x46\x00'=>'image/jpeg',
                '\xFF\xFB'=>'audio/mpeg',
                '\x49\x44\x33'=>'audio/mpeg',
                '\x00\x00\x01\xBA'=>'video/mpeg',
                '\x4F\x67\x67\x53\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00'=>'audio/vorbis',
                '\x25\x50\x44\x46'=>'application/pdf',
                '\x89PNG\x0d\x0a'=>'image/png',
                '.{4}\x6D\x6F\x6F\x76\x'=>'video/quicktime',
                '\x53\x49\x54\x21\x00'=>'application/x-stuffit',
                '\x43\x57\x53'=>'application/x-shockwave-flash',
                '\x1F\x8B\x08'=>'application/x-tar',
                '\x49\x20\x49'=>'image/tiff',
                '\x52\x49\x46\x46.{4}\x57\x41\x56\x45\x66\x6D\x74\x20'=>'audio/wav',
                '\xFD\xFF\xFF\xFF\x20\x00\x00\x00'=>'application/vnd.ms-excel',
                '\x50\x4B\x03\x04'=>'application/x-zip-compressed'
            ];
            foreach ($map as $key=>$val)
                if (preg_match('/^'.$key.'/',substr($bytes,0,128)))
                    return $val;
        }
    }
    return (isset($mime) && !empty($mime) ? $mime : 'application/octet-stream');
}

The main point is, I've added a feature that only loads the first bytes of the remote/string files, and not the whole file at all to check the mime type. Since the mimetype is something in the file header, I thought it would be nice to save some bandwidth here, and that makes it indeed much faster in my tests. If that doesnt work I've added a HEAD request fallback that gathers the mime-type from the Content-Type of remote files. I now just used the mime_content_type function and dropped the finfo_file conditional part, since both techniques are from the same FileInfo extension.. so it doesnt matter (ref). I guess mime_content_type is just a shortcut for the other more advanced finfo_file method... Also dropped the getimagesizefromstring technique, as it requires the GD plugin and only works for images.. instead, in case the FileInfo extension is not avaiable, I've added a check to match the magic bytes in the files to get their proper mimetypes.. thats what mime_content_type is actually doing just a bit more convenient than the manual approach with a regex map of course... but it works (ref1, ref2).. Ah yeah and if FileInfo is not available and you're on "not windows", I'll now use a shell command to get the file mime type instead for physical files.

Thoughts? 😄

Jiab77 commented 5 years ago

@ikkez your version is much better than mine :grin: . you're right about the gd requirement, it was logical to me to use getimagesizefromstring as fallback as I'm using it in my projects but this would add an additional dependency so you did right to drop it.

concerning FileInfo file I did used as I've read https://stackoverflow.com/questions/1263957/why-is-mime-content-type-deprecated-in-php during my research

the only issue I'm seeing with your version is that the fallback code is very limited (but great :+1: ). have you tested with let's say, files like MP4, MKV or any generic mime type that some users may use?

Jiab77 commented 5 years ago

just tested on linux with local files like (mkv, mp4, mp3, avi), it worked. also tested with web content, it worked too! I've used the version from @ikkez

Jiab77 commented 5 years ago

@ikkez, the test over FileInfo is useless if your using mime_content_type no?

line: 4

if (extension_loaded('fileinfo'))
         $mime=mime_content_type($file);
...

what do you think about?

if (is_file($file) && is_readable($file)) {
    // physical files
    $mime=mime_content_type($file);
    if(empty($mime) && !preg_match('/^win/i',PHP_OS))
        $mime=trim(exec('file -bi '.escapeshellarg($file)));
}
xfra35 commented 5 years ago

One thing though: the new way requires to have a file physically available somewhere,.. before we just used the file suffix from its name or path, that should be kept in mind.

What about keeping both methods: the old and the new one?

ikkez commented 5 years ago

@ikkez, the test over FileInfo is useless if your using mime_content_type no?

@Jiab77 : that function is part of the FileInfo extension, so I think the test is reasonable... at least mime_content_type doesnt work when FileInfo is not enabled... I've tested it.

the fallback code is very limited

@Jiab77 : yes perhaps, but good enough for "common" files.. also we can extend the map of course.. though the magic bytes trick only works for binary files.. as ASCII files are basically just plain text, you cannot really determine the mime type without having a look at the file extension again (i.e. css, xml, js files, etc.) .. ..so especially for building own files and sending those to the client, I can imagine that it would be useful to keep a simple file-extension <> mime-type map for internal usage as @xfra35 suggested. Maybe an additional attribute, like inspect, guess or trusted that than switches between old method and new one?!

NP: and mime_content_type is not deprecated.. scroll a bit down in your SO answer, and you see some more answers about that.. also it's not marked as deprecated in the official php docs.

bjugan commented 5 years ago

Can a solution be to keep the old method and make an additional new method that use the old method as a first check before it continue and do the true mime detection? That way files with wrong extension would be "stopped" before it's loaded even if you use the true mime detection method.

KOTRET commented 5 years ago

i've already noted my 50 cents in https://github.com/bcosca/fatfree/pull/1139, so no need to repeat.

I'm against inspecting files by reading them. If really needed, this should be offered as separate function, but there are already several mime-libraries out there that do this for you, so no need to reinvent the wheel.

@bjugan the proposal does not make any sense. A mixed function is always flawed: either you inspect the file to read the true mime or you do it by extension. This is all about performance ↔ secure detection.

bjugan commented 5 years ago

@KOTRET Sorry, I agree it doesn't make sense. I was hung up thinking of this as a "verify mime" method and not a "get mime" method.

Anyway, I think it would be nice to have both methods as proposed by xfra35. Then I can combine them in my own "verify file type" method where I first use the old method to filter out files with wrong extension before I continue and load it to do the true mime detection if it pass the extension check (or is this bad as well?)

KOTRET commented 5 years ago

... where I first use the old method to filter out files with wrong extension before I continue and load it to do the true mime detection if it pass the extension check

that would mean that you "allow" all to the framework known mimes at first and then check the content. Probably this is not what you want, as you would like to define your own list of allowed extensions. So i cannot imagine any combination of both methods that makes sense.

Jiab77 commented 5 years ago

So who's in favor to update the actual mime method? IMO I prefer "get mime type" than "verify mime type" to refer to what said @bjugan and so I'm against the actual method which just check the mime type based on the file extension. The version provided by @ikkez cover the need to avoid the complete file download to get the mime type and this is, I think, the best option.

Maybe the best would also to create two separate functions as suggested by @xfra35 in order to avoid breaking existing applications that rely on the behavior of the actual method. Even if I'm very sure that the new method would not impact that much.

Jiab77 commented 5 years ago

To @ikkez :

@Jiab77 : that function is part of the FileInfo extension, so I think the test is reasonable... at least mime_content_type doesnt work when FileInfo is not enabled... I've tested it.

so why it worked on my limited system (I mean I'm actually mostly using PHP embedded server and CLI, gd, json..) I'll check if I have the FileInfo extension enabled.

@Jiab77 : yes perhaps, but good enough for "common" files.. also we can extend the map of course.. though the magic bytes trick only works for binary files.. as ASCII files are basically just plain text, you cannot really determine the mime type without having a look at the file extension again (i.e. css, xml, js files, etc.) .. ..so especially for building own files and sending those to the client, I can imagine that it would be useful to keep a simple file-extension <> mime-type map for internal usage as @xfra35 suggested. Maybe an additional attribute, like inspect, guess or trusted that than switches between old method and new one?!

Yeah I agree if you consider that the static mapping would just be used as fallback and can be updated from upper release or pull requests if needed.

NP: and mime_content_type is not deprecated.. scroll a bit down in your SO answer, and you see some more answers about that.. also it's not marked as deprecated in the official php docs.

Yep I've read the whole page, this is why I was okay with all your changes and suggestions.

bjugan commented 5 years ago

For the record: My idea was not to replace the "get mime type" method with a verify method, but to keep the old method and add this new method as a seperate one so I myself could make something like this:

class MyWeb extends Web
...
function verify_mime(string $file, array $allowed_mimes): bool {
    $key = array_search($this->mime($file), $allowed_mimes); // Extension check
    if ($key !== false && $this->mime_secure($file) === $allowed_mimes[$key]) return true; // Mime check if file pass extension check
    return false;
}
...
$web->verify_mime($file, array('image/jpg', 'image/png'));

But please forget about this idea! I'm a novice and supports whatever you guys end up to decide :-)

Jiab77 commented 5 years ago

@bjugan I could not say you're a novice and myself an expert, certainly not but as a security researcher too and developer, I can't understand your point to use a weak method because it's easier to use than just adapt your code to the new method and so increase the security level you got with your own code.

If this is clearly explained in the changelog or the release note that the new version may contain a breaking change over the mime type detection, this should not be an issue if everybody read it before updating their project dependencies.

To finish, if I refer to your code posted in your message, you should consider to first validate file extensions you want to allow the use then validate the mime type, no need to have two methods inside your verify_mime() function.

class MyWeb extends Web
...
function verify_mime(string $file, array $allowed_extensions, array $allowed_mimes): bool {
        $file_info = pathinfo($file);
        $file_ext = strtolower($file_info['extension']);
    $valid_extension = array_search($file_ext, $allowed_extensions); // Extension check
        // Mime check if file pass extension check
    return ($valid_extension !== false && array_search($this->mime($file), $allowed_mimes);
}
...
$web->verify_mime($file, array('jpg', 'png', 'gif'), array('image/jpg', 'image/png', 'image/gif'));

Anyway, now there is only the project leaders that can decide what to do with my proposal or the one of @ikkez.

bjugan commented 5 years ago

I hoped my previous comment was my last one, but, sorry to everyone, I have to comment once more.

I can't understand your point to use a weak method because it's easier to use than just adapt your code to the new method and so increase the security level you got with your own code.

The two reasons I started to think about this:

  1. The performance concern for the new method. Assuming most unwanted files have wrong extension (and not only wrong mime) the extension test could filter them out before continuing to the more demanding mime test. Anyway, the saving is minimal because most files are wanted files, and with ikkez proposal the performance is not an big issue i guess. Therefor this point is not valid anymore.
  2. Let's say you want to accept jpg, gif or png, but you don't want to accept a file with jpg extension and png mime. My verify method (not tested!) should stop this file, your version of my method won't. Can this be a security problem? I have no idea, probably not.

If this is clearly explained in the changelog or the release note that the new version may contain a breaking change over the mime type detection, this should not be an issue if everybody read it before updating their project dependencies.

Agree. And this is not a breaking change for my project.

Anyway, now there is only the project leaders that can decide what to do with my proposal or the one of @ikkez.

Yes 😄

Jiab77 commented 5 years ago

@bjugan As of reading the documentation of https://fatfreeframework.com/3.6/web#mime, your idea is the one that match the best with what is explained so I suggest to use the @ikkez version as mime_secure method if you want to keep the actual behavior or you would need to update the documentation if you take the choice to replace the actual code used by the mime method.

ikkez commented 5 years ago

Best of both worlds implemented.. no breaking changes, default behaviour is kept.. and with an additional parameter you can now inspect the file for a more valid mime type if you want so.

Jiab77 commented 5 years ago

@ikkez I know you've closed this issue and got nothing against that choice but I was just wondering if the static types map could be improved by adding / creating new methods to add mode signature when needed without having to fork or modify the locally. Maybe by moving the static map to the F3 memory space so that it could be modified with $f3->set() method or any one that could just update the content. Please, let me know what you think about this idea.

KOTRET commented 5 years ago

adding new mimes should be persistent and not on every request

Jiab77 commented 5 years ago

@KOTRET I never meant that the static mimes mapping should be inconsistent nor defined on every requests... I was just saying that instead of placing the static magic bytes mapping into the method, just move it into the main hive like this $f3->get('MAGIC_BYTES') so that they can be specified from the config file or by $f3->set('MAGIC_BYTES') when the app is started for example.

ikkez commented 5 years ago

actually that is like adding them on every request @Jiab77 ...so no. If you have valuable additions to the map, please send a PR. TY

Jiab77 commented 5 years ago

@ikkez, Ok I see, sorry for the misunderstanding. Your framework performances are really nice, I have no will to ruin them. If I find another way to do it I'll send a PR as recommended. Thanks for your reply.