Closed Jiab77 closed 5 years ago
I might also improve the detection if the $file
given is a gd
image.
See line: https://github.com/bcosca/fatfree-core/pull/270/files#diff-bab800cb64eba82163ab93482194adc8R65
I checked it.. it works on windows, yes.
One thing though: the new way requires to have a file physically available somewhere,.. before we just used the file suffix from its name or path, that should be kept in mind.
Nevertheless, that loading thing hurt in my eyes a bit, so I tried to "optimized" the proposal a bit more:
function mime($file) {
if (is_file($file) && is_readable($file)) {
// physical files
if (extension_loaded('fileinfo'))
$mime=mime_content_type($file);
elseif(!preg_match('/^win/i',PHP_OS))
$mime=trim(exec('file -bi '.escapeshellarg($file)));
}
else {
// remote and stream files
if (ini_get('allow_url_fopen') && ($fhandle=fopen($file,'rb'))) {
// only get head bytes instead of whole file
$bytes=fread($fhandle,20);
fclose($fhandle);
} elseif (($response=$this->request($file,['method' => 'HEAD']))
&& preg_grep('/HTTP\/\d\.\d 200/',$response['headers'])
&& ($type = preg_grep('/^Content-Type:/i',$response['headers']))) {
// get mime type directly from response header
return preg_replace('/^Content-Type:\s*/i','',array_pop($type));
} else {
// load complete file
$bytes=file_get_contents($file);
}
if (extension_loaded('fileinfo')) {
// get mime from fileinfo
$finfo=finfo_open(FILEINFO_MIME_TYPE);
$mime=finfo_buffer($finfo,$bytes);
} elseif($bytes) {
// magic number header fallback
$map=[
'\x64\x6E\x73\x2E'=>'audio/basic',
'\x52\x49\x46\x46.{4}\x41\x56\x49\x20\x4C\x49\x53\x54'=>'video/avi',
'\x42\x4d'=>'image/bmp',
'\x42\x5A\x68'=>'application/x-bzip2',
'[ -~]+$'=>'text/plain',
'\x07\x64\x74\x32\x64\x64\x74\x64'=>'application/xml-dtd',
'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'=>'application/msword',
'\x50\x4B\x03\x04\x14\x00\x06\x00'=>'application/msword',
'\x0D\x44\x4F\x43'=>'application/msword',
'GIF\d+a'=>'image/gif',
'\x1F\x8B'=>'application/x-gzip',
'\xff\xd8\xff'=>'image/jpeg',
'\x49\x46\x00'=>'image/jpeg',
'\xFF\xFB'=>'audio/mpeg',
'\x49\x44\x33'=>'audio/mpeg',
'\x00\x00\x01\xBA'=>'video/mpeg',
'\x4F\x67\x67\x53\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00'=>'audio/vorbis',
'\x25\x50\x44\x46'=>'application/pdf',
'\x89PNG\x0d\x0a'=>'image/png',
'.{4}\x6D\x6F\x6F\x76\x'=>'video/quicktime',
'\x53\x49\x54\x21\x00'=>'application/x-stuffit',
'\x43\x57\x53'=>'application/x-shockwave-flash',
'\x1F\x8B\x08'=>'application/x-tar',
'\x49\x20\x49'=>'image/tiff',
'\x52\x49\x46\x46.{4}\x57\x41\x56\x45\x66\x6D\x74\x20'=>'audio/wav',
'\xFD\xFF\xFF\xFF\x20\x00\x00\x00'=>'application/vnd.ms-excel',
'\x50\x4B\x03\x04'=>'application/x-zip-compressed'
];
foreach ($map as $key=>$val)
if (preg_match('/^'.$key.'/',substr($bytes,0,128)))
return $val;
}
}
return (isset($mime) && !empty($mime) ? $mime : 'application/octet-stream');
}
The main point is, I've added a feature that only loads the first bytes of the remote/string files, and not the whole file at all to check the mime type. Since the mimetype is something in the file header, I thought it would be nice to save some bandwidth here, and that makes it indeed much faster in my tests. If that doesnt work I've added a HEAD request fallback that gathers the mime-type from the Content-Type of remote files.
I now just used the mime_content_type
function and dropped the finfo_file
conditional part, since both techniques are from the same FileInfo
extension.. so it doesnt matter (ref). I guess mime_content_type
is just a shortcut for the other more advanced finfo_file
method... Also dropped the getimagesizefromstring
technique, as it requires the GD
plugin and only works for images.. instead, in case the FileInfo
extension is not avaiable, I've added a check to match the magic bytes in the files to get their proper mimetypes.. thats what mime_content_type
is actually doing just a bit more convenient than the manual approach with a regex map of course... but it works (ref1, ref2).. Ah yeah and if FileInfo is not available and you're on "not windows", I'll now use a shell command to get the file mime type instead for physical files.
Thoughts? 😄
@ikkez your version is much better than mine :grin: . you're right about the gd
requirement, it was logical to me to use getimagesizefromstring
as fallback as I'm using it in my projects but this would add an additional dependency so you did right to drop it.
concerning FileInfo file
I did used as I've read https://stackoverflow.com/questions/1263957/why-is-mime-content-type-deprecated-in-php during my research
the only issue I'm seeing with your version is that the fallback code is very limited (but great :+1: ). have you tested with let's say, files like MP4, MKV or any generic mime type that some users may use?
just tested on linux with local files like (mkv, mp4, mp3, avi), it worked. also tested with web content, it worked too! I've used the version from @ikkez
@ikkez, the test over FileInfo
is useless if your using mime_content_type
no?
line: 4
if (extension_loaded('fileinfo'))
$mime=mime_content_type($file);
...
what do you think about?
if (is_file($file) && is_readable($file)) {
// physical files
$mime=mime_content_type($file);
if(empty($mime) && !preg_match('/^win/i',PHP_OS))
$mime=trim(exec('file -bi '.escapeshellarg($file)));
}
One thing though: the new way requires to have a file physically available somewhere,.. before we just used the file suffix from its name or path, that should be kept in mind.
What about keeping both methods: the old and the new one?
@ikkez, the test over
FileInfo
is useless if your usingmime_content_type
no?
@Jiab77 : that function is part of the FileInfo extension, so I think the test is reasonable... at least mime_content_type doesnt work when FileInfo is not enabled... I've tested it.
the fallback code is very limited
@Jiab77 : yes perhaps, but good enough for "common" files.. also we can extend the map of course.. though the magic bytes trick only works for binary files.. as ASCII files are basically just plain text, you cannot really determine the mime type without having a look at the file extension again (i.e. css, xml, js files, etc.) ..
..so especially for building own files and sending those to the client, I can imagine that it would be useful to keep a simple file-extension <> mime-type map for internal usage as @xfra35 suggested.
Maybe an additional attribute, like inspect
, guess
or trusted
that than switches between old method and new one?!
NP: and mime_content_type
is not deprecated.. scroll a bit down in your SO answer, and you see some more answers about that.. also it's not marked as deprecated in the official php docs.
Can a solution be to keep the old method and make an additional new method that use the old method as a first check before it continue and do the true mime detection? That way files with wrong extension would be "stopped" before it's loaded even if you use the true mime detection method.
i've already noted my 50 cents in https://github.com/bcosca/fatfree/pull/1139, so no need to repeat.
I'm against inspecting files by reading them. If really needed, this should be offered as separate function, but there are already several mime-libraries out there that do this for you, so no need to reinvent the wheel.
@bjugan the proposal does not make any sense. A mixed function is always flawed: either you inspect the file to read the true mime or you do it by extension. This is all about performance ↔ secure detection.
@KOTRET Sorry, I agree it doesn't make sense. I was hung up thinking of this as a "verify mime" method and not a "get mime" method.
Anyway, I think it would be nice to have both methods as proposed by xfra35. Then I can combine them in my own "verify file type" method where I first use the old method to filter out files with wrong extension before I continue and load it to do the true mime detection if it pass the extension check (or is this bad as well?)
... where I first use the old method to filter out files with wrong extension before I continue and load it to do the true mime detection if it pass the extension check
that would mean that you "allow" all to the framework known mimes at first and then check the content. Probably this is not what you want, as you would like to define your own list of allowed extensions. So i cannot imagine any combination of both methods that makes sense.
So who's in favor to update the actual mime
method? IMO I prefer "get mime type" than "verify mime type" to refer to what said @bjugan and so I'm against the actual method which just check the mime type based on the file extension. The version provided by @ikkez cover the need to avoid the complete file download to get the mime type and this is, I think, the best option.
Maybe the best would also to create two separate functions as suggested by @xfra35 in order to avoid breaking existing applications that rely on the behavior of the actual method. Even if I'm very sure that the new method would not impact that much.
To @ikkez :
@Jiab77 : that function is part of the FileInfo extension, so I think the test is reasonable... at least mime_content_type doesnt work when FileInfo is not enabled... I've tested it.
so why it worked on my limited system (I mean I'm actually mostly using PHP embedded server and CLI, gd, json..) I'll check if I have the FileInfo
extension enabled.
@Jiab77 : yes perhaps, but good enough for "common" files.. also we can extend the map of course.. though the magic bytes trick only works for binary files.. as ASCII files are basically just plain text, you cannot really determine the mime type without having a look at the file extension again (i.e. css, xml, js files, etc.) .. ..so especially for building own files and sending those to the client, I can imagine that it would be useful to keep a simple file-extension <> mime-type map for internal usage as @xfra35 suggested. Maybe an additional attribute, like
inspect
,guess
ortrusted
that than switches between old method and new one?!
Yeah I agree if you consider that the static mapping would just be used as fallback and can be updated from upper release or pull requests if needed.
NP: and
mime_content_type
is not deprecated.. scroll a bit down in your SO answer, and you see some more answers about that.. also it's not marked as deprecated in the official php docs.
Yep I've read the whole page, this is why I was okay with all your changes and suggestions.
For the record: My idea was not to replace the "get mime type" method with a verify method, but to keep the old method and add this new method as a seperate one so I myself could make something like this:
class MyWeb extends Web
...
function verify_mime(string $file, array $allowed_mimes): bool {
$key = array_search($this->mime($file), $allowed_mimes); // Extension check
if ($key !== false && $this->mime_secure($file) === $allowed_mimes[$key]) return true; // Mime check if file pass extension check
return false;
}
...
$web->verify_mime($file, array('image/jpg', 'image/png'));
But please forget about this idea! I'm a novice and supports whatever you guys end up to decide :-)
@bjugan I could not say you're a novice and myself an expert, certainly not but as a security researcher too and developer, I can't understand your point to use a weak method because it's easier to use than just adapt your code to the new method and so increase the security level you got with your own code.
If this is clearly explained in the changelog or the release note that the new version may contain a breaking change over the mime type detection, this should not be an issue if everybody read it before updating their project dependencies.
To finish, if I refer to your code posted in your message, you should consider to first validate file extensions you want to allow the use then validate the mime type, no need to have two methods inside your verify_mime()
function.
class MyWeb extends Web
...
function verify_mime(string $file, array $allowed_extensions, array $allowed_mimes): bool {
$file_info = pathinfo($file);
$file_ext = strtolower($file_info['extension']);
$valid_extension = array_search($file_ext, $allowed_extensions); // Extension check
// Mime check if file pass extension check
return ($valid_extension !== false && array_search($this->mime($file), $allowed_mimes);
}
...
$web->verify_mime($file, array('jpg', 'png', 'gif'), array('image/jpg', 'image/png', 'image/gif'));
Anyway, now there is only the project leaders that can decide what to do with my proposal or the one of @ikkez.
I hoped my previous comment was my last one, but, sorry to everyone, I have to comment once more.
I can't understand your point to use a weak method because it's easier to use than just adapt your code to the new method and so increase the security level you got with your own code.
The two reasons I started to think about this:
If this is clearly explained in the changelog or the release note that the new version may contain a breaking change over the mime type detection, this should not be an issue if everybody read it before updating their project dependencies.
Agree. And this is not a breaking change for my project.
Anyway, now there is only the project leaders that can decide what to do with my proposal or the one of @ikkez.
Yes 😄
@bjugan As of reading the documentation of https://fatfreeframework.com/3.6/web#mime, your idea is the one that match the best with what is explained so I suggest to use the @ikkez version as mime_secure
method if you want to keep the actual behavior or you would need to update the documentation if you take the choice to replace the actual code used by the mime
method.
Best of both worlds implemented.. no breaking changes, default behaviour is kept.. and with an additional parameter you can now inspect the file for a more valid mime type if you want so.
@ikkez I know you've closed this issue and got nothing against that choice but I was just wondering if the static types map could be improved by adding / creating new methods to add mode signature when needed without having to fork or modify the locally. Maybe by moving the static map to the F3 memory space so that it could be modified with $f3->set()
method or any one that could just update the content.
Please, let me know what you think about this idea.
adding new mimes should be persistent and not on every request
@KOTRET I never meant that the static mimes mapping should be inconsistent nor defined on every requests... I was just saying that instead of placing the static magic bytes mapping into the method, just move it into the main hive like this $f3->get('MAGIC_BYTES')
so that they can be specified from the config file or by $f3->set('MAGIC_BYTES')
when the app is started for example.
actually that is like adding them on every request @Jiab77 ...so no. If you have valuable additions to the map, please send a PR. TY
@ikkez, Ok I see, sorry for the misunderstanding. Your framework performances are really nice, I have no will to ruin them. If I find another way to do it I'll send a PR as recommended. Thanks for your reply.
Issue opened over fatfree project: https://github.com/bcosca/fatfree/issues/1138