Closed sbulen closed 7 years ago
I've been doing some reading on this, & I think that checkImageContents may need a new set of edits. In my case, the preg_matches are returning false positives, and there are probably better ways to do this today.
A helpful read - especially the response that says all uploaded images are evil: http://stackoverflow.com/questions/6484307/how-to-check-if-an-uploaded-file-is-an-image-without-mime-type
My first thought is to change our minimal validation to use getImageSize(), and the extensive validation to use imageCreateFromString().
This one may be important, as it involves security and it appears to be impacting 2.0.
I'd appreciate feedback before diving in too deep here.
I've tended to use the following, if its any use at all. mime_content_type is just to catch when getimagesize parses an file as a image when it shouldn't do so. You could ensure that the first imageType matches the second from getimagesize if you wanted to be really cautious.
function check_image($filename) {
$imageType = mime_content_type($filename);
if(!in_array($imageType, array('image/gif', 'image/jpeg', 'image/tiff', 'image/bmp', 'image/png'))) {
return false;
}
$imageData = getimagesize($filename);
if(!is_array($imageData)) {
return false;
}
if( $imageData[0] == 0 ) {
return false;
}
if( $imageData[1] == 0 ) {
return false;
}
$imageType = $imageData[2];
if(!in_array($imageType , array(IMAGETYPE_GIF , IMAGETYPE_JPEG ,IMAGETYPE_PNG , IMAGETYPE_BMP))) {
return false;
}
return true;
}
Thanks, that's very helpful. And consistent with the other things I've read.
There is another current thread out there in 2.0 support with the same issue: http://www.simplemachines.org/community/index.php?topic=544243.0
More reading on sanitizing images: http://stackoverflow.com/questions/3644138/secure-user-image-upload-capabilities-in-php
Everything I've read suggests that the best solution is to make a copy of the image, in a way that will not copy all the tags, metadata, etc.
@sbulen Can you attach a image which is failing to this issue?
I'll have a look and see if I can work out where its falling over.
Will do. I'm out of town at the moment & don't have access to those pics, so this will take a day or two.
FYI, there are images as well as analysis from the developer nend at the 2.0 thread referenced above: http://www.simplemachines.org/community/index.php?topic=544243.msg3915301#msg3915301
Multiple parts of the preg_match are failing. I really suspect that we need to think more along the lines of the getImageSize technique described above, and possibly re-encoding where further security is needed.
More helpful info here on intrusion detection here: https://www.trustwave.com/Resources/SpiderLabs-Blog/Hiding-Webshell-Backdoor-Code-in-Image-Files/
I've been reading thru the routines in subs-graphics, and they appear to be doing EVERYTHING I've read as best practice already. Re-encoding when desired (an option under Admin | Attachment Settings), and checking for bad strings. It already checks getImageSize. It already renames the uploaded files. (I've always been impressed how security conscious SMF is & this just reaffirms that...)
So... I'm now wondering if it is simply checking a bit too much in checkImageContents. For example, where I've read it should check for certain binary values, they normally say only to check the header in the first 100 bytes, where checkImageContents checks the entire file. And it is common to have a valid URL in your EXIF info, so the http string check might be a bit much - especially for those who have not requested extensive checks.
I'm now thinking there are only minor tweaks needed to checkImageContents, along the lines of:
I'm going to read a bit more before proposing changes.
Input welcome.
Some research... Each file that fails seems to fail on a different edit, even on the 'normal' edits. I can copy excerpts of the files that fail below.
The 'normal' term is: ~(iframe|(?<!cellTextIs)html|eval|body|script\W|[CF]WS[\x01-\x0C])~i
Nothing fails for: iframe, eval, script\W
(?<!cellTextIs)html: s$«5c#ªÌV%¾—¸–ƒb5ÒRE¬f¿bÖ´ ®ý`‹éÓ·üË]r˜áqy›q/¢ÖN¡wI´ž±Z‘Û?>ªÚ¬§N›DoÀájEÇLú-¬)ºyê#†’x[r·ô3sF†³%¡”;ïý³öqx\ŒæÓL—1ÄO@abÏÖõÑeôðY3KÌ,kfMµm}æZœâ]V%©‰Âaê$ú]È OļìhTmL;]V¥C¯g>ôÇÓWÝSoÒL˜†Ž*…MyõÑ{[Iÿ êM]ß󪩂u ãÿ À#sês£ñiSìjÇžçöU¦2Ò@ègÔå.bÈâ³Z~2¯–eÐP>÷ù—‡ºõð‰–ÞË“¬GWp¬J±L,ŠY" @°‚,Vz^Všx+ èˆq
body: HGÞA(1W¨£a‹\:£Ë'’ƒ¾¤A±ýÔ{x!DFáÕ=’¨©o“ÙD½ç•HûNÈç×;¨EÈ:#{û bOdyƒ²[ßÔ#Ì=“FÇšÞ¡Øz$dö@“9 £bÙÙÎÉ—ŽÈÞ;+¡ÿÓù×–“›Ñ]T•.ZtU°#`Víî&©ÙÒYjݽµ4mHgp™ŒvÛKodЧÊöH°uZ*ÒÚSB,#Êj¾»#jhÚ(vG”ûrŠ²šyB¸KÉ´õFÔÒíŸÉƒV¨¤Ñ¶o%§¢<–ôFÛL7ºhÛ/<†{'·©Mß«´ éDzӶÐ[X)¡›õfž‰~¬ÎËVÔmMdýY½RýU½–ͨÛÙ4méÙ/ÔÛÙlÛhÛI£lª7²©·²ÜÑJhÛêmì‘Ñ7²ÞZ•&†Ô›Ù¨·²ß¶ÐZšÿ Ô€èÔZº;p‘bqз²¨it¶ ±7õ¤ttöå¨9¨³ý—P5-QÊÿ g„¿ÙôºÛi0-AÈÿ g‚—û=uöR6Ò9ðô¿Ùë³±"ÀƒþÏGû=v¶¤iäðâþ z%þÏ=kfSòÂyC )³Êîyv‚Äò8¨;„~ WpÆ/„y}SÈáúƒ—wb[òxpÿ P(ýA˹å„l äðáþ¢RýEË»±XO'‡õ¿RweßòÀAŒ<?ú“‚GFîËÐyMè%½“Èóߪ;²_ª8/Cä·²F›£ÏþªîÊ'Jå輆”y
[CF]WS[\x01-\x0C] (the binary character is an ACK, displayed here as an ):
¸ T–v* ÁC¿Ü¡¢ÓIÛnQQvmEßr‡·Ø)V”@B–þ—ö¶šµ’^à7Jvy
La/4¦ §TSlÀÚÍÏàáèß.:“O‰84X]È †Ø#-I1**Fws**AMS
Àû€ç¥À|”7›à»ÀCØ¢ªiD&ìP5`)>@n0Q.öǺ InPÒì é¾ )I4M¤ªx i¼ ÿ ˆ¨'Ü
@Illori - Here is one of the complete files I couldn't post on SM.org due to size. This sample file has the Fws{ACK}
I can provide more sample shots if needed. I'm starting to think the preg_match doesn't work well anymore.
(Please forgive the soft focus - handheld shot from the bleachers + the lighting was poor...)
@tinoest - image copied above.
One minor issue with the 'extended' search string: |\\<\\?|\\<%| is equivalent to |\\<|
The reasoning: The ? means zero or one. At the end of this expression, it basically means you don't have to have the last \ to get a hit. In other words, you're really just looking for \<. Further, since \< is a match, then \<% is redundant. This is all easily confirmed in any regex tester, as well as with real data.
\<, by the way, gets MANY hits in almost all of my .jpgs. It's just too short a string.
We're losing a statistical battle with the contents of these .jpgs...
Thanks for the image I'm yet to get a chance to check it.
If you double escape the ? Does it solve the issue. I think \<\ \ ? Would mean it looks for a <?
Ignore the space between the \ \ markup is removing a \
Actually the current regex only finds a <? Or <% when I try it
Sorry the markup deleted all of the double \s for me. I tried to clean it up a bit & failed. What I tried to say is:
|\\<\\?|\\<%| is equivalent to |\\<| and will return hits on all instances of \<
I do not know which strings they are trying to get a hit on there. I believe it is a form of inline code.
My current thinking is that we should not even do the preg_matches if the image is getting re-encoded. I believe the re-encoding is the proper safety measure.
We're losing a statistical battle. In my mind, the image encoding is like the thousand monkeys typing away & accidentally writing Shakespeare's works. Only they just have to get a hit on something like 'body' or 'html'... In a big enough .jpg, odds are pretty good that happens.
This is also an issue with 2.0, and is causing frustration for folks with photography forums.
When the code to originally do this was added in a security release for 1.1 and 2.0, an idea was pitched that we could re-encode all images uploaded, but it was thrown out since PHP didn't have handling for all the different image types we could encounter and feasibly support with the PHP versions we had to maintain and memory requirements. Since we have PHP 5.3 now as the min, we could take a look at it again and see what we could do. Ideally I think we need to cover jpg, gif, and png. Others would be a plus, especially if we could support gifv formats.
The false positives have been known since then and improved in a later security release, but none the less we know we will still hit false positives. Re-encoding seemed like the only way out of this to clear out any bad stuff and possibly strip metadata that could contain bad stuff.
Reading the notes from the frustrated photographers on the support site, they do not want re-encoding, and they do not want these random rejections either.
Unfortunately, they can't have it both ways... To NOT re-encode, and to NOT do the string searches leaves the forums wide open to image-based security attacks.
In my mind, after reading up on this the last two weeks, I believe the solution is to only do the preg_matches if the images are not being re-encoded. You have to pick some degree of protection. I don't think the preg_matches are necessary when the image is being re-encoded.
@tinoest - you are correct, the regex does in fact properly catch <? and <%.
@sbulen - The image you posted, which contains alot of meta data, would fail on many of those checks in SMF. I think that you're right in that the checks are outdated and need to be reviewed.
I would think that moving the attachments to a different directory outside of the document root is something that should be done anyway, and then have a image proxy which can only read the data and never execute it. So even if it is comprimised it would never get run.
Obviously checks should be run where applicable, maybe enabling different levels of security would be the way forward, so that the forum owner can choose how secure they want to make their site.
These preg_matches are failing in the image encoding itself, not in the exit data. I haven't seen a failure on the exit data yet. (Because I'm not trying to stuff code in there...)
I picture the image encoding, which is basically producing random bytes of data, as the proverbial thousand monkeys at typewriters. Sooner or later, they will hit words.
A four letter word will come up pretty frequently.
I have about 1 in 5-10 of my images fail these security checks. Large, 2mb+ images are losing a statistical battle.
The problem is that this renders SMF extremely problematic for those trying to share photos. Almost useless. Hence the frustration on the support site.
Other smf forks share this code and have the same issues. I have posted this issue elsewhere and triggered discussion and planned changes. I'll monitor that for ideas.
(As a noob here, I must say that I feel a little like a child of divorced parents...)
@sbulen it is like that quite a bit on here sometimes.
I personally think a separate directory and a proxy is the best way, however most people probably don't have that level of control on their hosting.
So second option is re-encode the image.
Third is the preg_match checks.
Maybe the code gets changed to offer the admin the choice of those three options?
Savvy users can already place the attachment directory wherever they want, right?
I'm having an issue with the preg_matches for the reason supplied above.
I get that, they should be a last resort option imo as they are outdated now. But not my decision
I don't understand why this problem existing, why is php code execute in picture?
another point from database side is very very very bad idea to use the picture function from board like smf, because they safe the data as binary in the database, what is very very bad behavior in any form.
Albert,
Historically, this was a form of hacking a site.
The way I think of it, is that all traffic in and out of your website is basically serial in nature. Uploading a photo is basically uploading a huge string. If you put some in-line php or javascript in that string, it will in fact execute. (In a way, it is similar to sql injection...)
A malicious user can do a lot of damage that way.
This also means that historically, allowing the upload of images to any website was a MAJOR vulnerability.
So this code exists. But it prevents photographers from uploading a lot of their photos.
Most of these edits should not exist anymore.
The difficult question is how safe do we need to make it, and which vulnerabilities are in fact current and valid today.
Shawn
And why is the "string" is execute? And why is a good idea to safe the photos in smf?
I'm still figuring this out myself...
My understanding is that the code gets executed a few ways. First, inline when images are displayed. Second, once planted, hackers can invoke it indirectly. (Follow the links above...)
Lots of folks want to use SMF to share photos.
I learned thru another forum that most of these attack vectors are obsolete. Admins have to go out of their way to enable them. So the question is: what stays?
All attachments are stored in the files system, not the db. The smf attachments table just keeps track of their location & basic attributes
Here's another explanation: https://de.slideshare.net/mobile/saumilshah/hacking-with-pictures-hacklu-2014
So... After lots of research, I recommend:
I think these changes should be reflected on the 2.0 site as well. This should make some of our photography oriented users a bit more happy.
I'll test out some changes over the next day or two & submit a PR.
This, I think, will be the short-to-midterm solution. Longer term, I will just keep doing research. If I run into a better library or technique, I'll share.
More info on parallel thread here: https://github.com/elkarte/Elkarte/issues/2874
In 2.1, I have received invalid security warnings uploading .jpgs. Someone also reported this on the support site for 2.0 the other day: http://www.simplemachines.org/community/index.php?topic=552034.0
I traced my 2.1 issue to the checkImageContents call in subs-graphics. It's finding a hit on the preg_match, even the lighter one without extensive security checks selected. The 2.0 issue reported that option was de-selected as well.
My jpgs were taken on a nikon, and edited in the latest version of lightroom.