Spikestuff opened this issue 3 months ago
This suggestion runs into the same problems that were raised in #1711.
How does this work if the url returns different results to different requests?
It doesn't. Only using a checksum or switching to uploaded images can defend against that kind of attack.
"it goes into the same kind of bucket as javascript side checks, it's honor system really, but it will filter out well meaning mistakes and lazy abusers, which is going to be most things" --adelikat on Discord
So all we can do is check the URL and verify that it ends with an acceptable extension.
So, some problems with actually trying to enforce an extension (examples based on real user avatar paths):

- foo.com/id
- foo.com/id?format=png
- foo.com/id?format=png&size=125x125
- foo.com/path.png?size=125x125

These are all legitimate avatars that meet the site requirements. I could try to look for the extension somewhere in the path, even without the dot, but the more I look for, the less we are actually enforcing. I'm not sure there is much value in this.
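To illustrate the point, here's a hypothetical naive suffix check (the `ALLOWED` tuple is an assumed whitelist, not anything from the codebase) run against the real avatar URLs quoted above. Every one of them fails, even though all four are legitimate images:

```python
# Hypothetical naive extension check; demonstrates why suffix enforcement
# rejects legitimate avatar URLs that serve images without a path extension.
ALLOWED = (".png", ".jpg", ".jpeg", ".gif")

def has_image_extension(url: str) -> bool:
    return url.lower().endswith(ALLOWED)

for url in ("foo.com/id",
            "foo.com/id?format=png",
            "foo.com/id?format=png&size=125x125",
            "foo.com/path.png?size=125x125"):
    print(url, has_image_extension(url))  # all four print False
```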
When the requirements are only to detect a select few filetypes and extract the image dimensions, most of the problems with server-side processing go away, since it's just magic bytes plus parsing a couple of integers: foolproof.

- **PNG** begins with `[0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52]`, followed by the width (u32 BE: `[0, 0, 0, <125]`) and height (ditto).
- **JPEG** begins with `[0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46, 0x49, 0x46, 0x00]`. The subformats look super confusing, but TIL it uses a really clever system somewhat like escape characters, and you need only scan through for one of a few start-of-frame markers such as `[not 0xFF, 0xFF, 0xC0, _, _, _]`, which will be followed by the height (u16 BE: `[0, <125]`) and width (ditto).
- **GIF** begins with `[0x47, 0x49, 0x46, 0x38, 0x37 or 0x39, 0x61]`, followed by the width (u16 LE: `[<125, 0]`) and height (ditto).
- Other formats would need something like `magick convert`, so I'm not sure what to do about those.
- According to https://datatracker.ietf.org/doc/html/draft-zern-webp, **WebP** always begins with ASCII "RIFF", followed by the filesize minus 8, followed by "WEBP".
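The byte-level checks above can be sketched roughly as follows. This is a Python illustration of the idea, not the project's actual implementation; enforcing the 125x125 limit is left to the caller, and only the three formats described above are handled:

```python
import struct

# Magic bytes of the PNG signature plus the start of the IHDR chunk.
PNG_MAGIC = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
                   0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52])

def sniff_dimensions(data: bytes):
    """Return (format, width, height), or None if the bytes aren't PNG/GIF/JPEG."""
    if data.startswith(PNG_MAGIC):
        # IHDR: width then height, each a big-endian u32, right after the magic.
        w, h = struct.unpack_from(">II", data, len(PNG_MAGIC))
        return ("png", w, h)
    if data[:6] in (b"GIF87a", b"GIF89a"):
        # Logical screen descriptor: width then height, each a little-endian u16.
        w, h = struct.unpack_from("<HH", data, 6)
        return ("gif", w, h)
    if data.startswith(b"\xff\xd8\xff"):
        # Walk the marker segments until a start-of-frame (SOF0..SOF3) appears;
        # its payload is length(2), precision(1), height(2), width(2).
        i = 2
        while i + 9 <= len(data):
            if data[i] != 0xFF:
                return None  # malformed stream
            marker = data[i + 1]
            if 0xC0 <= marker <= 0xC3:
                h, w = struct.unpack_from(">HH", data, i + 5)
                return ("jpeg", w, h)
            (seg_len,) = struct.unpack_from(">H", data, i + 2)
            i += 2 + seg_len
        return None
    return None
```

A caller would then just verify `w <= 125 and h <= 125` on the returned tuple and reject everything else.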
Am I being overly paranoid in not wanting to do this server-side? I would have to make a call out to an unknown site, pull the image, and process it, only to throw it away. I feel like this opens up a potential attack vector.
No, it's perfectly understandable that you wouldn't want ImageMagick, written in a non-memory-safe language, ingesting untrusted data on the same machine as the site. I'm saying there's another way if you're willing to put in some effort. I would assume the BCL has some way to safely make a GET request and read only the first N bytes?
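The idea of reading only the leading bytes can be sketched as follows. The thread is about the .NET BCL, but the principle is the same in any HTTP client that streams the response body; this Python version uses the standard library and is only an illustration:

```python
# Read at most n bytes of a response body. The body is streamed from the
# socket, so bytes beyond n are never pulled into memory by this call.
from urllib.request import urlopen

def fetch_header_bytes(url: str, n: int = 64, timeout: float = 5.0) -> bytes:
    with urlopen(url, timeout=timeout) as resp:
        return resp.read(n)
```

Those few dozen bytes are enough for every magic-number and dimension check discussed above, which bounds how much untrusted data the parser ever touches.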
Don't assume, please look it up. I don't think there is a way without making and completing the request in memory, and even then, isn't it the same vulnerability, just with fewer bytes?
From the profile settings page it says:
So maybe we should peek at the URL, restrict it, and return an error when we get a website URL or a non-usable link? (And another for the anti-abuse side, but the only way you would know about this is by using a mobile device.)
We also need a way to deal with a fake-format link that would bypass the logic we have. The server must confirm it's actually an image, and not a dead link or something like `example.com/aurl?q=foo#bypass.png`.
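The fragment trick is worth spelling out: a naive suffix check on the full URL string passes, yet everything after `#` never leaves the client, so the server is handed a path with no image extension at all. A quick sketch with Python's standard URL parser:

```python
# Everything after "#" is a client-side fragment and is never sent in the
# HTTP request, so ".png" in the fragment proves nothing about the resource.
from urllib.parse import urlparse

url = "https://example.com/aurl?q=foo#bypass.png"
parts = urlparse(url)
print(parts.path)            # /aurl
print(parts.fragment)        # bypass.png
print(url.endswith(".png"))  # True, yet the server never sees ".png"
```

This is another argument for verifying the fetched bytes themselves rather than any property of the URL string.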