5324 introduced stricter image uri and mime detection. While this makes sense for e.g. file urls where the extensions are likely correct, I don't think we should check extensions for http urls, as they are not reliable at all.

Some imaginary cases I can think of where the new checks would break:

Query params, e.g. to change the image size
- e.g. http://example.com/myimage.png?width=200
- also common, presigned s3 urls: https://my-bucket.s3-provider.com/img.webp?X-Amz-Algorithm=AWS4-HMAC-SHA256[...]
Hash fragments, often used for cache busting
- e.g. http://example.com/myimage.png#1234
Urls that look like they have an an extension but it's not an extension at all
- e.g. http://example.com/api/user_images?username=lucas.merlin (.merlin would be detected as an extension with our logic)
Urls that have a wrong extension
- e.g. http://example.com/render_html_to_image/my_document.html (which would return my_document rendered as an image)

Some of these cases might be bad api design. If you have full control over the backend that might not be a problem but if you e.g. build a chat app where people can send an image url and the app tries to load the image preview from the external url, you would want to support as many urls as possible.

We could try stripping the query params and fragments when checking for the extensions, but I don't think the extra complexity is worth it, considering that this is optimizing the failure case.

The mime check is probably more reliable, but might not always be true. When uploading an image to s3 and you forget to set the mime type, it will have application/octet-stream. I believe, when showing such an image in a <img> tag, the browser will still show it successfully (but I'm not 100% sure). Since we already have the image downloaded when checking the mime type, we might as well just get the type via image::guess_format.

Also, we aren't even using the mime type when loading the image, instead relying on the image crate to guess the format, meaning currently image has to guess the type twice (once when we call 'image::guess_type' and once more when we actually decode the image).

I think we should at least make this behavior configurable, and turn it off by default.

To Reproduce Steps to reproduce the behavior:

Run cargo run -p egui_demo_app --features image_viewer from the egui repo on the current master
Open the image viewer app
Try opening e.g. https://images.pexels.com/photos/45201/kitty-cat-kitten-pet-45201.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2 (copied from here: https://www.pexels.com/photo/white-and-grey-kitten-on-brown-and-black-leopard-print-textile-45201/)
Notice how it won't load

Expected behavior Images should load, no matter the url

Additional context relevant pr: #5324

xangelix commented 2 weeks ago

Thank you for well documenting examples that break after the last PR. I should have had stronger wording and concerns in my PR about the stricter checks, and I agree that this scenario is common enough that something should be done to loosen the bounds further.

However, I'm not sure this leaves anything except removing errors on unknown extensions and unknown MIMEs entirely and relying solely on magic bytes with image::guess_format.

I think this is okay, since as you mentioned, this is already being done after loading some bytes as it is now. But of course now we always would have to load bytes before rejecting an image. Thoughts on scrapping MIME/URIs detection entirely for the image crate's image_loader @emilk ?

We could also expand the trait to formally check for magic bytes with all loaders to allow the same weaker support checking and scrap MIME/URI on those too.

lucasmerlin commented 2 weeks ago

But of course now we always would have to load bytes before rejecting an image.

In my opinion that's fine trade-off, if it means loading images from an url is more likely to succeed.

We could also expand the trait to formally check for magic bytes with all loaders

Since the image loaders call Context::load_bytes themselves, I'm not sure if we could add this as part of the trait, at least not with the current design. But I think it'd be great if all image loaders supported magic bytes for format detection. Maybe we could instead update the bytes loader to return a stream? Then we could pass the first couple bytes to image::guess_format and cancel the download if all image loaders detect a format that they don't support. Unfortunately guess_format documentation doesn't state how many bytes are needed are needed to be able to detect all formats, that would be interesting to know.

xangelix commented 2 weeks ago

The magic bytes that image uses are here: https://github.com/image-rs/image/blob/317bc1692c134d750360ca9c8e108a08ff534d93/src/image_reader/free_functions.rs#L127

xangelix commented 2 weeks ago

I was thinking along the lines of adding fn supported_head(bytes: &[u8]) -> bool to the trait, where this would just reference image::guess_format(&bytes).is_err() in the case of image_loader and could be implemented manually for other types.

emilk / egui

Some image urls don't load after #5324 #5341

5324 introduced stricter image uri and mime detection. While this makes sense for e.g. file urls where the extensions are likely correct, I don't think we should check extensions for http urls, as they are not reliable at all.