image/jpeg: add options to partially decode or tolerantly decode invalid images?

pranavraja commented 9 years ago

go version devel +ce43e1f Mon Apr 13 23:27:35 2015 +0000 linux/amd64

Attempted to use jpeg.Decode on the below image: https://streamcoimg-a.akamaihd.net/000/340/810/9ae536dd97d2d92fc17a6590509a51c0.jpg

Expected the image to decode successfully, as it displays in a browser.

Actual result: invalid JPEG format: short Huffman data

nigeltao commented 9 years ago

It seems to decode fine for me, on tip. I know your Go version is only a few days old, but can you "git sync" and re-try?

pranavraja commented 9 years ago

Strange, I updated to go version devel +e5b7674 Wed Apr 15 02:28:53 2015 +0000 linux/amd64, and am still getting the same error. Here's my test program:

package main

import (
        "image/jpeg"
        "net/http"
        "fmt"
)

func main() {
        res, err := http.Get("https://streamcoimg-a.akamaihd.net/000/340/810/9ae536dd97d2d92fc17a6590509a51c0.jpg")
        if err != nil {
                panic(err)
        }
        defer res.Body.Close()
        img, err := jpeg.Decode(res.Body)
        if err != nil {
                panic(err)
        }
        fmt.Println(img.Bounds())
}

And here's the output:

panic: invalid JPEG format: short Huffman data

goroutine 1 [running]:
main.main()
        /usr/share/fix-images/check.go:17 +0x182

minux commented 9 years ago

Your program works for me with tip. Are you sure you're using Go tip?

Add a fmt.Println(runtime.Version()) to your program and make sure you're not accidentally using an older version.

pranavraja commented 9 years ago

Added fmt.Println(runtime.Version())

devel +e5b7674 Wed Apr 15 02:28:53 2015 +0000
panic: invalid JPEG format: short Huffman data

goroutine 1 [running]:
main.main()
        /usr/share/fix-images/check.go:19 +0x28a

Anyway, as long as this is fixed on tip I'm happy to close this.

tenorok commented 8 years ago

Hello! I'm using Go 1.6.2. With @pranavraja's image everything is fine, but with the image http://bubble.ru/system/magazines/mg_n11_01_original.jpg I get the same error. Could you please explain what the problem is?

nigeltao commented 8 years ago

I'm not sure what the problem is, but it's not a regression: I see the same error on the stable release (Go 1.6). We're in code freeze for the upcoming 1.7 release; I'll take a look at it once the tree opens again for 1.8.

tenorok commented 8 years ago

I've uploaded the problem image to GitHub for safekeeping, because the host may remove it. https://cloud.githubusercontent.com/assets/1322855/16350692/f67e1266-3a68-11e6-96a1-205b396b1ace.jpg

cctse commented 8 years ago

I ran into the same problem. When I use Python to download a JPEG file, it raises "OSError: image file is truncated (53 bytes not processed)". The binary data can still be saved to a file with ImageFile.LOAD_TRUNCATED_IMAGES=True, but the truncated pixels are set to black. With Go, all pixels are processed and nothing is truncated, but decoding raises "invalid JPEG format: short Huffman data". The JPEG displays in full in Safari but is truncated in Chrome. Maybe Go needs to truncate the JPEG as Python does.

elektroid commented 7 years ago

I have the same issue with pictures from my phone, I attach one in case it helps https://cloud.githubusercontent.com/assets/6634115/19217480/2c442a36-8ddc-11e6-8392-4b45725b49ef.jpg

$ go version
go version go1.7.1 freebsd/amd64

mattn commented 7 years ago

package main

import (
    "fmt"
    "image/jpeg"
    "net/http"
)

func main() {
    urls := []string{
        "https://streamcoimg-a.akamaihd.net/000/340/810/9ae536dd97d2d92fc17a6590509a51c0.jpg",
        "https://cloud.githubusercontent.com/assets/6634115/19217480/2c442a36-8ddc-11e6-8392-4b45725b49ef.jpg",
    }
    for _, u := range urls {
        res, err := http.Get(u)
        if err != nil {
            panic(err)
        }
        defer res.Body.Close()
        img, err := jpeg.Decode(res.Body)
        if err != nil {
            panic(err)
        }
        fmt.Println(img.Bounds())
    }
}

The second image, the one @elektroid mentioned, fails for me.

(0,0)-(1920,1080)
panic: invalid JPEG format: short Huffman data

goroutine 1 [running]:
panic(0x5f5c00, 0xc0421a4060)
        c:/dev/go/src/runtime/panic.go:527 +0x1ae
main.main()
        c:/dev/go-sandbox/jpeg.go:22 +0x21d
exit status 2

first.

9ae536dd97d2d92fc17a6590509a51c0.jpg: JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=0], baseline, precision 8, 1920x1080, frames 3

second.

2c442a36-8ddc-11e6-8392-4b45725b49ef.jpg: JPEG image data, Exif standard: [TIFF image data, big-endian, direntries=9, datetime=2016:09:29 20:09:59, GPS-Data, model=Aquaris M5.5, resolutionunit=2, yresolution=155, xresolution=163], baseline, precision 8, 3120x4160, frames 3

elsonwu commented 7 years ago

The same issue +1, some special jpg will cause this problem:

invalid JPEG format: short Huffman data

nigeltao commented 7 years ago

@elektroid I'll try to find some time next week to look at it, but FWIW, I get e-mail for every comment on this issue, and somewhere along the mail pipeline, or in my browser's JPEG decoder, that attachment doesn't look like a valid JPEG. I've attached a screenshot from my mail, where I've added a pink ring to emphasize where it breaks down.

(screenshot attached: "invalid")

nigeltao commented 7 years ago

@elsonwu can you give more details than "some special jpg will cause this problem"? Can you attach an example? Do other programs (e.g. web browsers, Photoshop) handle those special JPEGs OK or do they also reject them?

bradfitz commented 7 years ago

@nigeltao, what's the status here?

elektroid commented 7 years ago

I switched to "gopkg.in/gographics/imagick.v1/imagick" hoping it would cope with my improper files but it fails to load them too.

nigeltao commented 7 years ago

Sorry, I didn't find the time to make a detailed investigation, and there have been no recent changes to Go's image/jpeg package, but it sounds like non-Go software is also reporting errors with some or all of these cases.

bradfitz commented 7 years ago

Yeah, but I can open it in Chrome. I thought we tried to match whatever browsers do.

special commented 7 years ago

All of the failing testcases here and others that I've found are truncated. They don't have complete SOS segments, don't contain an EOI, and raise warnings with other decoders. In @elsonwu's case, it's fixed by appending \x00\xff\xd9. The others are missing more data.
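The "missing EOI" case can be sketched as follows. This is a minimal illustration, not a general repair tool: `ensureEOI`, `decodes`, and `sampleJPEG` are hypothetical helpers written for this example, the input is a synthetic image encoded in memory, and appending the marker only helps when the terminator alone is missing. It cannot restore missing scan data.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// eoi is the JPEG End Of Image marker.
var eoi = []byte{0xff, 0xd9}

// ensureEOI returns data unchanged if it already ends with the EOI marker,
// and otherwise returns a copy with the marker appended. It is a hypothetical
// helper for this sketch, not part of image/jpeg.
func ensureEOI(data []byte) []byte {
	if bytes.HasSuffix(data, eoi) {
		return data
	}
	out := make([]byte, 0, len(data)+len(eoi))
	out = append(out, data...)
	return append(out, eoi...)
}

// decodes reports whether jpeg.Decode accepts data without an error.
func decodes(data []byte) bool {
	_, err := jpeg.Decode(bytes.NewReader(data))
	return err == nil
}

// sampleJPEG encodes a small synthetic image so the sketch needs no files.
func sampleJPEG() []byte {
	img := image.NewRGBA(image.Rect(0, 0, 32, 32))
	for y := 0; y < 32; y++ {
		for x := 0; x < 32; x++ {
			img.Set(x, y, color.RGBA{uint8(x * 8), uint8(y * 8), 128, 255})
		}
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	full := sampleJPEG()
	cut := full[:len(full)-len(eoi)] // drop only the terminator
	fmt.Println("decodes without terminator:", decodes(cut))
	fmt.Println("decodes with terminator restored:", decodes(ensureEOI(cut)))
}
```

Files that have lost actual scan data, like the other examples in this thread, would still fail after this step.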

Still, there is a bug in that there's no way to decode truncated images, which seem relatively common and are readable with most other decoders.

My first thought is to return a partially decoded image (if there is one) along with the error; see https://github.com/special/go/commit/c7a05f392ca604f6070009364827b04535293974. I don't mind adding docs/tests and submitting that if the approach is ok.

Otherwise, it seems slightly inappropriate to silence legitimate decoding errors, and there's no API for decoding options, so I'm not sure what else to do.

nigeltao commented 7 years ago

We could possibly change jpeg.Decode (and the other image codecs) to return (non-nil Image, non-nil error) with partial results if it encounters an error, although that's unusual in general for functions returning (T, err), and certainly not going to happen for Go 1.8.
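To make the (non-nil Image, non-nil error) convention concrete, here is a rough caller-side sketch. Everything in it is hypothetical: `decodeFunc`, `errTruncated`, and `fakeDecode` are stand-ins invented for illustration (the stand-in just copies raw gray bytes), and the current jpeg.Decode does not behave this way.

```go
package main

import (
	"errors"
	"fmt"
	"image"
)

// decodeFunc is a hypothetical decoder signature that, unlike the current
// jpeg.Decode, may return a non-nil image together with a non-nil error.
type decodeFunc func(data []byte) (image.Image, error)

// errTruncated is a hypothetical sentinel error for this sketch.
var errTruncated = errors.New("truncated image data")

// fakeDecode stands in for a tolerant decoder. It fills a 4x4 gray image
// from raw bytes and reports errTruncated if the input runs out early,
// while still returning whatever pixels it managed to fill.
func fakeDecode(data []byte) (image.Image, error) {
	if len(data) == 0 {
		return nil, errTruncated // nothing usable at all
	}
	img := image.NewGray(image.Rect(0, 0, 4, 4))
	if n := copy(img.Pix, data); n < len(img.Pix) {
		return img, errTruncated // partial result plus error
	}
	return img, nil
}

// describe shows how a caller would have to inspect both return values
// instead of discarding the image whenever err is non-nil.
func describe(dec decodeFunc, data []byte) string {
	img, err := dec(data)
	switch {
	case err == nil:
		return "complete"
	case img != nil:
		return "partial: " + err.Error()
	default:
		return "failed: " + err.Error()
	}
}

func main() {
	fmt.Println(describe(fakeDecode, make([]byte, 16)))
	fmt.Println(describe(fakeDecode, make([]byte, 10)))
	fmt.Println(describe(fakeDecode, nil))
}
```

The sketch also shows why the pattern is unusual: existing callers that check only `err != nil` would silently drop the partial image.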

As for matching whatever browsers do, what browsers do influences how far down the Postel's law slope we slip, but I'm wary of the slippery slope, and according to the JPEG spec, these are invalid images.

jlongman commented 7 years ago

Just did a quick look at this for my own reasons and summarized it like this:

sid and nancy : incorrectly terminated - no missing bytes
parking garage : truncated and incorrectly terminated - missing bytes
the guy : strange icc_profile and incorrectly terminated - no missing bytes
russian comic book : incorrectly terminated - no missing bytes

I'm saying the files are not truncated by examining the bottom right block and seeing pixels - as opposed to parking garage which is clearly truncated a couple of lines of blocks early. I didn't check metadata vs lines of pixels however, so whole lines of blocks could be missing.
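For the "metadata" side of that comparison, the dimensions declared in the header can be read without decoding the pixel data. The following is a small sketch using jpeg.DecodeConfig on a synthetic image whose tail has been cut off. `headerSize` and `sampleJPEG` are hypothetical helpers for this example, and it assumes the header itself is intact.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// headerSize returns the width and height declared in the JPEG header.
// jpeg.DecodeConfig does not decode the entire image, so it can succeed
// on a file whose scan data is cut short.
func headerSize(data []byte) (w, h int, err error) {
	cfg, err := jpeg.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return 0, 0, err
	}
	return cfg.Width, cfg.Height, nil
}

// sampleJPEG encodes a synthetic w x h image so the sketch needs no files.
func sampleJPEG(w, h int) []byte {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{uint8(x), uint8(y), uint8(x ^ y), 255})
		}
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	full := sampleJPEG(128, 96)
	cut := full[:len(full)-20] // simulate a file that lost its last 20 bytes

	w, h, err := headerSize(cut)
	fmt.Println("declared size:", w, h, "header error:", err)

	_, err = jpeg.Decode(bytes.NewReader(cut))
	fmt.Println("full decode error:", err)
}
```

Comparing the declared height against the rows that actually contain pixels in another decoder's output would show whether whole lines of blocks are missing.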

The ICC_PROFILE may be something that's accepted in JPEG formats, or the format descriptions I was looking at weren't showing the working standard (as opposed to the published standard). It clearly decodes in other decoders in any case.

Anyway, I don't have a solution to your quandary about returning an error and an image. I suppose you could put the decoded image in the error (ack), but my thought is that the issue might be better described as "image/jpeg: Unable to decode invalid/truncated JPEG image" instead of "image/jpeg: Unable to decode valid JPEG image". Cheers!

korya commented 7 years ago

I have another example of an invalid JPEG image. The problem with the file is that its second half is filled with garbage bytes:

$ xxd bad-image.jpg
00000000: ffd8 ffe0 0010 4a46 4946 0001 0100 0001  ......JFIF......
00000010: 0001 0000 ffdb 0043 0005 0304 0404 0305  .......C........
 ... valid JPEG contents ...
0001fff0: 7ca0 574a 75cf 835d 4b12 cffd 9a0e 8fa1  |.WJu..]K.......
00020000: 7e33 1885 9110 0a4f b753 fff7 9de2 be06  ~3.....O.S......
 ... same 16-byte pattern ...
0003f4e0: 7e33 1885 9110 0a4f b753 fff7 9de2 be06  ~3.....O.S......
0003f4f0: 7e33 1885 9110 0a4f b753 fff7 9de2 be06  ~3.....O.S......

image.Decode fails with "invalid JPEG format: missing 0xff00 sequence", but browsers display the image.

korya commented 7 years ago

In regards to supporting invalid JPEG images: I would vote for adding this support sooner by keeping the semantics of jpeg.Decode unmodified and adding a new function to decode potentially invalid JPEGs, e.g. jpeg.TryToDecode. Alternatively, if it is not desirable to add such a function to the jpeg package, a new experimental package could be added. The package would implement the new semantics of JPEG decoding.

This way people can start using it today in these rare cases when image.Decode fails but it is known that the byte stream is a JPEG file.

By the way, if such package already exists please let me know.

nigeltao commented 7 years ago

I'd rather have a different package instead of adding TryToDecode to the standard library for the rest of Go 1.x's lifetime. As a bonus, such a package wouldn't be bound by the standard library feature freeze that we're currently in.

I don't know if any such package already exists, and I won't have time to make one any time soon. Sorry.

bradfitz commented 7 years ago

Related: #20804 for an invalid GIF that browsers decode, but Go doesn't.

rasky commented 6 years ago

@bradfitz can we change the title of this issue? This is now about doing something for invalid images, not valid ones.

bradfitz commented 6 years ago

@rasky, done.

sheerun commented 5 years ago

Here's another example: __20

dvaldivia commented 5 years ago

still running into this issue in go version go1.11.4 darwin/amd64

zzwx commented 5 years ago

VanGoghStarryNight___ (VCG Wilson / Corbis via Getty Images)

Attempt to decode leads to: invalid JPEG format: missing SOI marker

While the image is easily shown in major browsers

go version
go version go1.12.1 windows/amd64

bradfitz commented 5 years ago

@sheerun's example looks like a progressive JPEG that's truncated. I only say that because it's so blurry and has that JPEG blocky look to it and I assume another pass would sharpen it. I didn't look at its structure.

sheerun commented 5 years ago

Yes, indeed. I guess it's out of scope of go-lang to nicely handle such case? I think the ideal API would return both the partially loaded image and a flag that tells the caller the image is corrupted.

nigeltao commented 5 years ago

I guess it's out of scope of go-lang to nicely handle such case?

Yeah, I'd still say what I said a couple of years ago: https://github.com/golang/go/issues/10447#issuecomment-310585843

yybmsrs commented 4 years ago

go version go1.13.4 windows/amd64 this problem still exists.

tlelson commented 3 years ago

Would it be reasonable to have a few error types in this package so that a user can selectively identify and deal with a specific error such as ErrHuffmanCode?
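Until something like that exists, one possible workaround is to match on the exported jpeg.FormatError type. This is a sketch, not a recommendation: `isShortHuffman` and `sampleErrors` are hypothetical helpers, and the message text "short Huffman data" is taken from the errors quoted in this thread. It is not a documented, stable API, so it could change between Go releases.

```go
package main

import (
	"errors"
	"fmt"
	"image/jpeg"
	"io"
)

// isShortHuffman reports whether err is, or wraps, a jpeg.FormatError whose
// message is "short Huffman data". Matching on the message is fragile: the
// text is an implementation detail rather than a stable contract.
func isShortHuffman(err error) bool {
	var fe jpeg.FormatError
	return errors.As(err, &fe) && fe == "short Huffman data"
}

// sampleErrors builds two errors for demonstration: a wrapped FormatError
// with the message seen in this thread, and an unrelated I/O error.
func sampleErrors() (shortHuffman, other error) {
	shortHuffman = fmt.Errorf("decode upload: %w", jpeg.FormatError("short Huffman data"))
	other = io.ErrUnexpectedEOF
	return shortHuffman, other
}

func main() {
	a, b := sampleErrors()
	fmt.Println(a, "->", isShortHuffman(a))
	fmt.Println(b, "->", isShortHuffman(b))
}
```

Dedicated exported error values, as proposed above, would let callers use errors.Is directly instead of comparing message strings.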

whorfin commented 3 years ago

I just thought I'd leave this here, in the hopes of being helpful. In reference to the navidrome use-case I linked above, I made a very simple change to src/image/jpeg/reader.go. It's clearly not suitable for a PR, given the larger conversation here about "the right thing to do".

In my case, this works perfectly on my images, returning visually the same image I'd see when loading the image (example linked above) in a web browser. [gm identify says it has a premature EOF; identify does not.]

The idea is dangerous and simple: don't bail on "short Huffman data", and consider a premature EOF to be the same as EOI, that is, escape the parsing loop and let the outer image return code sort it out. So rather than fully parsing and handling all premature errors specially, the idea would be (which I have clearly not fully done) to carefully and incrementally construct a valid image while decoding, always leaving the image structure in as valid a state as possible. This seems to be the spirit of what is there now. If we run out of image data, return what we have.

Here is a simple patch for the "short Huffman data" case, which I hope helps someone:

--- reader.go-orig  2021-05-17 16:59:44.565429376 +0000
+++ reader.go-new   2021-05-18 11:31:08.323759100 +0000
@@ -12,6 +12,7 @@
    "image/color"
    "image/internal/imageutil"
    "io"
+   "errors"
 )

 // TODO(nigeltao): fix up the doc comment style so that sentences start with
@@ -538,7 +539,12 @@
    for {
        err := d.readFull(d.tmp[:2])
        if err != nil {
-           return nil, err
+           // consider unexpected EOF to be same as EOI, let outer code
+           // find out if we have an image to return
+           if !errors.Is(err, io.ErrUnexpectedEOF) {
+               return nil, err
+           }
+           break
        }
        for d.tmp[0] != 0xff {
            // Strictly speaking, this is a format error. However, libjpeg is
@@ -648,7 +654,10 @@
            }
        }
        if err != nil {
-           return nil, err
+           // don't bail on short Huffman data
+           if !errors.Is(err, FormatError("short Huffman data")) {
+               return nil, err
+           }
        }
    }

metakeule commented 1 year ago

So to recap:

This issue is now 8 years old and has never been fixed.
It is a showstopper for any use of the image library on a server where random images are uploaded.
This library behaves worse than any reasonable image processing software out there.
The issue could have been fixed easily by returning a proper error message together with the image as processed up to the point where the unexpected error happened.
The maintainer was not sure what to do and avoided any decision for 7 years.
In that time, nobody from the Go developers stepped up to help or to fix the issue themselves.
After preventing any solution from happening for 7 years, the maintainer is now unassigned from the issue.
Now nobody is assigned to the issue and nobody has fixed it.

Great job!

metakeule commented 1 year ago

In case anybody cares, I have now forked the jpeg package and fixed some of the issues with the help of @whorfin's code.

https://gitlab.com/golang-utils/image2
