Open pranavraja opened 9 years ago
It seems to decode fine for me, on tip. I know your Go version is only a few days old, but can you "git sync" and re-try?
Strange, I updated to go version devel +e5b7674 Wed Apr 15 02:28:53 2015 +0000 linux/amd64, and am still getting the same error. Here's my test program:
package main

import (
	"image/jpeg"
	"net/http"
	"fmt"
)

func main() {
	res, err := http.Get("https://streamcoimg-a.akamaihd.net/000/340/810/9ae536dd97d2d92fc17a6590509a51c0.jpg")
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	img, err := jpeg.Decode(res.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(img.Bounds())
}
And here's the output:
panic: invalid JPEG format: short Huffman data
goroutine 1 [running]:
main.main()
/usr/share/fix-images/check.go:17 +0x182
Your program works for me with tip. Are you sure you're using Go tip?
Add a fmt.Println(runtime.Version()) to your program and make sure you're not accidentally using an older version.
Added fmt.Println(runtime.Version())
devel +e5b7674 Wed Apr 15 02:28:53 2015 +0000
panic: invalid JPEG format: short Huffman data
goroutine 1 [running]:
main.main()
/usr/share/fix-images/check.go:19 +0x28a
Anyway, as long as this is fixed on tip I'm happy to close this.
Hello! I'm using Go v1.6.2. @pranavraja's image decodes fine, but with the image http://bubble.ru/system/magazines/mg_n11_01_original.jpg I get the same error. Could you please explain what the problem is?
I'm not sure what the problem is, but it's not a regression: I see the same error on the stable release (Go 1.6). We're in code freeze for the upcoming 1.7 release; I'll take a look at it once the tree opens again for 1.8.
I've uploaded the problem image to GitHub for safekeeping, because the host may remove it. https://cloud.githubusercontent.com/assets/1322855/16350692/f67e1266-3a68-11e6-96a1-205b396b1ace.jpg
I met the same problem. When I use Python to download a JPEG file, it raises "OSError: image file is truncated (53 bytes not processed)". The binary data can be saved to a file with ImageFile.LOAD_TRUNCATED_IMAGES=True, but the truncated pixels are set to black. With Go, all pixels are processed and nothing is truncated, but decoding raises "invalid JPEG format: short Huffman data". The JPEG shows in full in Safari but truncated in Chrome. Maybe Go needs to truncate the JPEG as Python does.
I have the same issue with pictures from my phone. I've attached one in case it helps: https://cloud.githubusercontent.com/assets/6634115/19217480/2c442a36-8ddc-11e6-8392-4b45725b49ef.jpg
$ go version
go version go1.7.1 freebsd/amd64
package main

import (
	"fmt"
	"image/jpeg"
	"net/http"
)

func main() {
	urls := []string{
		"https://streamcoimg-a.akamaihd.net/000/340/810/9ae536dd97d2d92fc17a6590509a51c0.jpg",
		"https://cloud.githubusercontent.com/assets/6634115/19217480/2c442a36-8ddc-11e6-8392-4b45725b49ef.jpg",
	}
	for _, u := range urls {
		res, err := http.Get(u)
		if err != nil {
			panic(err)
		}
		defer res.Body.Close()
		img, err := jpeg.Decode(res.Body)
		if err != nil {
			panic(err)
		}
		fmt.Println(img.Bounds())
	}
}
The second image, the one @elektroid mentioned, fails for me.
(0,0)-(1920,1080)
panic: invalid JPEG format: short Huffman data
goroutine 1 [running]:
panic(0x5f5c00, 0xc0421a4060)
c:/dev/go/src/runtime/panic.go:527 +0x1ae
main.main()
c:/dev/go-sandbox/jpeg.go:22 +0x21d
exit status 2
first.
9ae536dd97d2d92fc17a6590509a51c0.jpg: JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=0], baseline, precision 8, 1920x1080, frames 3
second.
2c442a36-8ddc-11e6-8392-4b45725b49ef.jpg: JPEG image data, Exif standard: [TIFF image data, big-endian, direntries=9, datetime=2016:09:29 20:09:59, GPS-Data, model=Aquaris M5.5, resolutionunit=2, yresolution=155, xresolution=163], baseline, precision 8, 3120x4160, frames 3
Same issue here (+1). Some particular JPEGs cause this problem:
invalid JPEG format: short Huffman data
@elektroid I'll try to find some time next week to look at it, but FWIW, I get e-mail for every comment on this issue, and somewhere along the mail pipeline, or in my browser's JPEG decoder, that attachment doesn't look like a valid JPEG. I've attached a screenshot from my mail, where I've added a pink ring to emphasize where it breaks down.
@elsonwu can you give more details than "some special jpg will cause this problem"? Can you attach an example? Do other programs (e.g. web browsers, Photoshop) handle those special JPEGs OK or do they also reject them?
@nigeltao, what's the status here?
I switched to "gopkg.in/gographics/imagick.v1/imagick" hoping it would cope with my improper files but it fails to load them too.
Sorry, I didn't find the time to make a detailed investigation, and there have been no recent changes to Go's image/jpeg package, but it sounds like non-Go software is also reporting errors with some or all of these cases.
Yeah, but I can open it in Chrome. I thought we tried to match whatever browsers do.
All of the failing testcases here and others that I've found are truncated. They don't have complete SOS segments, don't contain an EOI, and raise warnings with other decoders. In @elsonwu's case, it's fixed by appending \x00\xff\xd9. The others are missing more data.
Still, there is a bug in that there's no way to decode truncated images, which seem relatively common and are readable with most other decoders.
My first thought is to return a partially decoded image (if there is one) along with the error; see https://github.com/special/go/commit/c7a05f392ca604f6070009364827b04535293974. I don't mind adding docs/tests and submitting that if the approach is ok.
Otherwise, it seems slightly inappropriate to silence legitimate decoding errors, and there's no API for decoding options, so I'm not sure what else to do.
We could possibly change jpeg.Decode (and the other image codecs) to return (non-nil Image, non-nil error) with partial results if it encounters an error, although that's unusual in general for functions returning (T, err), and certainly not going to happen for Go 1.8.
As for matching whatever browsers do, what browsers do influences how far down the Postel's law slope we slip, but I'm wary of the slippery slope, and according to the JPEG spec, these are invalid images.
Just did a quick look at this for my reasons and summarized it like this:
I'm saying the files are not truncated by examining the bottom right block and seeing pixels - as opposed to parking garage which is clearly truncated a couple of lines of blocks early. I didn't check metadata vs lines of pixels however, so whole lines of blocks could be missing.
The ICC_PROFILE may be something that's accepted in JPEG formats, or the format descriptions I was looking at weren't showing the working standard (as opposed to the published standard). It clearly decodes in other decoders in any case.
Anyways, I don't have a solution to your quandary about returning an error and an image. I suppose you could put the decoded image in the error (ack). My thought is that the issue might be better titled "image/jpeg: Unable to decode invalid/truncated JPEG image" rather than "image/jpeg: Unable to decode valid JPEG image". Cheers!
I have another example of invalid JPEG image. The problem with the file is that the second half of the file is filled with garbage bytes:
$ xxd bad-image.jpg
00000000: ffd8 ffe0 0010 4a46 4946 0001 0100 0001 ......JFIF......
00000010: 0001 0000 ffdb 0043 0005 0304 0404 0305 .......C........
... valid JPEG contents ...
0001fff0: 7ca0 574a 75cf 835d 4b12 cffd 9a0e 8fa1 |.WJu..]K.......
00020000: 7e33 1885 9110 0a4f b753 fff7 9de2 be06 ~3.....O.S......
... same 16-byte pattern ...
0003f4e0: 7e33 1885 9110 0a4f b753 fff7 9de2 be06 ~3.....O.S......
0003f4f0: 7e33 1885 9110 0a4f b753 fff7 9de2 be06 ~3.....O.S......
image.Decode fails with invalid JPEG format: missing 0xff00 sequence, but the browsers display the image:
Regarding support for invalid JPEG images: I would vote for adding this support sooner by keeping the semantics of jpeg.Decode unmodified and adding a new function to decode potentially invalid JPEGs, e.g. jpeg.TryToDecode. Alternatively, if adding such a function to the jpeg package is not desirable, a new experimental package could be added that implements the new semantics of JPEG decoding.
This way people can start using it today in the rare cases when image.Decode fails but the byte stream is known to be a JPEG file.
By the way, if such package already exists please let me know.
I'd rather have a different package instead of adding TryToDecode to the standard library for the rest of Go 1.x's lifetime. As a bonus, such a package wouldn't be bound by the standard library feature freeze that we're currently in.
I don't know if any such package already exists, and I won't have time to make one any time soon. Sorry.
Related: #20804 for an invalid GIF that browsers decode, but Go doesn't.
@bradfitz can we change the title of this issue? This is now about doing something for invalid images, not valid ones.
@rasky, done.
Here's another example:
still running into this issue in go version go1.11.4 darwin/amd64
(VCG Wilson / Corbis via Getty Images)
Attempt to decode leads to:
invalid JPEG format: missing SOI marker
While the image is easily shown in major browsers
go version
go version go1.12.1 windows/amd64
@sheerun's example looks like a progressive JPEG that's truncated. I only say that because it's so blurry and has that JPEG blocky look to it and I assume another pass would sharpen it. I didn't look at its structure.
Yes, indeed. I guess it's out of scope for Go to handle such cases nicely? I think the ideal API would return both the partially loaded image and a flag indicating that the image is corrupted.
I guess it's out of scope for Go to handle such cases nicely?
Yeah, I'd still say what I said a couple of years ago: https://github.com/golang/go/issues/10447#issuecomment-310585843
This problem still exists in go version go1.13.4 windows/amd64.
Would it be reasonable to have a few error types in this package so that a user can selectively identify and deal with a specific error such as ErrHuffmanCode?
I just thought I'd leave this here, in the hopes of being helpful.
In reference to the navidrome use-case I linked above, I made a very simple change to src/image/jpeg/reader.go
It's clearly not suitable for a PR, given the larger conversation here about "the right thing to do". In my case, this works perfectly on my images, returning visually the same image I'd see when loading the image (example linked above) in a web browser [gm identify says it has premature EOF, identify does not].
The idea is dangerous and simple - don't bail on "short Huffman data", and consider a premature EOF to be the same as EOI - escape the parsing loop and let the outer image return code sort it out.
So rather than fully parsing and handling all premature errors specially, the idea would be (which I have clearly not fully done) to carefully and incrementally construct a valid image while decoding, being careful to always leave the image structure in as valid a state as possible. This seems to be the spirit of what is there now.
If we run out of image data, return what we have.
Simple patch for the "short Huffman data" case ... which I hope helps someone:
--- reader.go-orig 2021-05-17 16:59:44.565429376 +0000
+++ reader.go-new 2021-05-18 11:31:08.323759100 +0000
@@ -12,6 +12,7 @@
"image/color"
"image/internal/imageutil"
"io"
+ "errors"
)
// TODO(nigeltao): fix up the doc comment style so that sentences start with
@@ -538,7 +539,12 @@
for {
err := d.readFull(d.tmp[:2])
if err != nil {
- return nil, err
+ // consider unexpected EOF to be same as EOI, let outer code
+ // find out if we have an image to return
+ if !errors.Is(err, io.ErrUnexpectedEOF) {
+ return nil, err
+ }
+ break
}
for d.tmp[0] != 0xff {
// Strictly speaking, this is a format error. However, libjpeg is
@@ -648,7 +654,10 @@
}
}
if err != nil {
- return nil, err
+ // don't bail on short Huffman data
+ if !errors.Is(err, FormatError("short Huffman data")) {
+ return nil, err
+ }
}
}
So to recap:
Great job!
In case anybody cares, I have now forked the jpeg package and fixed some of the issues with the help of @whorfin's code.
go version devel +ce43e1f Mon Apr 13 23:27:35 2015 +0000 linux/amd64
Attempted to use jpeg.Decode on the below image: https://streamcoimg-a.akamaihd.net/000/340/810/9ae536dd97d2d92fc17a6590509a51c0.jpg
Expected the image to decode successfully, as it displays in a browser.
Actual result:
invalid JPEG format: short Huffman data