iand commented 6 years ago

Motivation

The current Decode function in the image package provides no way to configure the underlying format decoder. The registration system used by image makes it difficult to extend the decoding behaviour of the registered format handlers. However there are many open issues that could be resolved with a small number of decoding options. This proposal describes a way of extending the existing format registration system to enable options to be passed to the individual format decoders. It introduces a DecodeOptions type to serve as an extension point.

There are two broad areas of configuration that are considered. This proposal tackles them both but they are orthogonal and may be independently evaluated and implemented. The areas are:

Avoiding large allocations on unprocessable images

This class of problem is generally caused by faulty or malicious header information in the image file. Typically this could be a very large x or y dimension that causes a huge buffer to be allocated in preparation for reading a stream of pixel data. This is currently possible to mitigate by first decoding the image config to check the dimensions before proceeding to a full decode. However this has the disadvantage of either reading the input twice or buffering and re-reading to decode. Discussion in the relevant issues suggested that a 16k buffer would be suitable. The following issues refer to this use case (with #5050 being the overarching issue)

5050 - image/gif: decoding untrusted (very large) images can cause huge memory allocations
10790 - x/image/webp: excessive memory consumption
12512 - image/png: limit memory Decode can use while decoding
10399 - x/image/bmp: out of memory

Being more tolerant of invalid input

The image decoders in the standard library are strict and fail on invalid input. There are classes of invalid input that may be acceptable for some uses. The following issues suggest that a lenient decoding mode would be helpful:

10447 - image/jpeg: add options to partially decode or tolerantly decode invalid images?
20804 - image/gif: decoding gif returns unknown block type: 0x01 error
20899 - image/png: Decode failing on bitmap
20856 - image/gif: decoding gif returns frame bounds larger than image bounds error

Other options related issues

In addition, the following issues suggest other areas that could benefit from decoding options but are not considered further in this proposal:

8055 - image: decode / resize into an existing buffer
18098 - proposal: add Validate functions to image/jpeg, image/png etc.
4341 - image/jpeg: correct for EXIF orientation?
18365 - image/png: no support for setting and retrieving the PPI/DPI

Package Changes

The primary extensibility point is a new type. Its fields are discussed at the end of this section.

type DecodeOptions struct

Format decoders are registered via a new package level function RegisterFormatDecoder

func RegisterFormatDecoder(f FormatDecoder)

The FormatDecoder interface needs to be implemented by any package providing image format decoding:

type FormatDecoder interface {
    // Name is the name of the format, like "jpeg" or "png".
    Name() string

    // Prefix is the magic prefix that identifies the format's encoding. The magic
    // string can contain "?" wildcards that each match any one byte.
    Prefix() string

    // Decode decodes an image using the supplied options.
    Decode(r io.Reader, o *DecodeOptions) (Image, string, error)

    // DecodeConfig decodes the color model and dimensions of an image using the supplied options.
    DecodeConfig(r io.Reader, o *DecodeOptions) (Config, string, error)
}

A new exported Decoder type is introduced:

type Decoder struct {
    DecodeOptions
}

with a basic constructor function.

func NewDecoder() *Decoder

The Decoder type has the following method set:

// Decode decodes an image.
Decode(r io.Reader) (Image, string, error)

// DecodeConfig decodes the color model and dimensions of an image.
DecodeConfig(r io.Reader) (Config, string, error)

These Decode methods will sniff the type of the image from the reader and look up an appropriate registered FormatDecoder. If one is available then its corresponding Decode method is called, passing the Reader and the Decoder's options. This requires the Decode to have access to package level registered format decoders.

To configure decoding the developer will create a new Decoder and then set the appropriate fields before calling Decode or DecodeConfig.

The following options are proposed.

MaxHeight and MaxWidth

MaxHeight and MaxWidth are DecodeOptions fields that set the maximum allowable dimensions of a decoded image. These fields restrict the allocation of large amounts of memory for faulty or malicious images.

MaxHeight int
MaxWidth int

These options are only used by the Decode method. When a Decoder attempts to decode an image with dimensions exceeding either MaxHeight or MaxWidth then it should stop processing and return an error. A new error in the image package ErrDimensionsExceeded could be defined as standard or it could be left to the decoder to return its own error.

DecodeConfig should return a Config containing the width and height described in the image data stream regardless of the config options.

ReturnPartial

ReturnPartial is a DecodeOptions field that instructs the decoder that it may return a partially decoded image when an error is encountered.

ReturnPartial bool

The current Decode behaviour is that no image data is returned on error. When this option is true the caller may receive a partial image from the decoding process when the decoder encounters a format related error during the decoding process.

Backwards compatibility with existing registered formats

Existing callers of RegisterFormat will be supported by adapting the existing format type and passing it to RegisterFormatDecoder, along these lines:

func (f *format) Name() {
    return f.name
}

func (f *format) Prefix() {
    return f.magic
}

func (f *format) Decode(r io.Reader, o *Options) (Image, string, error) {
    m, err := f.decode(rr)
    return m, f.name, err
}

func (f *format) DecodeConfig(r io.Reader, o *Options) (Config, string, error) {
    m, err := f.decodeConfig(rr)
    return m, f.name, err
}

The existing package level Decode and DecodeConfig functions will be rewritten to create a decoder and call it without any options:

func Decode(r io.Reader) (Image, string, error) {
    return NewDecoder().Decode(r)
}

func DecodeConfig(r io.Reader) (Config, string, error) {
    return NewDecoder().DecodeConfig(r)
}

Deprecation of RegisterFormat

RegisterFormat would be marked as deprecated in favour of RegisterFormatDecoder.

rsc commented 5 years ago

/cc @nigeltao

nigeltao commented 5 years ago

The overall goal seems OK. Some rambling thoughts, major and minor:

I'd suggest forking image/* to start with, and proving the concept, before we make API changes to the stdlib that we have to support forever.

For example, the API proposal still doesn't let us see the intermediate stages when decoding a progressive JPEG, when the compressed bytes come in slowly. There is a new ReturnPartial option proposed, but IIUC that only triggers when we see an error, and doesn't trigger when we're merely blocked on an io.Reader.

We don't have to solve the progressive JPEG case now, but I'm hesitant to e.g. commit to an API in Go 1.12 that makes it harder to solve that case in Go 1.16.

One possible design approach for that is to use something that's more like a text.Transformer instead of an io.Reader.

That's the approach I'm taking in my Wuffs project. It's a little hard to see, but that's what the Wuffs io_buffer_meta type is. There is no blocking I/O, the coroutine yields to the caller with a partially decoded image.

That Transformer-like approach, which allows for un-reading bytes, might also lead to significant speed-ups. There are optimization techniques that e.g. C's libjpeg does that Go's image/jpeg can't do because the C code can optimistically read too many bytes, and later un-read those it doesn't need. In comparison, Go's io.Reader interface and Decode(r io.Reader) (image.Image, string, error) function type doesn't have anywhere to say "sorry, I read 3 bytes too many from r, here's the change".

But using a Transformer-like approach, instead of io.Reader, is a major change, and also more complicated than Go's idiomatic "use an io.Reader and block on I/O, because goroutines are cheap". Experimenting with an approach like that should really start outside of the stdlib, but then also could arguably stay outside the stdlib, as an x/image2 or std2/image or iand/image or whatever. Breaking out of backwards compat constraints could also mean being able to change the default byte order from RGBA to BGRA, or otherwise be more BGRA-friendly, which would make it easier to integrate with Windows and X11 GUI's native formats without e.g. moderately expensive per-frame swizzling.

A final, minor point. I don't think we need a NewDecoder function, if clients can just use a Decoder{} struct literal.

iand commented 5 years ago

@nigeltao Thanks for the comments. I will do some more research on the reader point. It seems @dsnet's comment on #23154 is relevant, particularly the BufferedReader type he mentions.