Metadata (EXIF, color profiles, etc etc)

FLIF-hub commented 9 years ago

I have the following idea on how to add support for metadata: A FLIF file is either a "bare" image file, or it is actually a .tar (or some other archive format) which contains files with standardized names, e.g. _image.flif, _meta.gz, _colorprofile.gz, browser_hints.gz. This would make it easy to add arbitrary information (just add a custom file to the archive) and manipulate or remove the metadata.

So basically the .flif file would only contain the information needed to decode the pixels, and anything else (including information on how to interpret the pixel values, like gamma etc) would be in those optional extra files.

Would that be a good idea?

heavyk commented 9 years ago

I like the idea of a tar file. that would mean additional metadata (or other data) could be included with the image.

for example, svg data could be included and then the image would become interactive (think maps). I also envision that if javascript could be included as well, then potentially a progressively enhanced image could even be a game level. (especially if tiling were to be added)

OkGoDoIt commented 9 years ago

As long as it doesn't affect progressive rendering. I suppose you would have to ensure that the actual .flif data is saved at the tail end of the .tar file.

heavyk commented 9 years ago

true, it'd have to be at the tail, though once you have the tar or whatever header, depending on the connection, you should be able to skip the metadata and get right to the image (if it's like really big or something).

but yeah, I've been daydreaming all day about having this format in the future. it's totally the most awesome thing, ever.

FLIF-hub commented 9 years ago

I suppose some of the metadata (anything that might be relevant for viewers, like image title/description, gamma correction etc) should be in the beginning, while some of it would better be at the end (because it would only slow down progressive rendering).

FLIF-hub commented 9 years ago

Maybe a simple .ar would be better (like .deb packages?)

FLIF-hub commented 9 years ago

Another option would be to reuse PNG as a container format, using new values for compression method, filter method and interlace method.

Or we could make it look like a PNG file that contains a thumbnail preview (so software that does not supports FLIF just sees a very low res image) and all the metadata like gAMA and tIME, with the actual FLIF file contained in a new kind of chunk, which could be called "FLiF" or something like that.

heavyk commented 9 years ago

the thumbnail idea is cool, but it kinda defeats the purpose of progressive download. it should be optional though, and will be especially useful while adoption rate is still low...

DagAgren commented 9 years ago

.tar is not really one file format, it's an incredible mess of incompatible extensions of an ancient file format, designed for a completely different kind of computing and storage architecture than exists today.

Never, ever use .tar for anything but legacy support. Absolutely avoid it for anything new.

You can easily make up a simple file format yourself that will be better and easier to read than .tar.

DagAgren commented 9 years ago

Reusing PNG seems a much better idea. It lets people reuse a lot of existing code, and gives you lots of metadata support essentially for free.

lmarcetic commented 9 years ago

If it's doable, making this the new PNG algorithm might open the doors to standardization and therefore adoption. PNG is a W3C, IETF, and ISO/IEC standard.

karlcz commented 9 years ago

While considering container layout, please consider whether important sections are indexed for random access or require stream-parsing the whole file to find the sections, whether you can modify metadata in place without having to relocate other sections, and whether you can create a container in one pass without having to buffer the whole file. These issues all become significant with large and/or multidimensional images.

jonsneyers commented 9 years ago

boutell commented (https://github.com/FLIF-hub/FLIF/issues/14#issuecomment-145550351) :

.tar itself is archaic and probably a can of worms you don't want to open.
I understand the impulse to go with a pre-existing container format... but
perhaps PNG's chunk format is that format. Look at how PNG's chunk naming
conventions ward off the casual introduction of incompatible stuff.

By reusing a more general-purpose container format you might inadvertently invite
"kitchen sink syndrome" and wind up with TIFF (:

I agree that .tar is probably a bad idea indeed (at the very least something aimed at random-access instead of sequential access should be used) , and also I think the "kitchen sink syndrome" is indeed a valid concern. However, the flexibility and simplicity of a general-purpose container is still quite appealing to me. We should just have some conventions on how to represent the standard metadata (e.g. exif metadata as a simple gzipped text file or something like that), and have some naming rules to keep the public and private stuff separated.

jonsneyers commented 9 years ago

OK, unless anyone objects, I'm going to go ahead and decide that we'll use .ar with some pre-defined filenames that will have standardized meanings, and if anyone wants to put arbitrary sidecar files in there, it will be easy enough to ignore/strip them.

The predefined filenames will start with an underscore (or two?). For now, I would propose the following predefined filenames, all of which are optional (and will be unimplemented/ignored for now) except the first one:

filename	description
_image.flif	the main image data, the only thing the decoder really needs
_flif_partial_decodes	offsets and CRCs of partial decodes, useful for a browser to decide how much to fetch
_meta.xmp.gz	XMP metadata
_meta.exif.gz	Exif metadata
_meta_color_profile.icc	ICC color profile
_meta_comment.txt.gz	arbitrary text comments
_flif_tile_structure	very large image, split in several tiles for efficient crop decoding. This describes how the tiles are structured (hierarchical? offsets? etc)
_tile%1d_r%4d_c%4d.flif	filenames of the individual tiles, e.g. _tile_1_r0002_c0004.flif would be a tile in the first level of the hierarchy at row 2 and column 4
_source_uri	original location of this image (warning: privacy danger!), could be useful if browsers download a partial file and want to fetch more detail later

hrj commented 9 years ago

For some of these to be effective, there would have to be an ordering defined as well. For example, _flif_partial_decodes needs to be fore the .flif.

lucaswerkmeister commented 9 years ago

I agree that .tar is probably a bad idea indeed (at the very least something aimed at random-access instead of sequential access should be used)

As far as I can tell from the Wikipedia article, .ar doesn’t seem to support random access very well either – if each data section is right after each header, then to read the _n_th file, you’d need to read n-1 previous headers and skip over n-1 data sections, right?

jonsneyers commented 9 years ago

Yes. If for some reason n would have to become large, then it would make sense to embed some lookup table as the first file (I think some versions of ar do that). In my mind, n should normally be a small number :) The ordering is indeed important; for _flif_partial_decodes to be effective, it has to be in the beginning, and I guess the same holds for rendering guidelines like color profiles or gamma correction. I'm hesitant to dictate an ordering though, because for some kinds of information (e.g. XMP/Exif metadata), it might depend on the use case which ordering is best: if you want an image to decode progressively as fast as possible, it's best to start the actual image data as early as possible and not waste any bandwidth on stuff that gets ignored, but if you want to quickly view the metadata fields then it's the other way around.

TimNN commented 9 years ago

If we do go the .ar route, I think we should consider using two distinct file extensions:

One for the actual image data (everything that is currently present in a .flif file)
and one for the .ar container

For example .rlif (raw lossless image format) and .flif or .flif and .oic (open image container)

psykauze commented 9 years ago

Many image container use a hack of TIFF like NEFs or directly an extension like DNGs. Maybe you could use the TIFF container (by defining a new id for image format) and just focus to the image coder ?

psykauze commented 9 years ago

There's also a proposal for a Random-Access feature on tiff you might use http://www.triton.vg/IndexedTIFF.html

jonsneyers commented 8 years ago

OK, so we are using ar (optionally) to store all metadata. As far as the M1 release is concerned, that's all we need to know. Providing the functionality to read and write metadata is something that can be done later -- from the point of view of the bitstream specification, we can consider metadata to be arbitrary black-box data (at least for now). So I'm moving this issue from M1 to M2.

Kroc commented 8 years ago

Wouldn't you want some metadata to be (optionally) pushed to the header for purposes of partial decoding? E.g. You might want to know image rotation and colour profile data before the image data so that you render it consistently throughout the deinterlacing.

Also, isn't the need to store 'filenames' a waste of bytes? Could you not use PNGs style of four-letter words for the box contents (if you took them wholesale, PNG data could be transferred losslessly into FLIF; FLIF itself could define new groups for its own purposes).

Really excited about this format, keep up the great work!

MarcusJohnson91 commented 8 years ago

I don't understand what you gain by using a tar header and all that associated nonsense, when you could just an existing metadata format like EXIF, or create your own? then it'd be built right into the file, and you wouldn't need to add yet another parser before you got the actual data out?

jonsneyers commented 8 years ago

The idea is to use existing metadata formats like EXIF and ICC color profiles, and just embed them as files in an .ar file. There is a 60 byte per file overhead to that (which is more than the 12 byte overhead for PNG chunks), but the advantage is that standard tools can already work with .ar files. Some of that "overhead" could be useful in some cases (like the timestamp and perhaps the permissions/owner) or perhaps some of the fields can be "abused" to store some information (e.g. the 16 bytes for the filename could be used to store the wrapping (name, length, CRC) of a PNG chunk).

You can choose whatever order you want for the files in the .ar file, so it would indeed make sense to put the EXIF (for Orientation) and ICC color profile first, before the actual image data.

haasn commented 8 years ago

Sort of seems needlessly wasteful to be spending all these bytes on filenames, owner/group IDs, permissions, checksums and whatnot - not to mention all of the weird incompatibility issues it might cause. It also bothers me to be encoding the length information etc. in decimal rather than just directly, since it makes parsing a flif file harder than it needs to be and adds an unnecessary source of potential exploits (will all browsers handle a file size of “12x45” well? What about all content management systems? All image hosts?)

In addition, I think there should be only one way to encode a FLIF (given the compressed image and metadata). This means no hidden/ignored bits, no arbitrary ordering, no arbitrary filenames, no arbitrary encodings, etc. Failure to adhere to this rule I think sets us up for very nasty incompatibility issues, security vulnerabilities, unexpected/weird behavior, etc.

If you just go ahead and attach additional standards to the .ar format (like “access time is used to store a checksum, uid/gid must be 0:0, ...”) you might as well roll your own format since the “standard tools” are not going to cut it one way or another. (Plus, making a simple chunk format like PNG is dead simple)

P.s. I also question the importance of checksums in an image format. Doesn't checksumming already happen on many layers below the image? (Ethernet, TCP, archives, filesystems, drive controllers, etc.) While I don't mind wasting a few bytes on checksums, as a developer I personally enjoy the ability to easily modify image files using a hex editor and have them remain valid.

Personally, I would like to base an image format on PNG's chunk stream since it works the best in practice, although PNG has some annoying caveats and design quirks that I would try to revisit. (Like the whole gAMA/cHRM/iCCP/sRGB weirdness/redundancy)

jonsneyers commented 8 years ago

I agree about not needing checksums.

We could easily roll our own chunk format based on PNG's. Obviously the 12-byte per chunk overhead of PNG is better than the 60-byte per file overhead of .ar (and if we throw away the 4-byte CRC, it's even better).

I just want to avoid having to define all possible kinds of metadata as part of the "FLIF format". So maybe just define a few chunks explicitly, plus one generic "arbitrary stuff" chunk, and that's it.

What about this layout:

4-byte chunk name 4-byte chunk length [chunk contents] 4-byte chunk name 4-byte chunk length [chunk contents] 4-byte chunk name 4-byte chunk length [chunk contents] FLIF <-- has to be last chunk, no length needed [chunk contents]

All metadata needs to go in front in this layout (though you could circumvent that by defining a chunk that says "the main FLIF chunk ends at byte XXXX, there are more chunks there".

The advantage would be that if there is no metadata at all, then it's just the same as the current format.

Chunk names could be: FLIF actual image data ICCP color profile EXIF exif metadata TEXT arbitrary human-readable data (comments, copyright info, etc) DATA arbitrary machine-readable data FLIM obligatory first chunk of a FLIF file with Metadata, only needed to make file identification easy

haasn commented 8 years ago

A few thoughts:

I'm very happy with the chunk format you proposed. Human-readable chunk identifiers are convenient for a number of reasons, ranging from source code clarity to ease of identifaction via hex editor.
4-byte length field seems plenty (especially since it's only for metadata), and I wouldn't go below it since 2-byte length is a real concern for embedded ICC profiles.
It's a very good idea to require the metadata to be the first chunk, and the compressed contents to be the last chunk, I would have proposed the same.
A “standard” sRGB ICC profile is about 1 kB in size. To optimize for the very common case, it would probably be a good idea to explicitly standardize that an images with no ICCP chunk are to be considered sRGB, since it covers 99.999% of use cases out there. With this in mind, I don't think there's any use for PNG-style sRGB or gAMA/cHRM tags (especially since ICCv4 profiles more than adequately cover that kind of use case)
TEXT and DATA both feel sort of arbitrary. Treating TEXT as purely a human-readable comment is okay, but machine-readable “arbitrary data” chunks very quickly set you up for confusion and incompatibility as different tools try using it for different purposes.

If your aim is to future-proof the file format against whatever needs companies will have in the future, then I think following the PNG chunk namespacing model would be a good idea:

http://www.w3.org/TR/PNG/#5Chunk-naming-conventions

Essentially, the chunks are identified using ASCII letters, and the letters being uppercase or lower case determines their namespace. An upper case first letter means that the chunk is required, for example, and lower case means the chunk is optional and can safely be ignored by applications that don't support it (or removed by tools).

In this system, a lower case second letter indicates that the extension is “private”. So for example, an application like krita could store its own metadata (if it really wants to) using a krTA chunk, which is guaranteed to never conflict with any official chunks or be misrepresented as such.

(And finally, the fourth letter being lower case indicates that a chunk is safe to copy blindly if editing an image, although I personally don't see that much use in this)

Given a simple scheme like this, I don't think that DATA is necessary, since vendors can just pick their own private chunk to store things in. For comparison, PNG currently defines the following chunks: (in roughly the order that they have to appear in, and with my comments appended)

IHDR - header (required)
iCCP/cHRM/gAMA/etc. - basically all colorspace stuff (optional) -> only ICC profiles really necessary
PLTE - palette, only required if palette mode is used -> palettes should be part of FLIF if even supported
pHYS - physical dimensions (optional)
tRNS - transparency color (only used in palette mode) -> should be part of FLIF if even supported
hIST - image histogram -> rarely needed and cheap to compute in 2016
sPLT - suggested palette -> cheap to compute
tIME - image modification time -> filesystems already have access times
tEXT/iTXT/zTXT - arbitrary comment text (and internationalized + compressed versions) -> self-redundant, compression really not worth it for text comments in 2016
IDAT - image data (required)
IEND - image end marker (required) -> only used because PNG can have multiple * IDATs, doesn't apply to us

Of these, some are obviously not very relevant anymore or inapplicable for FLIF. (Does FLIF support palette-mode images? And if so, is it part of the FLIF chunk or not? I think it should be), and the ones that I think remain very relevant are:

FLIM (for flif image metadata)
FLIF (for flif image data, and probably also stuff like palettes or whatnot)
iCCP for optional ICC profiles, lack of which indicates sRGB
eXIF for optional EXIF metadata
tEXT for optional arbitrary comments
pHYS for physical dimension info: image DPI, aspect ratio, whatever. (This is especially important with the wake of high DPI monitors and stuff, where knowing an image's native DPI would be potentially very useful)

Everything else I think can be trimmed. To maintain a single bytestream, the way I would standardize it is to require chunks be sorted descending in lexicographic numeric order, wrapped by FLIM and FLIF at the beginning and end. This would give rise to the following chunk order:

FLIM header
All required and standard chunks: currently none apart from FLIM/FLIF
All “required” and nonstandard chunks: not sure if these should exist, but they may be useful for internal purposes
All optional and standard chunks: tEXT, pHYS, iCCP, eXIF (in that order)
All optional and nonstandard chunks: (up to the user)
FLIF image data

This encoding has a lot of good properties:

Unique order/encoding
Very easy for websites and tools to strip out “application” chunks in a forward compatible way (strip everything where the second letter is lower case)
Very easy for FLIF minifiers to strip out “optional” chunks in a forward compatible way (strip everything where the first letter is lower case)
Give us room to add more chunks in future revisions
Give users room to add application-specific metadata in a way that will never conflict
Simple to construct, simple to work with: In fact, you can re-use header chunks in stuff like scripts or via cat.
(Minor) The small and useful metadata chunks like FLIM, tEXT, pHYS etc. end up near the beginning, and the bulkier chunks like iCCP, eXIF and FLIF end up near the end. This is a tiny bit more comfortable for those analyzing image files in hex editors (which does happen)
It's basically entirely compatible with PNG naming, so a vendor could pick the exact same chunk name for both FLIF and PNG, thus easing the transition.

And a few bad ones:

Malicious users could “invent” chunks which seem to indicate being mandatory and standardize but in reality aren't, and therefore get by some sort of FLIF “stripper” in an image host. But I think that websites will probably be best suited by explicitly whitelisting a set of accepted chunks (and also checking them to make sure they contain the data they advertise) either way.
Users and tools could break the “lexicographically sorted” rule either intentionally or accidentally, which violates the “unique order” goal. For this it might be a good idea to have the reference implementation either warn (like libpng) or explicitly ignore “unexpected” chunks.

Either way, this system has worked very well for PNG, and it will certainly work well enough for FLIF.

RFC

haasn commented 8 years ago

Another thing worth noting:

In PNG, the ICC profile is actually DEFLATE-compressed inside iCCP. I'm not sure if it's worth doing that for our ICC profiles as well. DEFLATE is mostly picked in PNG because any PNG decoder is already expected to implement a deflate algorithm to parse the actual IDAT itself (i.e. they're just reusing the same compression algorithm)

That said, I'm not sure if common ICC profiles compress all that well. I could try investigating it if you think it would be a good idea - does FLIF use some kind of algorithm like DEFLATE internally that could be “reused” here?

(It's worth nothing that JPEG does not compress its ICC profiles iirc. Compressing them also makes things harder for applications which simply want to read or write the ICC profile without actually caring about the image contents)

karlcz commented 8 years ago

An application shouldn't need to make many small reads to walk a chain of metadata segments to find the overall length of metadata nor to find a particular metadata segment. I think that is a flaw in many archive formats which simply append chains of segment header + segment body. I think you need a compact table of contents which can be read contiguously and used to plan any further sparse access to the file based on absolute file positions listed or computed from the table of contents.

I'm not sure I understand how you expect to allow extensible metadata to complement other features:

Using 4-byte file offsets and/or lengths would seem to introduce an unfortunate limit.
Is that FLIF segment an atomic unit for decoding or does it have internal offsets you would want to locate to decode portions of the image space?
How will multi-dimensional science data be handled? A deeply structured FLIF segment or a container format that allows rich metadata and multiple FLIF segments for elements of the multi-dimensional image space?
You should consider metadata encoding support such as zlib compression, whether per segment or for an entire contiguous metadata block.
You should consider padding to allow for in-place editing of metadata segments without having to relocate everything else in the file every time.

haasn commented 8 years ago

An application shouldn't need to make many small reads to walk a chain of metadata segments to find the overall length of metadata nor to find a particular metadata segment.

A valid point, although I'm not sure how relevant this concern is for FLIF - we're building an image format, not a general purpose archive format. I wouldn't expect metadata to be very large, and the entire header should almost always easily fit into L1 cache either way. A few indirections in here are almost surely not worth losing sleep over.

If this does turn out to be a huge concern in practice, the way I would solve it is by adding an integer to FLIM indicating the offset of FLIF inside the file. The reason it makes me queasy is because now a tool that strips chunks will also have to make sure to update the FLIM header lest some tools mysteriously break while others seem to work fine (the former being the ones that rely on that field and the latter the ones which don't..). But I would still strongly advise against it from a security standpoint alone.

How will multi-dimensional science data be handled? A deeply structured FLIF segment or a container format that allows rich metadata and multiple FLIF segments for elements of the multi-dimensional image space?

I personally think this is somewhat out of scope for an image format. If you need some sort of complex multi-dimensional embedding for science data, I would try to approach that with a dedicated special-purpose format that might use flif (or png) images internally. The more simple an image format is, the more likely it is that people will have no objections to adopting it. I can imagine a browser vendor feeling queasy about pulling in support for an image format that requires some sort of complicated support for arbitrary numbers of dimensions and whatnot, whereas supporting something that's essentially PNG in disguise would almost certainly be noncontroversial. (and given PNG and JPEG's massive popularity due to their use on the web, this should definitely be a goal if we want the image format to ever be used in practice)

You should consider metadata encoding support such as zlib compression, whether per segment or for an entire contiguous metadata block.

I think this should be done on a per-chunk basis, never on a entire metadata basis. Most chunks are very small, usually just providing a few integers. For larger stuff like iCCP it would be worth compressing just that chunk, and only if it's necessary. But I wouldn't try to do some sort of cross-chunk compression at all, especially since it gives rise to the classic .tar.gz seeking problem that you were also concerned about.

You should consider padding to allow for in-place editing of metadata segments without having to relocate everything else in the file every time.

If anything, this should be done on a per-chunk basis too, since most chunks are almost surely going to be either small or fixed-size. For example, an iCCP chunk's contents could start off with a integer specifying the length of the actual ICC profile. From the point of view of the FLIF container (and tools that don't care about parsing ICC profiles) this would be completely irrelevant.

(For example, the byte header 69 43 43 50 00 00 04 00 00 00 02 00 DATA would indicate an ICC profile that's 512 bytes long, followed by 512 bytes of padding - something just skipping past the headers would only parse the 1024 and go right past it)

It might be a good idea to add this explicitly to e.g. eXIF and iCCP - but I wouldn't make it a part of the skeleton itself. (For tEXT I would just treat 0 bytes as terminating the string prematurely)

jonsneyers commented 8 years ago

Some comments:

I don't really understand pHYS and the whole concept of a "physical size" or "native DPI" of an image. Say my camera produces a 10 megapixel image and encodes it as a FLIF. What is it supposed to set the "physical size" to?
FLIF does support Palette, but this is indeed part of the FLIF chunk itself. Basically everything needed to decode the pixel data should be in the FLIF chunk.
I agree that sRGB should be defined as the default color profile if no explicit profile is given, at least for 8-bit per channel images. For higher bit depths, I think perhaps Rec. 2020 should be defined as being the implicit default.
FLIF does not use any general-purpose compression algorithm like zlib internally, but of course zlib/DEFLATE is widely available so I don't mind (optionally) compressing metadata with it; applications that deal with metadata should have no trouble gunzipping the chunk. It probably makes more sense for Exif metadata than for ICC color profiles to compress them. The FLIF decoder itself doesn't care about the actual contents of metadata.
Table of contents can be added as an optional chunk should the need arise, but I wouldn't make it obligatory since the typical use case is "small metadata + big image data", where all of the metadata fits in one typical 'block size' for reading or transferring, so the effective difference between "many small reads to walk through the chain" and "getting the ToC and jumping straight to the interesting part" is probably zero or close to zero.

Using 4-byte file offsets and/or lengths would seem to introduce an unfortunate limit.

A limit of 2^32 bytes per metadata item is not something I consider to be a serious limitation.

Is that FLIF segment an atomic unit for decoding or does it have internal offsets you would want to locate to decode portions of the image space?

Atomic (there is no way to decode a random crop without encoding the entire thing), though if it's an interlaced image, then it could be useful to know the offsets of the zoomlevels, in case you want to progressively load the image, perhaps not to full resolution. However, I don't think it helps much to put this information in the file itself, e.g. if this is going to be used to let a browser do specific Range requests to get a truncated file depending on e.g. the device DPI, then it should ideally know the offset before doing the request. So it should be in the HTML or truncated server-side based on HTTP Client Hints or something.

How will multi-dimensional science data be handled? A deeply structured FLIF segment or a container format that allows rich metadata and multiple FLIF segments for elements of the multi-dimensional image space?

FLIF can only encode 2D data (or 3D data if you consider animations) of up to 4-tuples of 16-bit integers, just like PNG basically. If you need more precision/dimensions/whatever, then you'll probably need some kind of container format that splits things up in some way. I suppose you could use nonstandard chunks for that, which define multiple FLIF chunks and one 'main' FLIF chunk that is the only one visible to tools that don't know about the nonstandard chunks.

You should consider metadata encoding support such as zlib compression, whether per segment or for an entire contiguous metadata block.

I consider chunks as stuff that can be just written as-is to a file and applications can do whatever they want with those 'files'. I don't really care if the file is a .gz or .bz2 or whatever. I don't want compression across chunks because that complicates things too much. If you really need multiple items compressed together, you can of course define a chunk that contains something like a .tar.gz

You should consider padding to allow for in-place editing of metadata segments without having to relocate everything else in the file every time.

Can be handled in the chunk contents itself, as discussed above.

So let's start with these chunks:

FLIM (for flif image metadata), obligatory first chunk (unless we start immediately with FLIF), empty chunk so no length required? or will we at some point want non-empty FLIM chunks that contain some kind of external header data?
eXIF for optional EXIF metadata (do we need XMP too?)
tEXT for optional arbitrary comments (is this actually useful? Is this used for other things than messages like "Created with GIMP"?)
iCCP for optional ICC profiles, lack of which indicates sRGB (if 8-bit or less) / Rec. 2020 (if 9-bit or more)
FLIF (for flif image data), only really obligatory chunk, no length required

I would suggest the above order. Color profile is only needed if you're going to actually render the image (right?), so let's have it just before the actual data.

haasn commented 8 years ago

I don't really understand pHYS and the whole concept of a "physical size" or "native DPI" of an image. Say my camera produces a 10 megapixel image and encodes it as a FLIF. What is it supposed to set the "physical size" to?

I agree that it's useless for photos; it's more useful for stuff like screenshots, infographics and other rendered text content I think - although an end user is just going to use browser zoom controls to make the image comfortable to read one way or the other. Probably best not to try and be too smart here.

I didn't feel very strongly about it, it just might be a good consideration to keep in the back of our minds. I wouldn't have any problems with starting off FLIF 1.0 with the bare minimum (which I would consider FLIM+FLIF+EXIF+ICC) and expanding this as the need arises.

I agree that sRGB should be defined as the default color profile if no explicit profile is given, at least for 8-bit per channel images. For higher bit depths, I think perhaps Rec. 2020 should be defined as being the implicit default.

Strongly advise against this. Colorspace metadata, especially for untagged files, should not implicitly depend on characteristics of any particular encoding, or you end up with the situation that BT.601/BT.709 mismatch causes for the video world (e.g. user confusion about how resizing a 720p video to 480p suddenly changes all of the colors - which happens because most video players assume 720p and above = HD and otherwise use SD colorspace)

Especially the bit depth can easily change during the course of, for example, image processing - imagemagick, optipng etc. will happily reduce your 16 bit PNG to 8 bit PNG or vice versa.; doing so should not suddenly cause the colors to become extremely desaturated.

sRGB is a good default because it's used on the web and for small web graphics, where image size is a particular concern. If you're using BT.2020 content then you're probably sharing 4K movie clips or whatever, where the size of an extra 1 kB spent on an ICCv4 profile for BT.2020 hardly makes a difference.

Until the web's de-facto default (in the CSS standard and other places) moves from sRGB to something else, I don't think we should be worrying about treating any sort of file as BT.2020 when untagged.

tEXT for optional arbitrary comments (is this actually useful? Is this used for other things than messages like "Created with GIMP"?)

I don't know. I've never used them, I've never seen anybody use them, I didn't even know they exist. I don't think they serve any purpose that isn't already covered by EXIF. Happy to remove. It's probably better to use “nonstandard” chunks for this purpose either way, e.g. coPY for copyright information if you want to store it in images for whatever workflow reason. (I don't think this kind of metadata makes much sense when shared across workflows, so having them nonstandard is fine)

So let's start with these chunks:

Fair enough. EXIF and ICC profiles are both flexible enough to cover virtually every metadata need that I see an immediate need for.

Color profile is only needed if you're going to actually render the image (right?), so let's have it just before the actual data.

Right, but it doesn't really matter where exactly it's located as long as it's before the image data (so you can color correct partially rendered files).

Kroc commented 8 years ago

RE: 4-byte numbers -- you could always allow LEB128 numbers https://en.wikipedia.org/wiki/LEB128 these are 1+ bytes according to the scale of the number.

RE: EXIF compression -- instead of DEFLATE, could you not build dictionary of the most common EXIF words and bundle it with the en/decoder? (ala Brotli). Surely that would be better than LZ-alone as the dictionary doesn't have to ship with every file.

jonsneyers commented 8 years ago

LEB128 seems like a good idea since typical lengths will be low (e.g. 2 LEB128 bytes is enough for up to 16KB, which is enough for most color profiles and Exif data).

For higher bit depths, I think perhaps Rec. 2020 should be defined as being the implicit default.

Strongly advise against this. Colorspace metadata, especially for untagged files, should not implicitly depend on characteristics of any particular encoding

OK, I agree. It's just that if in the future, monitors with higher bit depth / wider gamut become more common, then maybe sRGB as an implicit default will start to look outdated. There is no real reason why you would only use the Rec. 2020 gamut for huge images, so the 1 KB overhead could be significant. But I suppose a better solution is to define some shorthands in the color profile chunk for standard/common color profiles like Adobe RGB, Rec. 2020 etc, sacrificing self-containedness to make it more space-efficient.

karlcz commented 8 years ago

Regarding physical size, the scientific imaging world often wants to give physical spacing for the grid, i.e. microscopy formats will often encode floating-point "microns per pixel" information somewhere in the metadata. This is given for each dimension since it can vary for multi-dimensional data.

karlcz commented 8 years ago

Regarding 32-bit lengths, I agree an individual segment shouldn't likely exceed this length, but I think a container definitely should be able to grow much larger including many image data segments for tiled and pyramidal storage, etc.

A lot of my comments might be irrelevant if you do not expect the FLIF container to be generalized to cover scientific imagery...

haasn commented 8 years ago

But I suppose a better solution is to define some shorthands in the color profile chunk for standard/common color profiles like Adobe RGB, Rec. 2020 etc, sacrificing self-containedness to make it more space-efficient.

Worst comes to worst I would implement a chunk that only exists to tag this image with an enum identifying it as as being Rec2020 instead of sRGB, assuming we arrive in a world where Rec2020 is the defacto standard and 1 kB of ICC overhead is still a concern. (I'm also not sure if that 1 kB profile is truly minimal, could possibly be made smaller - especially under compression). Doing this before it's necessary would be premature optimization IMO

Regarding physical size, the scientific imaging world often wants to give physical spacing for the grid, i.e. microscopy formats will often encode floating-point "microns per pixel" information somewhere in the metadata. This is given for each dimension since it can vary for multi-dimensional data.

Actually, can EXIF store this?

EDIT: Yes, it does. So maybe it would be best to just defer this to EXIF metadata.

A lot of my comments might be irrelevant if you do not expect the FLIF container to be generalized to cover scientific imagery...

I just think FLIF as a codec would probably be better suited going into a different container (like matroska or whatever) for different purposes. really, what we are dealing with here are two separate concerns: 1. the FLIF image format itself, which just encodes a 2D image, and 2. the “FLIM” lightweight container format which can store FLIF but doesn't strictly have to.

It's important to stress that FLIF images (i.e. the contents of the FLIF chunk) could very easily be stored in other containers, and that technically FLIM could also store something that isn't FLIF (assuming, for example, that FLIF is not the only chunk that terminates the file but that something like IDAT could do so as well - basically storing an image compressed as per PNG spec)

It's good to pair them together for the sake of implementation and standardization so there's a common way to exchange FLIF files on the internet, but for scientific purposes it would be easy to just rip out the FLIF algorithm and store it in something with more metadata or structure.

haasn commented 8 years ago

Note: It seems that XMP metadata is also supported by PNG etc. - PNG in particular stores this using iTXt using the keyword XML:com.adobe.xmp.

I'm not sure how popular it is, but it's cheap to support alongside EXIF so it might be best to just get it out of the way before somebody ends up needing to complain about it.

haasn commented 8 years ago

Concerning compression for EXIF/XMP/ICC etc: I think to be on the very safe side, I would just start off the 1.0 format by having an extra byte at the beginning of these chunks indicating the compression method used, with 00 meaning “no compression”. That gives us room to add more compression methods later on without breaking backwards compatibility.

(This wouldn't be for all chunks, just the ones mentioned)

Kroc commented 8 years ago

@haasn

(This wouldn't be for all chunks, just the ones mentioned)

Actually, I'd recommend extending this to all chunks so that any chunk can be upgraded in the future. You may want to allow compression of text chunks (archiving of OCR text with an image)

@jonsneyers

If you consider LEB128, use the variant employed by git -- it removes redundancy, increasing numerical range and preventing numbers from being encoded multiple different ways: https://en.wikipedia.org/wiki/Variable-length_quantity#Removing_Redundancy

edit: also, since a chunk length won't be zero, you can imply a +1 in the number, but I think you already worked that out.

haasn commented 8 years ago

Actually, I'd recommend extending this to all chunks so that any chunk can be upgraded in the future. You may want to allow compression of text chunks (archiving of OCR text with an image)

Fair enough, then I would just explicitly drop it from FLIM (since it's just a header which will presumably be some fixed/limited size struct/bitfield) and FLIF (for obvious reasons).

This would allow the minimal FLIF header to be FLIM<header>FLIF, or 8 bytes plus however many ints we store for metadata. Incidentally, what would the FLIM header actually contain? Width, height, colorspace? It might be a good idea to reuse the LEB128-variant for storing width and height as well, to allow FLIF files to grow larger than any arbitrary limit we would choose for them now, while simultaneously lowering the cost of a 1x1 pixel.

It might actually be a good idea to explicitly make sure we can store a 1x1 transparent FLIF image in a size comparable to that of a 1x1 transparent .GIF or .PNG, or even smaller.

edit: also, since a chunk length won't be zero, you can imply a +1 in the number, but I think you already worked that out.

I wouldn't rule that out, zero-length chunks may be useful (merely as indicating tags - such as PNG's sRGB). The one code word cost of carrying a coded 0 is not worth limiting the possibility of future growth.

Kroc commented 8 years ago

An 'empty' chunk could be indicated by a particular version number, i.e. CHNK, $FF. The reason you'd want to imply a +1 in the length is that exact power-of-2 boundaries are more useful than less-by-one. E.g. if you store exactly 16384 bytes in a chunk, that would use a 2-byte length, rather than the LEB boundary being at 16383 bytes.

edit:

It might actually be a good idea to explicitly make sure we can store a 1x1 transparent FLIF image in a size comparable to that of a 1x1 transparent .GIF or .PNG, or even smaller.

If the FLIM chunk had a version number too, this edge-case could be handled specifically. i.e.

FLIM $FF

would imply a 1x1 transparent image with all default attributes. (5 bytes)

jonsneyers commented 8 years ago

OK, I wrote a draft of the metadata spec: https://github.com/FLIF-hub/FLIF/commit/d162a42f0a980fab4456a0558d78e20ab8016852

Is the extra byte to indicate compression really needed? I would rather just use different chunk names if compression is used. E.g. iCCP for an uncompressed color profile, iCCZ for a zlib compressed one, etc. It's not like we're going to need 256 compression methods, and it's convenient if you can tell just from the chunk name if your decoder supports it or not. If we leave room for future compression methods, it can be the case that something that can now read iCCP (since only uncompressed iCCP is defined), can later suddenly no longer read all kinds of iCCP chunks.

Zero-length chunks are useful, e.g. the FLIM chunk is zero length.

As to LEB128, I used the variant which is used in BPG (http://bellard.org/bpg/bpg_spec.txt), that is, big-endian (not little-endian like LEB128), and without the redundancy-avoiding hack of git. This doesn't allow numbers to be encoded in different ways (i.e. the first byte is not allowed to be 0x80), but it does have some minor redundancy.

I get the point about exact power-of-2 boundaries and an implicit +1, but that's only going to make a (one byte) difference at 2^7, 2^14, 2^21 etc, while the cost is that zero-length chunks need 1 byte of padding. Is there any particular metadata that tends to have a size that is exactly one of those powers of two? As far as I know, ICC profiles, Exif and XMP all have more or less arbitrary sizes...

Kroc commented 8 years ago

@jonsneyers

while the cost is that zero-length chunks need 1 byte of padding

This is why I recommend a 1-byte version number for every chunk. It covers your back for forwards and backwards compatibility (tools that don't understand a version number can skip/blind-copy the chunk).

If every chunk has a version byte then, let's say, that a version of $00 indicates a null, empty chunk with no length. This way you're not having to add any additional bytes to indicate an empty chunk.

Also, the reason I like clean power-of-two LEB boundaries is that computer-generated data is more likely to use such boundaries (such as ROMs) and it may just help developers to not throw an off-by-one spanner in their way.

edit: and yes, compression can be separate from version (chunk name, vs. version number), well spotted

jonsneyers commented 8 years ago

It might actually be a good idea to explicitly make sure we can store a 1x1 transparent FLIF image in a size comparable to that of a 1x1 transparent .GIF or .PNG, or even smaller.

Haha, this reminds me of a blogpost I wrote a few weeks ago: http://cloudinary.com/blog/one_pixel_is_worth_three_thousand_words

The way I see it, if you want a minimal image, you don't put metadata on it at all, so there is no FLIM chunk, it just starts immediately with the FLIF chunk.

A 1x1 transparent image is 14 bytes at the moment, which is not too bad.

The image dimensions (and everything else that is needed to produce the uncompressed pixels) are already encoded in the FLIF chunk, no need to encode them also in the FLIM chunk.

Encoding the image dimensions with LEB128 (or rather UBEB128) is a good idea, it will save two bytes on icons that are smaller than 127x127 and theoretically allow sizes larger than 65535x65535 (if you have the memory to allocate an uncompressed image buffer that large). I already more or less committed to the FLIF header format (it has already been submitted to the maintainers of 'file', see https://github.com/file/file/blob/master/magic/Magdir/flif ), so I'm a bit hesitant to change this.

I don't know about adding version numbers to all chunks. In principle, it's a good idea, but in practice, it just adds an extra byte that is probably not going to be used. Nothing stops you from defining a new chunk which has a version number in its contents (and a CRC and compression method whatever other things you want to put in there), but in general, I want to avoid extra bytes. Four bytes just for the chunk name is already pretty generous (especially if we don't absolutely require them to be A-Za-z).

Most of the "important" metadata (image dimensions, color depth, number of frames, palette, etc etc) is already encoded inside the FLIF chunk, so I don't think we're going to need many extra chunks. I can think of only 3: color profile, XMP and Exif. Probably one byte to encode the chunk name is already enough to cover that (4 bits to encode the chunk semantics we encode with lower/uppercase, 4 bits to encode the chunk type), but just to be sure, we can use the full 4 bytes like in PNG, so there's plenty of room (268 million possible chunk names, ignoring the 4 signifier bits), and the extra advantage of having human-readable chunk names for the chunks we define.

haasn commented 8 years ago

Is the extra byte to indicate compression really needed? I would rather just use different chunk names if compression is used. E.g. iCCP for an uncompressed color profile, iCCZ for a zlib compressed one, etc. It's not like we're going to need 256 compression methods, and it's convenient if you can tell just from the chunk name if your decoder supports it or not. If we leave room for future compression methods, it can be the case that something that can now read iCCP (since only uncompressed iCCP is defined), can later suddenly no longer read all kinds of iCCP chunks

Presumably, if you allow iCCZ later on, then the spec will be that you should only ever have iCCZ or iCCP, but never both? If so, then there's no semantic distinction between that model and iCCP and iCCPZ (where ' ' and 'Z' are the compression bytes).

Otherwise, what would happen if you have both iCCP and iCCZ? And would you really expect software to write both the compressed and uncompressed versions of the ICC profile during the transition period?

haasn commented 8 years ago

I get the point about exact power-of-2 boundaries and an implicit +1, but that's only going to make a (one byte) difference at 2^7, 2^14, 2^21 etc, while the cost is that zero-length chunks need 1 byte of padding.

My advice is to avoid weird off-by-1s like this which serve no real purpose. FFmpeg did that a lot on its API to “save space” and whatnot, and all it has led to is more complicated code - especially because this API decision was reworked later on, so now software looks like

#if ABI_VER < N
size = struct.fieldPlus1 - 1
#else
size = struct field
#endif

Even if software will want to use a power of two for some piece of metadata for simplicity, they'll just have to spend an extra byte encoding the length - not a huge deal, and this doesn't apply to all of the chunks we have so far any way. (ICC, EXIF and text are all flexible length)

haasn commented 8 years ago

The image dimensions (and everything else that is needed to produce the uncompressed pixels) are already encoded in the FLIF chunk, no need to encode them also in the FLIM chunk.

Ah, okay. That explains some things - I was under the misimpression that FLIM would work like PNG's IHDR.

In that case, what do you think about moving this information out from FLIF and into FLIM? Rationale: You might want to know the image dimensions and colorspace right at the beginning of decoding. A scanning tool might want to figure out the image dimensions without even looking at the rest of the file; and especially if we ever support other colorspaces (e.g. XYZ or CMYK instead of RGB) that should probably be known before loading the ICC profile.

Aside: One approach to the alternate-colorspace world would be to require CMYK images to have an ICC profile that indicates their source space as CMYK, rather than as RGB. That way we wouldn't have to tag it in the header at all, and we'd get support for arbitrary colorspaces “for free” as long as FLIF supports both 3- and 4-channel images.

Finally: To cut down on the overhead, especially of a 1x1, it might be a good idea to make the format work more like this:

FLIF chunk: image width/height, number of channels, number of planes/images, other metadata...
other chunks like iCCP, eXIF, tEXT
0 byte
Image data (current FLIF file sans the information we extracted)

That way, a minimal 1x1 sample would only be one byte larger than it is currently, and there would be no confusing logic like starting a file off with either FLIM or FLIF depending on whether there are metadata chunks or not. (Which I think would be a mistake)

I already more or less committed to the FLIF header format (it has already been submitted to the maintainers of 'file', see https://github.com/file/file/blob/master/magic/Magdir/flif ), so I'm a bit hesitant to change this.

Ah, that's somewhat unfortunate. Maybe it would be a good idea to distinguish between the “old” (current) FLIF and the “new” format we are creating in this thread by name, so that you don't have to undo something you already submitted but instead just give it a new name.

jonsneyers commented 8 years ago

I don't think it's a big deal to change the FLIF header format if needed, there will have to be modifications anyway if FLIM also becomes a valid magic number.

there would be no confusing logic like starting a file off with either FLIM or FLIF depending on whether there are metadata chunks or not. (Which I think would be a mistake)

I'm not sure about this. On the one hand, having only one magic value (only "FLIF", not "FLIF" and "FLIM") is nice. On the other hand, it's kind of nice to easily distinguish a FLIF-with-metadata from a "naked FLIF", and it makes adding/stripping metadata really easy. If you quickly want to check whether something might be a Free Lossless Image, you just read the first three bytes and see if they're "FLI".

Moving the basic image info (image dimensions / color depth) to the first chunk (regardless of whether it is FLIM or FLIF) might be a good idea though, for the reason you mention.

haasn commented 8 years ago

and it makes adding/stripping metadata really easy

I think any tool designed to add/strip metadata would be going by the chunk “tags” either way (i.e. whether it's critical or not) depending on the scope and purpose of the tool.

Besides, for a presumably dedicated tool written for this job, is it really that much easier to scan for “FLIF” and drop everything before it than it is to just iterate through the headers and omit the ones it doesn't need? It's not something you're going to be doing with standard command line tools either way, I think.

eeeps commented 8 years ago

Is that FLIF segment an atomic unit for decoding or does it have internal offsets you would want to locate to decode portions of the image space?

Atomic (there is no way to decode a random crop without encoding the entire thing), though if it's an interlaced image, then it could be useful to know the offsets of the zoomlevels, in case you want to progressively load the image, perhaps not to full resolution. However, I don't think it helps much to put this information in the file itself, e.g. if this is going to be used to let a browser do specific Range requests to get a truncated file depending on e.g. the device DPI, then it should ideally know the offset before doing the request. So it should be in the HTML or truncated server-side based on HTTP Client Hints or something.

Yoav Weiss is probably the person who’s thought the most about container formats and responsive loading; his post about it is worth a read and his tool is worth a look.

I’m a little out of my depth here, but I’d argue that if responsive loading is an important use case for FLIF (which it appears to be), the information needed to do it should be wrapped up tightly with the image data itself. Duplication of information across different layers of the stack = multiple sources of truth and sadness.

This information should not go into HTML, which is often still written by hand. Developers mess up w descriptors and sizes attributes constantly, because those features ask them to duplicate information about (1) their resources and (2) their layout, in HTML. People also complain about the verbosity of srcset... which led to Client Hints. Putting byte offsets in HTML would be awful.

Should byte offsets go on the server, so that the server can respond intelligently based on facts about the browsing environment, maybe delivered via, say, Client Hints? Yes! And the best place to put those byte offsets, on the server, is as close to the image data as possible. Ideally: inside of the same container. And ideally, those offsets would be written by the same encoder that wrote the original data, at encode time - not some other tool, coming in later, more susceptible to error.

FLIF-hub / FLIF

Metadata (EXIF, color profiles, etc etc) #17