Feedback on draft Amendment 3 for Image File Format

bradh commented 4 months ago

Comments on working draft at https://www.mpeg.org/wp-content/uploads/mpeg_meetings/146_Rennes/w23988.zip

For the version 1 of PixelInformationProperty (pixi):

it might be slightly cleaner if channel_data_type used the same enumeration values as 23001-17 Table 2 (plus AMD 2, which adds 3 for signed integer). Interestingly pixi bits_per_channel doesn't have enough range for the extreme case in 23001-17. If we'd called it bits_per_channel_minus_one we could have got there, but a bit late to change the semantics.
I also couldn't find the right part in ISO/IEC 23091-2 for the definition for GenericSubsamplingType and GenericSubamplingSampleLocType, and assume they might be in a CD somewhere. Possibly needs an additional reference.
In the semantics for channel_idc: "Values 3-8 are reserved for future use." That should say "Values 3-7 are reserved...", since there are only three bits.

For the compressed meta box:

It seems like there is a lot of finessing to make it still be 'meta', including needing removing restrictions in ISOBMFF that an implementer could potentially have relied on. I don't think that effort is worth it, and it would be cleaner to just call it "met2" or "cmet" or something else. It is so different in terms of syntax, it'll probably be cleaner just to parse it separately anyway. A consumer can still synthesise an in-memory MetaBox if the implementation needs that, and there is no reason why it can't use other boxes (i.e. whatever is allowed for MetaBox), although I'd like to see a limitation that prevents inconsistency.
Possibly there could actually be two compact versions - a really simple one that doesn't allow much (maybe just image + alpha), and is well suited to simple web page images, and a second one with the full tools.

In P.2, the statement that 23008-12 is a Publically Available Standard on the ITTF page is (sadly) no longer true. While that might still be in work, perhaps we should just reference "ISO/IEC 23008-12" and be silent on the location. Updating the other MIME type registrations can be left for a future contribution.

y-guyon commented 4 months ago

Thank you for the feedback.

If we'd called it bits_per_channel_minus_one we could have got there, but a bit late to change the semantics.

Too late because we are at AMD WD stage or because it would be too confusing compared to 'pixi' version 0?
If doable I think it is a reasonable change.

I also couldn't find the right part in ISO/IEC 23091-2 for the definition for GenericSubsamplingType

Same. Should we keep only Rec. ITU-T H.273 as reference or are we confident enough that ISO/IEC 23091-2 will be updated before this amendment draft is published?

In the semantics for channel_idc: "Values 3-8 are reserved for future use." That should say "Values 3-7 are reserved...", since there are only three bits.

Good catch.

It seems like there is a lot of finessing to make it still be 'meta'

I agree.

Possibly there could actually be two compact versions - a really simple one that doesn't allow much (maybe just image + alpha), and is well suited to simple web page images, and a second one with the full tools.

"The second one with the full tools" would be regular HEIF, right?
Otherwise, introducing an alternative representation of HEIF is already a big change. Introducing two would be confusing in my opinion.

In P.2, the statement that 23008-12 is a Publically Available Standard on the ITTF page is (sadly) no longer true

Right.

farindk commented 4 months ago

it might be slightly cleaner if channel_data_type used the same enumeration values as 23001-17 Table 2 (plus AMD 2, which adds 3 for signed integer).

This, and we're also missing the 'complex' data type from 23007-17. That could be the reserved '3', but then we are already out of space. Thus, it might be good to increase the size of channel_data_type by one bit. There are still two reserved bits left.

bradh commented 4 months ago

"The second one with the full tools" would be regular HEIF, right? Otherwise, introducing an alternative representation of HEIF is already a big change. Introducing two would be confusing in my opinion.

I was thinking the met2 (or whatever we call it) with version=0 for the absolute minimum that we can agree on; then version=1 with the rest of the tools.

bradh commented 4 months ago

If we'd called it bits_per_channel_minus_one we could have got there, but a bit late to change the semantics.

Too late because we are at AMD WD stage or because it would be too confusing compared to 'pixi' version 0? If doable I think it is a reasonable change.

My original assessment was based on version 0 already being defined and widely implemented and version 1 inheriting all of version 0. A different design could definitely be an option if that didn't contravene some rules I don't understand.

leo-barnes commented 4 months ago

Worth adding to this discussion is that from what I can see is that libheif currently does not check that the version of the meta box is 0. So a file with a v1 meta box will be parsed as if it was a meta v0. For a real file this would then immediately return an error since the parser can't find the hdlr box.

farindk commented 4 months ago

Worth adding to this discussion is that from what I can see is that libheif currently does not check that the version of the meta box is 0.

Should be easy to add once I have the final specification.

leo-barnes commented 4 months ago

Worth adding to this discussion is that from what I can see is that libheif currently does not check that the version of the meta box is 0.

Should be easy to add once I have the final specification.

Right, this was more to test that having a v1 meta would not cause existing implementations to explode. Both the Apple parser and libavif were strict about the meta box being v0, but libheif did not seem to check it. But since you then immediately look for the hdlr box it doesn't really cause any issues for this use-case.

y-guyon commented 4 months ago

The meta v1 first bytes could look like "hdlr" though, but keeping consistency with box sizes, offsets and codec configuration is very unlikely.

farindk commented 4 months ago

Worth adding to this discussion is that from what I can see is that libheif currently does not check that the version of the meta box is 0.

Should be easy to add once I have the final specification.

libheif now checks all box versions: https://github.com/strukturag/libheif/commit/2de879aadb52c7d2f661c0067faa6b853330a9f1

y-guyon commented 4 months ago

it might be slightly cleaner if channel_data_type used the same enumeration values as 23001-17 Table 2 (plus AMD 2, which adds 3 for signed integer)

Reassigning the enum values in AMD3 to match 23001-17 sounds good to me.

This, and we're also missing the 'complex' data type from 23007-17.

What is the purpose of the 'complex' data type? I must admit I fail to see its use cases in a SlimHEIF context.

bradh commented 4 months ago

Complex is useful in a range of places. Something like Synthetic Aperture Radar is one example. Reducing overhead in transfers is still good, even if the data is large per sample. Applying lossless compression on a spatial window (cropped area) and then having minimum meta would still help when bandwidth is limited or latency is critical.

mhannuksela commented 4 months ago

It seems like there is a lot of finessing to make it still be 'meta'

MPEG's reasons to use 'meta' v1 included keeping fundamental design choices that have existed in HEIF from the beginning. HEIF requires a MetaBox at file level.

mhannuksela commented 4 months ago

pixel_format: specifies the number of bits per channel and the presence of a decimal point for the pixels of the reconstructed image. Values 0, 1, and 2 correspond to half-precision float (binary16), single-precision float (binary32), and double-precision float (binary64) formats as defined by IEEE 754-2008, respectively. Other values correspond to the bit depth minus one per channel of unsigned integer samples.

Is it reasonable to specify floating point pixel formats in this amendment? Which codecs would use it? If floating point becomes important, couldn't support for it be added in a future version?

leo-barnes commented 4 months ago

@mhannuksela

Is it reasonable to specify floating point pixel formats in this amendment? Which codecs would use it? If floating point becomes important, couldn't support for it be added in a future version?

Given how tricky it will be to add stuff in the future, we wanted this to capture anything we think is important now and for the next couple of years. We tried to look at features that are currently supported by HEIF and HEIF codecs but we also took a look at features that are commonly noted as shortcomings of HEIF compared to other formats or features supported by codecs that currently does not have HEIF bindings but could easily have.

Two things popped up as both desirable and likely:

Native alpha support (codec supports RGBA natively rather than coding alpha as a separate plane)
Floating point

Floating point is supported by for example JPEG XL. I don't find it unreasonable to think JPEG XL might get HEIF bindings at some point. We wanted the reduced overhead mode to be able to support this from the start if/when that happens.

podborski commented 4 months ago

Worth adding to this discussion is that from what I can see is that libheif currently does not check that the version of the meta box is 0.

Should be easy to add once I have the final specification.

Right, this was more to test that having a v1 meta would not cause existing implementations to explode. Both the Apple parser and libavif were strict about the meta box being v0, but libheif did not seem to check it. But since you then immediately look for the hdlr box it doesn't really cause any issues for this use-case.

This is one of the issues I have with this. There is no guarantee that legacy implementation will treat it nicely as the versioning is completely ignored in most cases and the parser will be trying to parse boxes and not some payload. Meta was designed to be a container box in the first place. (see QTFF). this being a FullBox in ISOBMFF was simply a mistake.

Here is the behavior of the libisomedia (file format reference software):

meta v1: MP4OpenMovieFile function fails with MP4BadDataErr.
mini: MP4OpenMovieFile succeeds and returns the MP4NoErr. It simply dumps mini as an unknown box.

also by implementing support for slimHEIF you would need:

for meta v1: you have to modify your MetaBox handling routines
for mini: just implement parsing of the new box. While the elements of it are parsed a 'virtual' Meta v0 box is created in memory and filled out based on parsed parameters.

From the implementation perspective, mini is much cleaner in my opinion.

In addition to these issues. The way how meta v1 is specified is also confusing. We need to modify not only HEIF but also make a change to ISOBMFF telling that, "while we don't allow a derived specs to change the meaning of our boxes, there can be exceptions if a derived specs can ask for this" (or something like that).

All this is for what reason again? so we satisfy some definition statements of an item file?

MPEG's reasons to use 'meta' v1 included keeping fundamental design choices that have existed in HEIF from the beginning. HEIF requires a MetaBox at file level.

There is no problem with mini regarding this. In the section where mini is defined we say that it is translated to a MiniBox in the memory (see above on how to implement). There is no issue. meta v0 is always there after you parsed mini.

In my opinion meta v1 is breaking the fundamental design of ISOBMFF in the first place. A container box is all of a sudden a leaf box containing pure payload data.

y-guyon commented 4 months ago

Complex is useful in a range of places. Something like Synthetic Aperture Radar is one example.

Thank you for the example.
In my opinion this is no longer an image but custom data to be processed by custom implementations.
I do not think it should be added to SlimHEIF for now but I do not mind if others think it is worth it.

Reducing overhead in transfers is still good, even if the data is large per sample. Applying lossless compression on a spatial window (cropped area) and then having minimum meta would still help when bandwidth is limited or latency is critical.

I agree. Fewer bytes is always better in my opinion. But it was requested in MPEG 146 to keep the SlimHEIF design to its main use cases (small files, simple images).

kashyapks commented 4 months ago

This is one of the issues I have with this. There is no guarantee that legacy implementation will treat it nicely as the versioning is completely ignored in most cases and the parser will be trying to parse boxes and not some payload. Meta was designed to be a container box in the first place. (see QTFF). this being a FullBox in ISOBMFF was simply a mistake.

We define brands to gate the behaviour of readers, so with the new brands it is expected the readers should fail to parse the Meta with version=1.

Here is the behavior of the libisomedia (file format reference software):

meta v1: MP4OpenMovieFile function fails with MP4BadDataErr. mini: MP4OpenMovieFile succeeds and returns the MP4NoErr. It simply dumps mini as an unknown box.

It is expected that ISOBMFF readers/parsers gracefully handle the boxes which they cannot interpret. If libisomedia does not handle the box gracefully we should try to fix it, but that is independent of the discussion here.

also by implementing support for slimHEIF you would need:

for meta v1: you have to modify your MetaBox handling routines for mini: just implement parsing of the new box. While the elements of it are parsed a 'virtual' Meta v0 box > is created in memory and filled out based on parsed parameters. From the implementation perspective, mini is much cleaner in my opinion.

with meta v1 you just have to parse the slimHEIF parameters under version 1 with mini you will have to modify the file-level handling routines

In addition to these issues. The way how meta v1 is specified is also confusing. We need to modify not only > HEIF but also make a change to ISOBMFF telling that, "while we don't allow a derived specs to change the > meaning of our boxes, there can be exceptions if a derived specs can ask for this" (or something like that).

All this is for what reason again? so we satisfy some definition statements of an item file?

We do not see how the Meta v1 changes the meaning of the box. The changes in ISOBMFF spec clearly clarifies the content in Metav1 without changing its meaning. The derived specification is only allowed to define the data without changing the meaning of the box. If you feel the current specification is limited we can clarify it further.

In my opinion meta v1 is breaking the fundamental design of ISOBMFF in the first place. A container box is all of a sudden a leaf box containing pure payload data.

The meta is a FullBox and we simply use the functionality and do not see how we break ISOBMFF design

leo-barnes commented 4 months ago

@kashyapks

with meta v1 you just have to parse the slimHEIF parameters under version 1 with mini you will have to modify the file-level handling routines

You keep talking about how meta v1 is easier to implement all the time, but you've now heard disagreement from the 3 most widely used HEIF implementations (libheif, libavif, the Apple parser). This is simply not a valid argument if the implementers disagree.

mhannuksela commented 4 months ago

@leo-barnes

Given how tricky it will be to add stuff in the future, we wanted this to capture anything we think is important now and for the next couple of years. We tried to look at features that are currently supported by HEIF and HEIF codecs but we also took a look at features that are commonly noted as shortcomings of HEIF compared to other formats or features supported by codecs that currently does not have HEIF bindings but could easily have.

Thanks for the thorough answer. Being future-proof is generally a good idea.

I wonder how precisely the floating point format needs to be specified. I can understand the different bit-depths, since that would indicate the size of the decoded image memory buffers. But I wonder if IEEE 754-2008 specifically needs to be mentioned and why. Would a HEIF implementation become non-conforming if it did not support IEEE 754-2008 but some floating point representation natively available in its environment? If referring to IEEE 754-2008 specifically is important for some reason, should byte-endianness also be specified? If so, does IEEE 754-2008 specify byte-endianness or should HEIF specify it?

mhannuksela commented 4 months ago

On the topic of being future-proof, I am a bit worried that ISO 21496-1 is hard-coded as the only way of HDR-related tone mapping. ISO 21496-1 is an ongoing standardization effort, so yet unclear how widely used it will become. It would be good to enable signalling of other tone-mapping methods too.

y-guyon commented 4 months ago

If referring to IEEE 754-2008 specifically is important for some reason, should byte-endianness also be specified?

I do not think so.
To me, referring to IEEE 754-2008 allows HEIF to specify the possible values that can be output, more than how they are represented in memory. Signaling the bit depth serves the same purpose.
On the other hand, endianness is an implementation detail and does not need to be mentioned by HEIF.

If the implementation outputs the same values in another representation, it is conforming. Any lossless conversion is just software plumbing. Another example would be 10-bit samples that are often output using 16-bit types such as uint16_t; it does not make the decoder non-conforming because the data is not stored using some hypothetical uint10_t type.

It would be good to enable signalling of other tone-mapping methods too.

Could you cite some?
Maybe they can be exactly represented with ISO 21496-1.

leo-barnes commented 3 months ago

@mhannuksela

On the topic of being future-proof, I am a bit worried that ISO 21496-1 is hard-coded as the only way of HDR-related tone mapping. ISO 21496-1 is an ongoing standardization effort, so yet unclear how widely used it will become. It would be good to enable signalling of other tone-mapping methods too.

Like @y-guyon mentioned there are no other gain-map based tone-mapping methods I'm aware of that. The JPEG WG has done similar stuff, but not anything that is container/codec-agnostic and something that applies to HEIF. Even if there were, they would be multiple years into the future given that we have seen no HEIF proposals for them as of yet. At which point 21496-1 will be the de-facto way of doing it in HEIF and the way that will be most widely adopted.

The reduced overhead functionality is really targeting things that we know or expect to use in the relatively near future. We know 21496-1 is going to be used widely for HEIF, hence why it makes sense to add it from the start.

And as you know, the extended_meta functionality that we had in our earlier proposals would have supported any kind of tone-mapping supported by HEIF, but we had to remove that in order to move this project forward.

bradh commented 3 months ago

Instead of IEEE 754-2008, we could refer to a later version (e.g. 2019). Or to the equivalent ISO/IEC 60559:2020. I'm not aware of any other common interchange formats, and while I am not fully across all the codecs, the only one I am familiar with that does floating point is 23001-17, and it calls out 754-2008 (with codec-specified endianess).

podborski commented 3 months ago

This is one of the issues I have with this. There is no guarantee that legacy implementation will treat it nicely as the versioning is completely ignored in most cases and the parser will be trying to parse boxes and not some payload. Meta was designed to be a container box in the first place. (see QTFF). this being a FullBox in ISOBMFF was simply a mistake.

We define brands to gate the behaviour of readers, so with the new brands it is expected the readers should fail to parse the Meta with version=1.

No this will not solve the problem I mentioned above. What you are doing is that you are re-defining an ancient structure that has been deployed over many many years in a LOT of places most likely without versioning consideration in the first place. Some of this due to the fact that this was actually a bug and QTFF has it as a simple box.

Also while brands are very useful, often the behavior of their interpretation is not the same as you would expect.

Here is the behavior of the libisomedia (file format reference software):

meta v1: MP4OpenMovieFile function fails with MP4BadDataErr. mini: MP4OpenMovieFile succeeds and returns the MP4NoErr. It simply dumps mini as an unknown box.

It is expected that ISOBMFF readers/parsers gracefully handle the boxes which they cannot interpret. If libisomedia does not handle the box gracefully we should try to fix it, but that is independent of the discussion here.

This is a perfect demonstration of what can happen when you mess up MetaBox. libisomedia is one example, there could be many many more out there.

also by implementing support for slimHEIF you would need:

for meta v1: you have to modify your MetaBox handling routines for mini: just implement parsing of the new box. While the elements of it are parsed a 'virtual' Meta v0 box > is created in memory and filled out based on parsed parameters. From the implementation perspective, mini is much cleaner in my opinion.

with meta v1 you just have to parse the slimHEIF parameters under version 1 with mini you will have to modify the file-level handling routines

Are you now arguing that implementing a new box is more complex? I don't understand your point here.

In addition to these issues. The way how meta v1 is specified is also confusing. We need to modify not only > HEIF but also make a change to ISOBMFF telling that, "while we don't allow a derived specs to change the > meaning of our boxes, there can be exceptions if a derived specs can ask for this" (or something like that).

All this is for what reason again? so we satisfy some definition statements of an item file?

We do not see how the Meta v1 changes the meaning of the box. The changes in ISOBMFF spec clearly clarifies the content in Metav1 without changing its meaning. The derived specification is only allowed to define the data without changing the meaning of the box. If you feel the current specification is limited we can clarify it further.

I don't know how you don't understand it. It is defined as a pure container box. All of the sudden it has payload in it. What is so difficult to understand here? No further clarification needed. just change meta v1 to mini, this is it. Make the life easier for everyone.

In my opinion meta v1 is breaking the fundamental design of ISOBMFF in the first place. A container box is all of a sudden a leaf box containing pure payload data.

The meta is a FullBox and we simply use the functionality and do not see how we break ISOBMFF design

See the point above. But what you really do here is overcomplicating the design by impacting backwards compatibility, for no technical reason. In my opinion this is done only because Nokia wants it so. I really hope more people express their opinion here.

bradh commented 3 months ago

No this will not solve the problem I mentioned above. What you are doing is that you are re-defining an ancient structure that has been deployed over many many years in a LOT of places most likely without versioning consideration in the first place. Some of this due to the fact that this was actually a bug and QTFF has it as a simple box.

Concur. Note that Android makes ISOBMFF with meta as a simple box (depending on location in the file). They know that is non-conformant, but don't want to change it. I disagree with them, but respect the position: real world deployment matters. Conformance is not important to users - what is important is interoperability.

MetaBox is already overloaded enough. Please do not make it more complicated.

jeanlf commented 3 months ago

My 2 cents on this:

Meta support is already very tricky, having to deal with QTFF and ISOBMFF variants. A design based on meta v1 will only make things more complicated. We will take the risk of players thinking they can process the file while they can't
It seems strange to go so far in bit optimization but waste 8 bits of version
I personally find it quite disturbing to say "this is a new version" when this is a completely new binary syntax
The text Derived specifications are not permitted to declare a new version of a box defined in this document, but may define the syntax of a version of a box when this document explicitly allows derived specifications to do so opens the pandora box: I can now define my own MetaBox V1 in my own "derived specification" with yet another completely different content from HEIF meta v1. Players will for sure bark at GPACMeta IsoBMFF derived spec files but they will be compliant !
I also believe the spec is really unclear on what is an ISOBMFF file in the case of item: for movie or fragments files, the required top-level(s) derive from box so no problem, but for meta the required top-level is a full box (with only version 0 defined so far), so as of today only a v0 top-level metabox makes the file an 'item file'

mhannuksela commented 3 months ago

@leo-barnes @y-guyon

Like @y-guyon mentioned there are no other gain-map based tone-mapping methods I'm aware of that. The JPEG WG has done similar stuff, but not anything that is container/codec-agnostic and something that applies to HEIF. Even if there were, they would be multiple years into the future given that we have seen no HEIF proposals for them as of yet. At which point 21496-1 will be the de-facto way of doing it in HEIF and the way that will be most widely adopted.

The reduced overhead functionality is really targeting things that we know or expect to use in the relatively near future. We know 21496-1 is going to be used widely for HEIF, hence why it makes sense to add it from the start.

And as you know, the extended_meta functionality that we had in our earlier proposals would have supported any kind of tone-mapping supported by HEIF, but we had to remove that in order to move this project forward.

I am not an expert on gain maps and don't know how many formats there are out there. I've heard about JPEG having their method (but maybe that is specific to JPEG codec(s) only) and I know Adobe has their own gain map format. I was simply worried about the hard-coding of ISO 21496-1. Leaving hooks for potential future extensions to other gain map formats would be good.

y-guyon commented 3 months ago

I've heard about JPEG having their method (but maybe that is specific to JPEG codec(s) only)

Would that be what we call "Ultra HDR"?
https://developer.android.com/media/platform/hdr-image-format#HDR_gain_map_metadata
That looks like very similar to the GainMapMetadata struct in ISO 21496-1, with some fields renamed (for example HDRCapacityMin should map to base_hdr_headroom or alternate_hdr_headroom).
Assuming you were talking about "Ultra HDR" and that it can be exactly represented by ISO 21496-1, it is unnecessary to leave a hook for it in SlimHEIF.

I was simply worried about the hard-coding of ISO 21496-1.

Hard-coding reduced use cases is the whole point of SlimHEIF in my opinion.
This is how bytes are saved: by making it impossible to signal unexpected patterns and values.

Leaving hooks for potential future extensions to other gain map formats would be good.

That would have been an option with the extended_meta field that was removed at MPEG 146.

mhannuksela commented 3 months ago

Leaving hooks for potential future extensions to other gain map formats would be good.

That would have been an option with the extended_meta field that was removed at MPEG 146.

Hooks don't have to be as expressive as extended_meta. For example, you may have a flag, 0 = ISO 21496-1, 1 = other gain map format; and when equal to 1, say a unsigned int(8) gain_map_format_id, whose value currently is reserved.

leo-barnes commented 3 months ago

Hooks don't have to be as expressive as extended_meta. For example, you may have a flag, 0 = ISO 21496-1, 1 = other gain map format; and when equal to 1, say a unsigned int(8) gain_map_format_id, whose value currently is reserved.

Sorry, but I really don't see the point. We know of no other gain-map related work that is relevant. The Adobe white paper (on which Ultra HDR is based) and the Apple proprietary gain-maps are what 21496-1 is trying to standardize. Both the white paper and the Apple proprietary gain-maps can basically be considered deprecated at this point since both Adobe and Apple are in full support of 21496-1.

y-guyon commented 3 months ago

Hooks don't have to be as expressive as extended_meta. For example, you may have a flag, 0 = ISO 21496-1, 1 = other gain map format; and when equal to 1, say a unsigned int(8) gain_map_format_id, whose value currently is reserved.

ISO 21496-1's GainMapMetadata struct starts with a GainMapVersion version field which is similar in functionality. It would stay within ISO 21496-1 domain but allows for future versions (kind of like ICC profiles). Would that be a sufficient hook?

y-guyon commented 6 days ago

All items were included in WD of HEIF amendment 2 or were discussed into no further action needed.

MPEGGroup / FileFormat

Feedback on draft Amendment 3 for Image File Format #103