AcademySoftwareFoundation / openexr

The OpenEXR project provides the specification and reference implementation of the EXR file format, the professional-grade image storage format of the motion picture industry.
http://www.openexr.com/
BSD 3-Clause "New" or "Revised" License

Embedding "binary blobs" in the header #1497

Open lrosenthol opened 1 year ago

lrosenthol commented 1 year ago

I am the chair of the Technical Working Group at the Coalition for Content Provenance and Authenticity (C2PA, https://c2pa.org). The C2PA specification serves to enable the establishing of provenance for assets, especially now with the need to identify when content has been generated or modified by AI/ML.

We've had requests from our members to add C2PA data to EXR files. In reviewing the EXR "format" to determine how one would do so, my reading seems to show that the standard way to add new/custom information is to add it as a custom attribute to the header. Is that a correct interpretation?

However, based on my reading, entries in the header can only be integers, floats, strings or arrays of the same. It doesn’t appear to support “arbitrary binary data” – which is what we would need as our data is stored in a structured binary "blob" based on ISO 19566-5 (JUMBF).

Our specification does support sidecar data as well, though we prefer it be embedded (for all the obvious reasons). Is there any guidance on how to enable this?
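
To make the question concrete, here is a minimal sketch of how a single header attribute is laid out in an EXR file: a NUL-terminated name, a NUL-terminated type name, a 32-bit little-endian size, then the value bytes. Because the value is size-delimited, a reader can skip over a value whose type it does not recognize. The attribute name `c2pa` and the type name `exampleBlob` are hypothetical placeholders, not registered OpenEXR names, and name-length limits are not enforced here.

```python
import struct


def pack_attribute(name: str, type_name: str, payload: bytes) -> bytes:
    """Serialize one attribute record: name NUL, type NUL, int32 size, value."""
    for label in (name, type_name):
        if not label or "\0" in label:
            raise ValueError("names must be non-empty and NUL-free")
    return (
        name.encode("ascii") + b"\0"
        + type_name.encode("ascii") + b"\0"
        + struct.pack("<i", len(payload))  # size is little-endian int32
        + payload
    )


def unpack_attribute(buf: bytes, offset: int = 0):
    """Parse one record; return (name, type_name, payload, next_offset)."""
    name_end = buf.index(b"\0", offset)
    type_end = buf.index(b"\0", name_end + 1)
    (size,) = struct.unpack_from("<i", buf, type_end + 1)
    if size < 0:
        raise ValueError("negative attribute size")
    start = type_end + 1 + 4
    payload = buf[start:start + size]
    if len(payload) != size:
        raise ValueError("truncated attribute value")
    name = buf[offset:name_end].decode("ascii")
    type_name = buf[name_end + 1:type_end].decode("ascii")
    return name, type_name, payload, start + size


# Hypothetical names; the payload stands in for a JUMBF box and may contain NULs.
record = pack_attribute("c2pa", "exampleBlob", b"\x00\x01jumb\xff")
name, type_name, payload, next_offset = unpack_attribute(record)
```

The payload survives byte-for-byte even though it contains NUL bytes, since only the two name fields are NUL-terminated.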

lgritz commented 1 year ago

Hi, what a great coincidence!

I was just learning about C2PA for the first time at CVPR last month, where I had some extended discussion with Andy Parsons and John Collomosse. I meant to follow up on all this, but hadn't quite gotten to it yet, so I'm very happy for this to serve as a reminder.

I think the goals of C2PA are really important and would like our project to help in any way possible. We definitely want support in OpenEXR, and I think if that means we need to extend the file format and APIs to support binary blobs in general, or even having a standard attribute to house the C2PA data in particular, I think it's a no-brainer that we should do so.

In addition to being on the OpenEXR TSC, I'm also the chief architect of the OpenImageIO project, and in that capacity I want to make sure that the OIIO APIs and infrastructure are properly supporting transport of this metadata. OpenEXR is a common file format, but we also need to cover all the software that exists between reading an exr file, doing all sorts of things to the images (including sometimes ending up in other file formats), and eventually producing new outputs. If that software is either failing to propagate the C2PA metadata, or is modifying the image in ways that invalidate the C2PA data or require augmenting it, that all needs to be handled as well. For a wide variety of visual effects studios and products, OpenImageIO is the right place to implement that logic so that it will be applied automatically across a lot of applications and pipelines.

lgritz commented 1 year ago

Continuing...

I think having a way to store binary blobs, while also a good addition, is not ambitious enough. Let's make C2PA metadata a first class idea in OpenEXR and help promote it, not merely make it possible to cram in.

We're pretty busy with preparation for SIGGRAPH in a couple weeks, but I'd like to propose that in the weeks following SIGGRAPH, you come to one of the OpenEXR technical steering meetings (it's on Zoom and open) to give a short overview, for those who aren't familiar, of what C2PA is, the problem it's designed to solve, and where the project currently stands. Then we can brainstorm a bit about how to proceed. If it's not too intrusive, we might just have barely enough time to squeeze support in for this year's OpenEXR 3.2 release.

kmilos commented 1 year ago

FYI, the GIMP added its own binary blob attribute to EXR quite some time ago in order to store Exif metadata for example (and darktable matched it so the two can exchange).

It would be neat if such an attribute became part of the spec anyway.

lgritz commented 1 year ago

Why TF wouldn't they upstream that?

lgritz commented 1 year ago

Also, with arbitrary named metadata allowed in OpenEXR, there isn't a good reason to put Exif in a binary blob. You can literally add metadata items individually as "Exif:LensMake" or the like.

kmilos commented 1 year ago

there isn't a good reason to put Exif in a binary blob

Quite the opposite. Most other apps parse Exif data from its canonical TIFF container through 3rd party metadata libraries (I'm sure you're aware of how it's carried in the JPEG, PNG, HEIF, JPEG XL, etc.). Why would you want to write yet another Exif parser to break it out and then have to parse it again to pack it anew in a TIFF blob for interchange?

P.S. From a non-movie industry point of view, it's a PITA enough already to have to translate between Exif and the corresponding subset of OpenEXR standard attributes...
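
For readers less familiar with the "canonical TIFF container" mentioned above: an Exif payload is a small TIFF structure, that is, a byte-order mark (`II` or `MM`), the magic number 42, an offset to the first IFD, and then 12-byte directory entries of tag, type, count, and value-or-offset. The sketch below builds and parses a one-entry example using only the standard library. It is an illustration of the container shape, not a substitute for a real metadata library, and it does not follow offsets, sub-IFDs, or Makernotes.

```python
import struct


def build_sample_tiff() -> bytes:
    """Build a little-endian TIFF header plus one IFD entry (Orientation = 6)."""
    header = struct.pack("<2sHI", b"II", 42, 8)          # first IFD at offset 8
    entry = struct.pack("<HHI", 0x0112, 3, 1)            # tag, type SHORT, count 1
    entry += struct.pack("<HH", 6, 0)                    # inline value, padded to 4 bytes
    ifd = struct.pack("<H", 1) + entry + struct.pack("<I", 0)  # no next IFD
    return header + ifd


def parse_ifd0(blob: bytes) -> dict:
    """Return {tag: (type, count, raw_value_field)} for the first IFD only."""
    order = blob[:2]
    if order == b"II":
        endian = "<"
    elif order == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF byte-order mark")
    magic, ifd_offset = struct.unpack_from(endian + "HI", blob, 2)
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (entry_count,) = struct.unpack_from(endian + "H", blob, ifd_offset)
    entries = {}
    for i in range(entry_count):
        pos = ifd_offset + 2 + 12 * i                    # each entry is 12 bytes
        tag, typ, count, raw = struct.unpack_from(endian + "HHI4s", blob, pos)
        entries[tag] = (typ, count, raw)
    return entries


entries = parse_ifd0(build_sample_tiff())
typ, count, raw = entries[0x0112]
orientation = struct.unpack("<H", raw[:2])[0]
```

Even this toy parser has to handle byte order and inline-versus-offset values, which is part of why applications tend to hand the whole blob to an existing library.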

lrosenthol commented 1 year ago

@lgritz Sounds like we are indeed on the same page - both for OpenEXR as well as OIIO.

I would welcome the opportunity to attend a meeting and present on our work. Let me know when you and the rest of the folks are back and able to meet!

cary-ilm commented 1 year ago

@lrosenthol, please do join us, our steering committee meetings are open discussions. We meet bi-weekly on Thursdays, 1:30pm Pacific, and you're welcome today although we do have some other urgent business, and the next meeting likely won't happen because of SIGGRAPH, so the next available time is probably August 24. The calendar with Zoom link is here: https://lists.aswf.io/g/openexr-dev/calendar

lrosenthol commented 1 year ago

August 24 works for me! I'll put it on my calendar. Thanks @cary-ilm

lgritz commented 1 year ago

there isn't a good reason to put Exif in a binary blob

Why would you want to write yet another Exif parser to break it out and then have to parse it again to pack it anew in a TIFF blob for interchange?

Because the TIFF blob is just the way (in fact, just ONE way) of serializing it to cram into a file, particularly in a format that doesn't support arbitrarily named and typed metadata. So they just throw their hands up and make a blob. That's the right thing to do for the file formats that require it be stored like this, but OpenEXR explicitly allows the metadata items to be individually named and manipulated, it's the "native" or idiomatic way to do it in an OpenEXR file.

P.S. From a non-movie industry point of view, it's a PITA enough already to have to translate between Exif and the corresponding subset of OpenEXR standard attributes...

I 100% agree that we should have a guide for which Exif items correspond to which standard or SMPTE metadata names that we've defined, to have consistent interchange as we move between formats.

cary-ilm commented 1 year ago

@kmilos, would you consider turning The Gimp's binary blob attribute into an official submission to OpenEXR? We'd welcome the addition, even aside from the C2PA issue, and compatibility with your existing scheme would be helpful.

kmilos commented 1 year ago

would you consider turning The Gimp's binary blob attribute into an official submission to OpenEXR?

CC the original author @hanatos

hanatos commented 1 year ago

hi all. if i understand correctly, kmilos is talking about this darktable patch from 2012 that made it into gimp, resulting in today's leftovers here.

Why TF wouldn't they upstream that?

oh, i'm not sure this workaround is something to be proud about. if i understand the code comments around it nowadays correctly i'll assume the preferred way of working with metadata in exr is to go through libexiv2, as with all the other files. this way, the library could implement the path through openexr metadata items (which seems to be in line with the discussion about first class citizen features above).

from that perspective i understand that nobody wants blobs (also not in openexr). the pragmatic aspects are that 1) libexiv2 is probably not going to support native openexr metadata in another ten years and 2) as you know, the metadata for raw images is often ill-specified (vendor specific maker notes with the occasional surprise) and essential to the development process.

i might be able to drop by the openexr bof next monday to say hi.

kmilos commented 1 year ago

@hanatos The problem is that Exiv2 doesn't support EXR files at all (not sure even ExifTool does?), and EXR doesn't have any specific provisions for Exif metadata. So the binary blob still seems to be a pragmatic way to disentangle this...

lgritz commented 1 year ago

It does have provisions for Exif data, the same it has for all other metadata: you enter each item with the right name and type. Simply embedding an opaque binary blob with TIFF tag encoding is a very heavy-handed approach that is not idiomatic OpenEXR.

lgritz commented 1 year ago

As an example, OpenImageIO has no trouble whatsoever doing a round trip of JPEG -> OpenEXR -> JPEG (or TIFF, or whatever) while preserving all the Exif metadata.

peterhillman commented 1 year ago

EXIF carries a lot of maker-specific esoteric data that's useful to carry in a pipeline for tracking. Running a JPEG I have through exiftool lists the state of each autofocus sensor. Round-tripping that between OpenEXR attributes and EXIF tags will be tricky, since both reader and writer need to understand every piece of data carried by EXIF and use a common translation scheme between OpenEXR attributes and EXIF tags/enum values. (For what it's worth, oiiotool didn't round trip the autofocus data)

The drawback to encouraging EXIF to be carried in OpenEXR as a blob is that a convention may emerge where important metadata is only carried in EXIF, not in the equivalent OpenEXR standard attribute, because it's "too much effort" to parse the EXIF when writing a file. It would be unfortunate if software was considered broken because it was using standard attributes for metadata instead of EXIF tags. There's a danger that every tool that reads or writes OpenEXR files has to double up reading and writing both standard attributes and EXIF tags, with its own method of resolving conflicts between the two.

Perhaps standard OpenEXR attributes should be added to support any remaining "standard" EXIF tags - the ones OpenImageIO recognizes - and an exifdata attribute used to store the rest as a blob, with the OpenEXR library itself understanding the equivalence between EXIF tags and EXR standard attributes and doing the conversion. So, header.setFromExif() would set multiple attributes, and header.getAsExif() would compile EXIF data from multiple attributes including the exifdata blob, ready for processing using an external library. That solution is obviously a lot more work than just making an EXIF blob attribute, but I worry that a less structured solution will cause major issues in the future.
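
A rough sketch of that split, using plain dictionaries as stand-ins for a header and for decoded Exif data. `set_from_exif` and `get_as_exif` mirror the proposed (not existing) `setFromExif()` / `getAsExif()` calls. The three-entry mapping is an illustrative assumption rather than an agreed translation table, and the `exifdata` value is kept as a dictionary here where a real implementation would store a serialized blob.

```python
# Illustrative pairing of Exif tag names with OpenEXR standard attribute names.
EXIF_TO_EXR = {
    "ExposureTime": "expTime",
    "FNumber": "aperture",
    "ISOSpeedRatings": "isoSpeed",
}
EXR_TO_EXIF = {attr: tag for tag, attr in EXIF_TO_EXR.items()}
BLOB_ATTR = "exifdata"  # attribute name taken from the proposal above


def set_from_exif(header: dict, exif: dict) -> dict:
    """Copy mapped tags into named attributes; keep everything else together."""
    leftover = {}
    for tag, value in exif.items():
        attr = EXIF_TO_EXR.get(tag)
        if attr is None:
            leftover[tag] = value      # e.g. MakerNote stays opaque
        else:
            header[attr] = value
    if leftover:
        header[BLOB_ATTR] = leftover
    return header


def get_as_exif(header: dict) -> dict:
    """Rebuild Exif data; named attributes override any stale leftover copy."""
    exif = dict(header.get(BLOB_ATTR, {}))
    for attr, tag in EXR_TO_EXIF.items():
        if attr in header:
            exif[tag] = header[attr]
    return exif


source = {
    "ExposureTime": 0.008,
    "FNumber": 2.8,
    "ISOSpeedRatings": 400,
    "MakerNote": b"\x01\x02",
}
header = set_from_exif({}, source)
round_trip = get_as_exif(header)
```

Letting the named attribute win on read is one possible conflict rule; it keeps tools that only edit standard attributes from being silently overridden by the blob.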

lgritz commented 1 year ago

OpenImageIO understands (and will read and write via the usual OpenEXR attribute mechanism) all the "standard" Exif fields. The part it doesn't understand is the "Maker Note", which is a blob within the blob, and differs so much from camera to camera that it's kind of a nightmare to keep up with.

OIIO has some code that will encode/decode the main Exif TIFF-format blob that we should feel free to steal and adapt.

kmilos commented 1 year ago

Makernote blobs are a "standard" part of the Exif spec (while what's inside it is obviously not) and cannot be just ignored/discarded. IMHO a partial solution is not something I would like to support. So even if you insist on reinterpreting all of the known Exif fields to EXR attributes and back, there is still a need to at least store the Makernote as a binary blob...

lgritz commented 1 year ago

Yes, I just meant the contents of the Makernote is not documented by the Exif standard, and is very camera-specific.

So I'm saying that I would expect the Makernote to probably stay as a binary blob no matter what, even if we represent the rest of the (fully specified) Exif fields in the idiomatic way.

peterhillman commented 1 year ago

@JGoldstone do you know if the SMPTE initiative you spoke about at the town hall has considered EXIF data and how to map it into camdkit / OpenEXR standard attributes? EXIF data doesn't seem to be as precisely defined as camdkit, but a "best guesses and best practices" conversion procedure would be useful to get this to work.

@kmilos for me the concern is less about doing a Jpeg->EXR->Jpeg round trip, and more about making sure that critical metadata in the OpenEXR is consistent regardless of whether that data came from a mobile phone, a CG renderer, or a movie camera. Also, that the EXIF data in a Jpeg converted from an OpenEXR accurately contains all possible metadata regardless of what made that EXR

lgritz commented 1 year ago

@lrosenthol Do you think you can attend our next TSC meeting, scheduled for Thu Aug 24 at 1:30pm PT, to have some discussion about how we can support C2PA's needs in the OpenEXR format and library?

lrosenthol commented 1 year ago

Yes, I am planning to attend.

Leonard

kthurston commented 1 year ago

The Core library already has a mechanism to have arbitrary program-specific metadata, where a program can register pack / unpack routines for custom metadata types. However, I agree with @lgritz that if we do this, we should do this well, and as a first-class entity. I am worried about introducing dependencies on any cryptography libraries, but we should think very carefully about how best to provide this such that it works in a consistent manner by default - and probably coincides with other metadata discussion we've talked about around knowing when something should be carried along vs. updated during processing.
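
As a rough illustration of the registration idea (a conceptual sketch only, not the OpenEXRCore API): a program registers a pack and an unpack routine under a type name, and values of unregistered types are carried along as opaque bytes. All names here, including `register_type` and `exampleManifest`, are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Tuple

# type name -> (pack, unpack); a stand-in for a library-level handler table
_HANDLERS: Dict[str, Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]] = {}


def register_type(type_name: str,
                  pack: Callable[[Any], bytes],
                  unpack: Callable[[bytes], Any]) -> None:
    """Register pack/unpack routines for a custom metadata type."""
    if type_name in _HANDLERS:
        raise ValueError(f"type already registered: {type_name}")
    _HANDLERS[type_name] = (pack, unpack)


def encode(type_name: str, value: Any) -> bytes:
    """Serialize a value with its registered pack routine."""
    pack, _ = _HANDLERS[type_name]
    return pack(value)


def decode(type_name: str, raw: bytes) -> Any:
    """Deserialize if the type is known; otherwise pass the bytes through."""
    handler = _HANDLERS.get(type_name)
    if handler is None:
        return raw
    return handler[1](raw)


# Hypothetical custom type whose payload is JSON, chosen only for brevity.
register_type(
    "exampleManifest",
    lambda value: json.dumps(value, sort_keys=True).encode("utf-8"),
    lambda raw: json.loads(raw.decode("utf-8")),
)

manifest = {"claim": "edited", "version": 1}
raw = encode("exampleManifest", manifest)
```

The pass-through branch is the "carried along" case; deciding when such data should instead be updated or invalidated during processing is the open question raised above.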

lrosenthol commented 1 year ago

Adding a copy of my presentation to the group today... (I tried to email it, but couldn't find an email alias for the group; if anyone has one, please feel free to pass it along!)

Intro to CAI-C2PA.pdf

cary-ilm commented 1 year ago

Thank you, sorry to miss the meeting. The general discussion email is openexr-dev@lists.aswf.io, or academysoftwarefdn#openexr on slack. Let's continue the discussion.