Open DougRogers opened 3 years ago
This is an interesting topic, and it's not obvious what the right approach is, so I am definitely interested in feedback from you and anybody else reading.
So, just as background, a common theme in OIIO's APIs is that the whole image can be separated into the pixel data itself, and everything else is stored in ImageSpec. The idea is that the result of a full read cycle is an ImageSpec + a pixel buffer, and so when you write an image, you just pass an ImageSpec and a pixel buffer. The specific thing we tried to avoid was having to read item after item through separate API calls, have the user code carefully tend to those separate pieces of data, and pass them though a succession of special API calls when writing or copying one image to another.
So for that reason, I historically had hesitated to make a separate call to read the thumbnail, or to write it. And it's definitely a priority for me that the simple "read an image from one file, write it out to a different file" be really, really simple, a small number of simple API calls.
But I also see that there are times when you don't want all that expense. And thumbnails aren't the only case like this: ICC profiles can be quite big. PTex files have a big set of information giving the connectivity of the mesh. OpenEXR just merged a patch that adds (enormous!) metadata that associates a manifest of object names with integer "object ID" images. So I think this issue will keep popping up in various forms and that there is merit to addressing it intelligently.
So I've been mulling this over for a few days, and I can think of a few strategies we might take here. These are not necessarily mutually exclusive and are in no particular order:
Add a new hint for the "open with config" call, which indicates that any "big/expensive" metadata should not be read or stored in the spec. You'd tend to use this hint only for cases when you were opening the image for the sole purpose of getting the basic resolution and type info. This has the nice property of not breaking any existing code or ABIs, it's optional (for the author of the format reader) and we can implement it selectively, and it's very straightforward.
Amend ImageInput::valid_file(), which is already explicitly a minimal "I don't need the pixels, just tell me if this file exists and appears to be a valid file of this format, as efficiently as possible", to take an optional ImageSpec*
parameter, which if non-NULL will fill in just the basic resolution-like fields but none of the metadata that would go in ImageSpec::extra_attribs
. This would break ABI by either changing call signature or adding a new virtual method, but it wouldn't break API compatibility.
Exclude big/expensive things (which by definition would include thumbnails, ICC profiles, big manifests, etc.) from storing in the ImageSpec, and instead either have dedicated new ImageInput API calls to retrieve them (and in ImageOutput to store them?), and they would be lazily read only if retrieved. Or store them in a separate "ParamList for the big things" that's only read optionally. But now we're back to what I said we didn't want, which is a proliferation of calls to ferry a growing list of individual data items in and out, and ways for user code to lose metadata if they aren't careful or as the list of special things grows in the future.
Treat thumbnails as the one special case. So you would need one extra call to retrieve it from the input, one extra call to give it to the output. Cons: (a) Old code wouldn't know about it, and new code could forget to make the calls. (b) It doesn't get us any closer to figuring out what to do with the ID manifest or other large metadata. Possible Pro: You could argue that almost anything you do to an image would/should invalidate any thumbnail it originally had, so maybe it's a good thing to forget it by default and have to take a separate step to generate and attach a correct one when you're outputting an image?
Try to have a fully lazy metadata pipeline: Add a public ImageInput::getattribute() that can retrieve metadata (which would, internally check m_spec.extra_attribs first, and if not found then it would read and add the expensive thing). Small metadata would be added to m_spec.extra_attribs right away upon open, but big things would only be added when they are first requested (and if the user requests the full spec()
, anything remaining to be read is read). This is source compatible -- anybody requesting the spec()
as a whole would trigger the whole read, but if you're careful to ask for only particular things, they would only be read as (or if) you need them. I'm a little nervous about this, it does seem like the tricky logic of when to read things could have subtle bugs and edge cases.
Don't store the thumbnail in the metadata, always treat it as if it's a second subimage in the file. Pro: It wouldn't be read if you didn't read the second subimage. Cons: (a) breaks old code; (b) need some way to know whether the subimage is specifically a thumbnail of the first image versus an unrelated image.
If this is a special case of the thumbnail read on Targa being especially slow to read, perhaps we can just make that go faster, eliminating the need for this feature generally. (Note: most of the formats we deal with don't have provisions for thumbnails at all, so maybe this is just a matter of optimizing one or two format readers to the point that you no longer consider it a perf issue.)
Do nothing at all, OIIO is optimized for reading full images and it's not worth adding complexity to speed up tools that are merely scraping a disk to find resolutions of all the images but not caring about pixels. (I'm not necessarily making that argument myself, just saying that it is a potentially defensible position.)
Maybe there are other approaches, too. I would love feedback on this matter.
Another one I thought of:
There's definitely lots of options, and not all of them are mutually exclusive.
The hint to "skip big metadata" seems like the easiest path forward to me, though it is fairly limited to this particular use case. If you wanted to later get the metadata for a file you had already opened this way, you would need to close it and re-open it "for real" right? The definition of what counts as "big" would also be up to the individual file formats to honor, which could be tricky to homogenize across the format plugins, and hard for users of the library to reason about.
If we had the luxury of rethinking the API some more I would probably argue for making all metadata load on demand (while still keeping provisions for round-tripping everything properly when re-saving files). But this seems like it would be really hard to retrofit on top of existing code like you said.
The issue of how the manage thumbnails specifically seems a bit different to me and worth thinking about separately. We should have some way of addressing the thumbnail in a file as similarly as possible to other pixel data. I'm not sure that using the subimage concept works well - don't some formats support both thumbnails and subimages (exr?). This could get confusing for a user wanting to (say) loop over the subimages in a file, not realizing one of them is the thumbnail. A hint could be given in the subimage's spec (or main file spec) but you would have to know to check for it.
So maybe its worth adding a specific "thumbnail image" concept in the API? Similar to subimages, but addressed as their own thing. That would be an API change though, and break any existing code that manipulates thumbnails via the metadata API.
This is a tricky one!
If you wanted to later get the metadata for a file you had already opened this way, you would need to close it and re-open it "for real" right? The definition of what counts as "big" would also be up to the individual file formats to honor, which could be tricky to homogenize across the format plugins, and hard for users of the library to reason about.
Yeah, I am assuming that you would need to close and re-open. This features is for the common case when you only need the basics (think: iinfo *.tif
, you just want the list of files and resolutions).
One simple unambiguous dividing line is the hard coded fields of ImageSpec (the things that tell you the "shape" of the image, like resolution, nchannels, etc.) versus everything that would go in extra_attribs. In fact, it could correspond to the ImageSpec::copy_dimensions call?
If we had the luxury of rethinking the API some more I would probably argue for making all metadata load on demand (while still keeping provisions for round-tripping everything properly when re-saving files). But this seems like it would be really hard to retrofit on top of existing code like you said.
Maybe not so hard? Most existing code asks for ImageInput::spec() and then pulls out fields from there. We could make the implementation of spec() read the whole thing (un-lazify it), which would make existing code work as-is. And add a new call for retrieving an individual piece of metadata, which would load upon being called (if it wasn't already loaded). So the only programs that have to change are the ones that want to selectively pull things out.
The issue of how the manage thumbnails specifically seems a bit different to me and worth thinking about separately. We should have some way of addressing the thumbnail in a file as similarly as possible to other pixel data.
Thumbnails may indeed be a special case. I'm not averse to adding a single ImageInput call "give me thumbnail", and ImageOutput call "here is the thumbnail". It could even pass ImageBuf's around, very convenient. The only downside to this is that old programs would drop them on the floor, unless updated to make the extra calls. But do we care? Almost none of the formats we use in VFX support thumbnails (OpenEXR does not, BTW).
Grepping... Currently, Targa and PSD are our only format readers that put thumbnails in the metadata! (RAW should, but it's an unimplemented "FIXME".) And we don't have a PSD writer. So the only thing that could get lost at this moment is an app (not part of OIIO, because we would fix that) reading Targa and then writing it back out, losing the thumbnail. Why would you read a Targa and then write a Targa? You're not transcoding. Maybe you're modifying the image itself? But if you do that, the thumbnail is very likely no longer correct anyway.
So maybe thumbnails deserve their own thing. But that does nothing to solve things like the ID manifest. Though maybe that's not a real problem, either, considering they probably aren't present in any file that the reader wouldn't want that info, I don't know.
I would definitely vote for the opt-out of metadata approach, either via the "no big things" or "list of things to ignore" mode. Being at the begining of the pipeline I want everything possible to come along for the ride, but I can see how people at the tail end would just want the pixels.
Almost none of the formats we use in VFX support thumbnails (OpenEXR does not, BTW).
https://github.com/AcademySoftwareFoundation/openexr/blob/master/src/lib/OpenEXR/ImfPreviewImageAttribute.h ? Unless you meant within OIIO, in which case yes, our OpenEXR plugin doesn't currently support reading or writing thumbnails.
The only downside to this is that old programs would drop them on the floor, unless updated to make the extra calls. But do we care?
I would love support for both reading, writing, and autogenerating thumbnails from within OIIO, and I don't think older apps not having the API support would matter too much. Nobody is currently screaming about the lack of support. A separate API for the thumbnails makes sense to me (definitely not treated as a subimage).
Another option is to do what FBX does. You specify everything you wish to read upfront via flags.
if (GetFBXImporter()->IsFBX())
{
FbxIOSettings *ioSettings = GetFBXSDKManager()->GetIOSettings();
IOS_REF.SetBoolProp(IMP_FBX_CONSTRAINT, false);
IOS_REF.SetBoolProp(IMP_FBX_CONSTRAINT_COUNT, false);
IOS_REF.SetBoolProp(IMP_FBX_MATERIAL, true);
IOS_REF.SetBoolProp(IMP_FBX_TEXTURE, true);
IOS_REF.SetBoolProp(IMP_FBX_LINK, true);
IOS_REF.SetBoolProp(IMP_FBX_SHAPE, true);
IOS_REF.SetBoolProp(IMP_FBX_GOBO, false);
IOS_REF.SetBoolProp(IMP_FBX_ANIMATION, true);
IOS_REF.SetBoolProp(IMP_FBX_GLOBAL_SETTINGS, true);
IOS_REF.SetBoolProp(IMP_FBX_CURRENT_TAKE_NAME, true);
}
lStatus = GetFBXImporter()->Import(GetFBXScene());
@shootfast Right you are! I had forgotten that OpenEXR has a thumbnail option. I guess OIIO's lack of it for the past 13 years has not gotten anybody upset enough to even point out that I never got around to it.
@DougRogers Yeah, that's akin to the configuration hint about what to read. I think the correct default is to read everything, but an exclusion list makes sense, and there can be an alias for "exclude everything but the dimensional info".
So I think this is an initial plan for what seems to be resonating (both with you guys as well as in my own head):
An open() configuration option that lets an app say they don't need individual items (including nearly everything). Readers can, optionally, choose to avoid expensive steps if they consult the list. This is 100% non-breaking and can even be backported to the release branch.
For master only, make a new get/set thumbnail API call in the reader and writer. If the thumbnails are wrapped in an ImageBuf, that also means that generating them from an existing buffer is just one call to IBA::resize/resample.
For later, we can contemplate more lazy behavior. One thing I was thinking is that since all user-side access to the metadata is through the spec() call, we could add an optional parameter to spec() that says "I want it all" or "just the basics". The default would be "all", preserving old behavior. Calling with "all" would do the extra work if not already done (it still would only happen once), and the initial open would only provide the basics. Of course, any other API calls would call spec(all) internally if their operations needed to know the rest. This, too, is changing the call signatures, so would not be able to be backported. But I think it is straightforward.
Of course, any other API calls would call spec(all) internally if their operations needed to know the rest. This, too, is changing the call signatures, so would not be able to be backported. But I think it is straightforward.
Perhaps you could add an optional spec config argument that defaults to 'all' to maintain compatibility.
@DougRogers Yes, exactly what I had in mind.
I'm a little late to the party here, but I like the idea of an optional "exclusion" list, and I'm curious what you're thinking in terms of API.
My assumption after reading through this thread is that exclusions would be specified as names. If so, it may be a good idea (or even necessary) to support pattern matching of some kind, especially if you're doing something like examining a large number of images of unknown provenance (similar to the original issue description), that, for whatever reason, may use inconsistent metadata keys.
That use-case then made me wonder if the additional option would actually make more sense as an include filter (or if both include and exclude modes could be supported), although I still think pattern-matching would be important in that scenario. Am I off in the weeds here?
The most general approach could be maybe "thing1,thing2,thing3" or "all,-thing4,-thing5" so it covers both inclusion and exclusion. But is a fine-grained control really needed?
I'm still mulling this over in my head and I even started sketching a prototype last night and now I am less sure of some of this. We already mentioned thumbnails as deserving their own API calls and no longer being part of the metadata. There are very few other cases in very few formats where there might be a concrete performance benefit to excluding particular pieces of metadata. I can see it being a big bummer for all the readers to be fully plumbed with extra complexity and regex checking for every metadata item. We wouldn't want to slow down the usual case of wanting everything for the sake of slightly speeding up the very rare case of wanting a specific subset of things. I think the relatively common case (once we have excluded thumbnails) is wanting nothing beyond the dimensional information. Maybe that's all we should do for now, get the first stage done, and then see what appetite there is for more?
So after some fooling around with it last night and sleeping on it, now I'm thinking like this:
First, add the "separate thumbnail" API change.
Second, add just enough of the selection hint to say "I don't need any metadata beyond the dimensional info."
Then, wait for further feedback and see if there are any other cases that are worth additional complexity.
I would add some items that should be part of the minimal set:
If you look at the ImageSpec definition, there are hard fields for most of those things -- resolution, channels, data type, tile size, whether it's a deep file, etc. Then everything else, all the "arbitrary metadata" gets shoved in the extra_attribs list.
ImageSpec even has two copy methods, operator= which copies everything (including a deep copy of the whole extra_attribs paramlist), and copy_dimensions() which omits extra_attribs and the channel names and is very inexpensive.
So that's a pretty natural dividing line.
It starts to get muddy if you want to ask for the presence or size of a piece of expensive metadata (the metametadata?), but don't want the data itself. It's a lot less clear how often (across items and file formats) you would be able to know those things without incurring most of the expense of just getting the whole thing.
I would exclude those from the basic data and just include that type of data if requested.
There are several COMMON cases:
We've got those covered already.
We also have a concrete proposal for these less common but plausible cases:
We can dismiss the cases where you want to exclude a subset of the small cheap things, because there's no benefit to adding extra logic to skip them. You would never be able to measure the perf gain, and you might even come out behind if the cost of checking item by item was more than just copying the small items.
With thumbnails disposed of, how many cases of large/expensive metadata are there? I can think of only a few cases -- ICC profiles maybe? Name manifests in OpenEXR objectID files? Mesh connectivity in PTex files? Anything else? It's not clear how often there exists any file that has more than one of them, and if there are, when you would want one of those but not the other, and if you did, whether there would be any measurable perf benefit to being that fine-grained.
You have to also understand that for many of the file formats, the underlying reader library may have already gone to the expense of reading the whole header up front as soon as it opens, so imagining that we are saving significant expense by not asking for a few items is closing the barn door after the horse is long gone.
You could have several levels of data.
1) Simple, size and spec items 2) Med. Thumbnails 3) Image 4) large meta-data
I had forgotten about this, but in looking at imageio.h, I just realized that the ImageInput already has separate spec() and a spec_dimensions() calls, with the latter returning an ImageSpec copy that only has the important fields filled in. It seems like without breaking the interface, we could easily make it so that most of the metadata is not read until spec() is called or until some other internal functionality needs other metadata. If all the user does is open() and then get spec_dimensions(), it would never worry about most of the metadata. So the simple 2-stage lazy metadata reading is already half plumbed and could even be backported without breaking APIs.
All great points. In that light, the "K.I.S.S." changes you outlined make good sense to me.
Just because I haven't seen anyone mention it yet as a pretty big usecase: Generally when streaming in tiles (especially from many different textures), I don't believe there is any need at all for reading any metadata beyond the layout (planar/separate if relevant), format, dimension and mip/subimage count.
So much so, that in things I've set up to deal with mipmapping, actually attempt to clear as much arb metadata as possible (SHA-1, Comments, Commandline etc) to prevent any additional reading during rendering.
TIFF is reading a lot of data during open().
TIFFInput::readspec() is reading decoding xmp data
// Search for an XML packet containing XMP (IPTC, Exif, etc.)
int xmlsize = 0;
const void* xmldata = NULL;
if (TIFFGetField(m_tif, TIFFTAG_XMLPACKET, &xmlsize, &xmldata)) {
// std::cerr << "Found XML data, size " << xmlsize << "\n";
if (xmldata && xmlsize) {
std::string xml((const char*)xmldata, xmlsize);
decode_xmp(xml, m_spec);
}
}
Bumping this - I also encountered PNGs that are very slow to open due to XMP metadata. Here's a sample file attached HUD_LowerLeft.png.zip
When scanning a drive for images, I am interested in gathering all images that are larger than a specific size. During the open call, more info than just the header is read in, slowing the call significantly. Calling open on thousands or millions of images is very time-consuming.
For example, in
the TGA processing reads in the thumbnail data.
I would like to have all pixel processing, even the thumbnails, removed from the open call and just have the minimal spec information read so that this call is performant.