AcademySoftwareFoundation / OpenImageIO

Reading, writing, and processing images in a wide variety of file formats, using a format-agnostic API, aimed at VFX applications.
https://openimageio.readthedocs.org
Apache License 2.0
1.96k stars 586 forks source link

[MEMORY] Reduce file descriptors memory footprint in the ImageCache #4436

Open bfraboni opened 2 days ago

bfraboni commented 2 days ago

We would like to reduce the memory usage of file descriptors in the ImageCache as we observed a large amount of duplicated data. This is mainly due to image descriptors (ImageSpec ) being stored for each mip map (LevelInfo ) of each sub-image (SubImageInfo).

Clearing up some memory should be possible in that area ( LevelInfo / SubImageInfo ) but needs a bit a refactor, that I'm happy to start during DevDays with help from @lgritz.

More to come in the next comments about the approach to follow.

lgritz commented 2 days ago

Yes, I agree that we want to store the ImageSpec per subimage, and the individual MIP levels really only need to know their resolutions and tile sizes, I think those are the only things that meaningfully differ from one MIP level to the next. They don't all need their own ImageSpec's at all.

lgritz commented 2 days ago

Additionally, even at the subimage level, we currently store both a "spec" and a "nativespec".

The original concept was that nativespec reflects what was in the file, whereas spec describes what's in memory in the cache. The main way they can differ is (a) in the cache, only a few pixel data types are allowed (uint8, uint16, half, float) and all channels must be the same data type, whereas in the file they can be other types and can differ among the channels; (b) in the cache, the image always appears "tiled", sometimes the real tiling of the file, sometimes looking like one big tile for the whole image, sometimes looking tiled even though it's not ("autotile"). But the real metadata will not change.

(In my defense, when this was designed, we saw a lot less metadata in files, certainly not big things like ICC profiles, thumbnails, or other big things. So having a second ImageSpec didn't seem like a big deal, but now it is.)

OK, so what I'm thinking is that we could store only ONE ImageSpec per subimage, and then only the couple of fields that explain how it differs between the file and the buffer.

I'm not yet of a strong opinion whether the one spec should be the file, with separate fields for the data format and tile size of the buffer? Or if the spec should be the buffer, with additional fields saying what the data formats and tiling was like in the file? (Probably the first option, but I'm not 100% sure.)

The question is what to do about the get_imagespec() and imagespec() methods, both of which take a native flag to say which it's selecting.

IC::get_imagespec makes a copy, so it could copy the one imagespec and then modify those few extra fields if they're asking for the other kind.

IC::imagespec() is a lot trickier because it returns a pointer. This is where I got stuck before. One thing we could do is decide that henceforth, the nativespec parameter is simply ignored, and the spec pointer that is returned is always the file one (? or the memory one?) and add a separate new API call to retrieve the in-memory data type or tile size.

It may also be instructive to see how in ImageSpec itself, there is a copy_dimensions() method, which is a very lightweight way of copying only a few important fields from one spec to another. Maybe there is something analogous that we need in ImageCache.