ImageCache: On-demand layer reading for multi-layered files

MrKepzie commented 9 years ago

Hi,

First off thanks for this amazing library. We use it extensively in Natron (open-source compositing software) (https://github.com/MrKepzie/Natron/tree/workshop via OpenFX: https://github.com/MrKepzie/openfx-io ) for most of the image reading/writing.

We have received some complaints by our users that the reading of multi-layered EXR files was freakishly slow compared to other similar applications. I have narrowed it down to the ImageCache of OIIO. We make use of the ImageCache::get_pixels function. The problem is that if the image is not present into the cache, the OIIO cache internaly will create a new Tile and calls ImageInput::read_image which in turns calls
ImageInput::read_scanlines(ybegin,yend,z,0,m_spec.nchannels,format,data,xstride,ystride)

So basically regardless of what we requested via the _getpixels function with chbegin and chend, the Tile will always decode all available layers in the image. Needless to say that for EXR files rendered out with many passes it takes a while.

Could this be improved to only read the channels that were requested by _ImageCache::getpixels in the first place ?

Our temporary workaround is to use the ImageInput::read_scanlines API directly, but we suffer from the lack of caching because if the user of Natron requests more than 1 layer to display, it will open as many ImageInput on our side that the the user requested, which is not efficient at all.

lgritz commented 9 years ago

The design choice was that rather than storing each channel separately in the cache, since all channels are usually used together (at least, for RGBA files), we should store them together.

But I think you do identify an important edge case -- if there are truly a large number of channels, but the typical access pattern is to only need a few of them.

I think we could change the way the cache behaves. A tile in cache could be identified by not just (x, y, z, subimage, miplevel) but also by (firstchannel, nchannels), so that different channel sets in the same file would end up in different tiles. We could constrain it so that a file with, say, 5 or fewer channels always stays on one tile even if you ask for a subset, so asking for 2 channels in a 4-channel file would still keep all four channels in the tile, whereas if you ask for channels 17-20 in a 32 channel file would end up with a tile containing just those channels, saving the cache space and maybe the I/O as well (for some formats). I think that accommodation would preserve full performance of the 99% case of asking for 3 or 4 channels from a 3 or 4 channel file.

Let me look at the code and formulate a plan. This sounds like a very reasonable request.

MrKepzie commented 9 years ago

Great! In our case most users either use RGB/A files or multi-layered EXR with many passes. Your idea seems to cover indeed all use-cases here.

Just throwing some ideas:

What about a Tile that has an "in-progress" state with some channels being read and others not read? Is this something to consider too? Although having a single Tile with all passes you increase the chance that many threads request access to the same Tile object, you keep a stronger relationship between different passes. Do you throw away the whole Tile at once or just some of the passes when the cache needs to apply its eviction policy?

lgritz commented 9 years ago

A tile is a contiguous piece of memory and is shared among threads. The synchronization and housekeeping of allowing it to be partially filled would probably overwhelm any savings we could get from this scheme. Also, the big benefit is almost certainly going to be from keeping the tile small -- holding 4 channels instead of 32 (including 28 that aren't needed yet). Tiles that hold channels not needed (or reserve space for those not needed yet) are expensive because they take up lots of room in the cache budget, causing other tiles to be evicted out of cache and making the cache overall have a lower hit rate (and thus, for tiles to be read and reread as the fall in and out of cache).

Whether it is less expensive in terms of I/O to read 4 channels out of 32, versus all of them, is by no means certain, and definitely will depend on the specific file format and details of the internals of the underlying format library (libtiff versus libIlmImf, etc). Some may do the same amount of seeks and disk I/O regardless of how many channels are needed. So reading fewer channels is probably not where most of the savings will be had.

The savings will be from reducing the number of times a given tile is evicted from cache and reread (loading the same tile multiple times will be expensive regardless of file format or library). The number of re-reads will be lower the more tiles we can fit in cache at once. We can fit more tiles in cache at once if they are smaller. Tiles will be smaller if they hold only the channels we need right away, rather than all channels in the file.

MrKepzie commented 9 years ago

I understand, I did not go into the memory layout of the tiles in the cache.

By the way, do you have an idea in which version of OIIO do you plan to insert this "fix" ? In Natron we are using OIIO 1.5.x at the moment. Thanks

Pixel-Minions commented 9 years ago

Thank you for this amazing library! I really hope this can be fixed soon. This is an issue that breaks a little the pipeline for exr files.

ramellij commented 9 years ago

Bump ! EXR is the best so please this fix is really needed. We can't use Natron without it :/

lgritz commented 9 years ago

Fix proposed in #1226

lgritz commented 9 years ago

Did any of you get a chance to try out my proposed fix?

MrKepzie commented 9 years ago

Yes we did, I will give you an answer during the day when I receive a last test containing a complex unit test. Seems to work great so far !

Alex

On 13 Oct 2015, at 00:38, Larry Gritz notifications@github.com wrote:

Did any of you get a chance to try out my proposed fix?

— Reply to this email directly or view it on GitHub.

MrKepzie commented 9 years ago

Alright so after further testing that seems to work great on some sequences and not so great on others. I'm going to profile OIIO to see what's going on internally but there seem to be a big memory leak: Reading an image sequence of 200 frames of 2k35 goes up to 7GiB memory taken by OIIO, and purging the cache does not free it.

Btw on OSX (10.11) (at least) it does not build out of the box, the build fails in doc saying:

make[2]: *** No rule to make target `../src/doc/oiiotool', needed by `src/doc/oiiotool.1'.  Stop.
make[1]: *** [src/doc/CMakeFiles/man_pages.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Commenting the following line in src/doc/CMakeList.txt fixed it:

set (cli_tools oiiotool iinfo maketx idiff igrep iconvert)

Edit:

The following assert in ImageCacheImpl::check_max_mem is triggered:

DASSERT (m_mem_used < (long long)m_max_memory_bytes*10); // sanity

We do not set any special size attribute to the ImageCache so it should be defaulting to 256MiB. But the limited is exceeded to 2GiB and 688MiB

lgritz commented 9 years ago

Does the "leak" also occur with the master (i.e., without applying #1226), or is it definitely a bug in the patch?

What kind of image file? Can you reproduce the problem with a different image format? That is, are we sure the leak is in the ImageCache and not in the format reader or even in some underlying image format library?

Can you give a recipe for reproducing? Is there an oiiotool command line, for example, or a short Python script using OIIO, that can reproduce memory growth in a similarly bad way that seems to defy the purpose of the IC?

MrKepzie commented 9 years ago

Will check tomorrow (Paris time). Sequence is 2k35 exr with 4 three-channel layers. Will check with other file formats. I will try to make a minimal example to reproduce the bug.

On 14 Oct 2015, at 22:50, Larry Gritz notifications@github.com wrote:

Does the "leak" also occur with the master (i.e., without applying #1226), or is it definitely a bug in the patch?

What kind of image file? Can you reproduce the problem with a different image format? That is, are we sure the leak is in the ImageCache and not in the format reader or even in some underlying image format library?

Can you give a recipe for reproducing? Is there an oiiotool command line, for example, or a short Python script using OIIO, that can reproduce memory growth in a similarly bad way that seems to defy the purpose of the IC?

— Reply to this email directly or view it on GitHub.

MrKepzie commented 9 years ago

So it appears on the master branch as well. Basically what happens is that the freeing loop in _check_maxmem is entered in the code path, but the destructor of ImageCacheTile is never called, hence the memory taken by the cache is never decreased. (Probably a ref count > 1 with a handle remaining somewhere).

I've made a simple self contained unit test containing also a 100-frame image sequence of 2K 32bit multi-layered exr's. In the example you can see that the cache size is set to 50MiB, yet the stat reports says there is a peak at 750MiB and when you watch system activity the program goes well over 2GiB of resident RAM (RSS).

https://www.dropbox.com/s/5llb8s2dl1iqzyn/unit-test.tar.gz?dl=0

I've made a quick build.sh script (for Unix) which needs OIIO_PATH and BOOST_ROOT defined and will build and run the program (using gcc)

Somehow with this unit test the assert in ImageCacheImpl does not trigger though. However I can show you a practical way of reproducing it using Natron (can setup everything so you just have to click and play and see the assert).

lgritz commented 9 years ago

I'm unable to reproduce any clear problem with your test. To double check the memory, I added this at the top:

#include <OpenImageIO/sysutil.h>
#include <OpenImageIO/strutil.h>

and this in the loop right after you print the IC stats:

    std::cout << "MEMORY: " << OIIO::Strutil::memformat(OIIO::Sysutil::memory_used()) << "\n";

So it iterates over the 100 files, and although the "total size of images referenced" and "Read from disk" statistics grow (eventually reaching 37.5 GB), the actual memory size reported peaks at 2.7 GB after just a few images are read and then stays completely steady. As well, the IC stats report the "Peak cache memory" at 768 MB, which is what I'd expect. Finally, the "current tiles" at the end is 2, which makes me confident that they are not failing to be reclaimed.

Why is peak cache memory 768 MB instead of the 50MB cache size you requested? Because your example files are untiled, so it will read the whole image as a tile -- 2048x1536 x 32 channels x 32 bit float == 384 MB, it can't make a quanta of storage as small as the 50MB (that's why we recommend tiled files), and actually it's double because I think it will create and read a new tile before kicking the old one out of cache and reclaiming it.

Why is RSS memory 2+GB rather than only slightly more than the 768 MB we are allocating for tile storage? Who knows? Could be memory used by shared libraries that are resident, or internal allocations inside the OpenEXR reading library libIlmImf, or any number of other things. That doesn't necessarily reflect the amount of actual RAM being used, for a number of esoteric reasons.

But in any case, I see no evidence of a leak in the ImageCache, at least not from this example.

Moving on... if I change from master to the latest version of my proposed patch, change your get_pixels call to only request the first 4 channels (of the 32 in the files), and use the new parameters from John and Hugh's suggestions to also guide get_pixels's caching to those channels:

    if (!cache->get_pixels(ustring(*it),
                           0, //subimage
                           0, //miplevel
                           spec.full_x, //x begin
                           spec.full_x + spec.full_width, //x end
                           spec.full_y, //y begin
                           spec.full_y + spec.full_height, //y end
                           0, //z begin
                           1, //z end
                           0, //chan begin
                           4,    // <--- new channel limit, WAS:  spec.nchannels, // chan end
                           TypeDesc::FLOAT, // data type
                           pixelData.data(),// output buffer
                           AutoStride, //x stride
                           AutoStride, //y stride
                           AutoStride, //z stride
                           0, 4     // <--- NEW params restricting channels in cache
                           )) {

now after 100 images it reports peak tile memory as 96 MB! So that's exactly the effect we were hoping for with this set of changes.

(Note: It still reports the "read from disk" as if it read all channels, thus overestimating in cases where you are instructing it to read only subsets of channels.)

As a last note, you may want to look at OIIO's filesystem.h and in particular the Filesystem::get_directory_entries() function, which you may find even more convenient than the 'tinydir' stuff.

MrKepzie commented 9 years ago

Alright for the 2GiB resident memory, this seems fair enough.

I can't see those 2 "New" params in the get_pixels(...) API to restrict channels in cache?

Anyway, the assert triggers everytimes when used by the OpenFX plug-in. The only thing we do different is that there can be concurrent renders (i.e: concurrent calls to ImageCache::get_pixels). Adding a printf in the destructor of ImageCacheTile indicates it is never called.

I will enhance the unit test to have a producer/consumer behaviour with multiple threads.

lgritz commented 9 years ago

Yes, I see why the assert hits -- you are, as I explained, reading files that are untiled, float, 32 channels, 2kres, thus, you have a multi-hundred megabyte tile size. But you are setting the tile cache memory very low (50 MB in the example you sent), so you have indeed constructed the "assumed to be impossible" situation of memory used being MUCH bigger than cache size.

I'm not sure what to tell you ... comment out that DASSERT? It just doesn't apply in your situation. Or set a larger cache size. Or used tiled files. Or use the new patch with cache_chbegin/chend to restrict which channels get cached.

lgritz commented 9 years ago

If you read the discussion in #1226, we decided to augment the parameters to get_texels() in the proposed patch so that the application has explicit control over which channels will form the cached tile.

lgritz commented 9 years ago

In a multithreaded application, the individual threads can end up holding a tile or two even after they have been evicted from the main cache. So lots of threads + extremely large tile size (because the file isn't tiled yet is ver large) can seem to bloat memory beyond the nominal cache size requested. I can see how that might look like a leak, though it is not.

So, the cache tile channel specifications may clear all this up, which is the point of patch #1226.

Another thing you may wish to consider is using the ImageCache "autotile" setting, which basically makes an untiled file "appear" as if its tiled (breaking it up in the cache into smaller tiles, rather than one tile for the whole image -- thiis is kinder to the cache, though possibly and the expense of extra I/O).

MrKepzie commented 9 years ago

Oh I see I was not notified by the pull request comments. In the openfx plug-in we leave the default cache size (256Mb) and the assert triggers even with the new channels control.

2048 * 1536 * sizeof(float) * 4(RGBA) = 50,331,648 bytes

Can there be a situation where all X threads are keeping at least 1 tile and reading another one, hence pushing the cache current size to 50 * 2 * X ?

I'm working on the multi-threaded unit-test to try and reproduce the situation (using the newfangled c++11 std threads)

I'm not at all concerned about memory usage, I'm just concerned that I'm not able to control exactly how much memory is used by OIIO (I mean internal libraries such as OpenEXR etc...)

In the openfx plug-in I can easily come to a situation where the ImageCache is holding 2GiB more of RAM (until the assert triggers in fact) and with NDEBUG defined can go well above that. For a memory intensive application such as a compositor, not being able to track several GiB of RAMs used somewhere is a huge risk because we have our own memory tracking and the goal is to avoid hitting the swap.

MrKepzie commented 9 years ago

Did not refresh the page before commenting. I understand we should make aware our users to use Tiled files. I will try more tests and run our plug-in on a very long sequence to see if the memory usage is consistent.

MrKepzie commented 9 years ago

Small edit: With recent changes I was also able to hit the following assert:

ASSERT (sweep != m_tilecache.end() &&
            "no way m_tilecache can be empty and use too much memory");

In ImageCacheImpl::check_max_mem

Though this was with the openfx plug-in, after calling invalidate_all(true) on the cache. The very next call to ImageCache::get_pixels triggered the assert.

lgritz commented 9 years ago

Yes, there can be a situation where all X threads are keeping references to at least one tile. Generally speaking, these are tiles that are in the cache. It was anticipated that all tiles are roughly the same size, so when one thread needs a new tile and that pushes us over the limit, it only takes a single tile being released from the main cache to get us back below the limit. So, roughly speaking, we expect tile memory to peak at cache_max_MB + 1 tile (approximately), which of course is only a tiny bit over the limit because for a texturing system a tile is usually in the neighbourhood of 64x64xRGBAxHalf = 32 KB.

The odd situation here is that we have a tile that's SO big that it might push everything else out of the cache, meaning that all the tiles being held or read at that time by other threads are now "extra" -- still being held by threads but no longer in the main cache. It's still not a leak... it's a shared pointer that will be freed when the threads no longer need those tiles.

If it's not convenient to use tiled files, you should consider setting the IC to emulate tiling for untiled files:

imagecache->attribute ("autotile", 64);

Or 128, 256, or whatever makes sense and has good performance for you. The default, without this, is that untiled files will be stored as a tile that's the resolution of the whole image, and that's what's making bad cache behavior for you.

lgritz commented 9 years ago

The second assertion you mention:

ASSERT (sweep != m_tilecache.end() &&
        "no way m_tilecache can be empty and use too much memory");

Ugh, yes, I can see how that happens in the situation I describe -- you've read lots of tiles that the individual threads may still (temporarily) be hanging onto. Now you need a tile not in cache, triggers a read, it's an untiled image so the one tile is huge (hundreds of MB), so before adding it to cache it tries to remove old tiles from cache until it's no longer below the memory limit.

BUT... it gets down to 0 tiles in the main cache, yet you are still over the memory limit (because of the huge tile you just read but haven't put into cache yet, plus the ones threads may still be using), so you hit this assertion.

You could argue that the assertion is itself the bug -- there is a "legit" situation where the main cache can be empty but you're still over the memory limit.

Or, you could try autotile, and if that clears everything up and performance is good, move on.

MrKepzie commented 9 years ago

I’m finishing up my multi-threaded example (almost done) and will experience with auto tiling as you suggest. Is there any way to toggle auto-tiling if say tile size of a file is > X pixels ?

On 16 Oct 2015, at 19:59, Larry Gritz notifications@github.com wrote:

The second assertion you mention:

ASSERT (sweep != m_tilecache.end() && "no way m_tilecache can be empty and use too much memory"); Ugh, yes, I can see how that happens in the situation I describe -- you've read lots of tiles that the individual threads may still (temporarily) be hanging onto. Now you need a tile not in cache, triggers a read, it's an untiled image so the one tile is huge (hundreds of MB), so before adding it to cache it tries to remove old tiles from cache until it's no longer below the memory limit.

BUT... it gets down to 0 tiles in the main cache, yet you are still over the memory limit (because of the huge tile you just read but haven't put into cache yet, plus the ones threads may still be using), so you hit this assertion.

You could argue that the assertion is itself the bug -- there is a "legit" situation where the main cache can be empty but you're still over the memory limit.

Or, you could try autotile, and if that clears everything up and performance is good, move on.

— Reply to this email directly or view it on GitHub https://github.com/OpenImageIO/oiio/issues/1123#issuecomment-148786311.

lgritz commented 9 years ago

Another possible solution -- it could automatically raise the memory limit to be larger than the biggest tile it's seen so far, overriding your too-small cache size limit in cases where it's clearly non-sensical given what you are reading.

That also kind of covers the deep data memory case we were separately discussing in #1236! It's really the same case.

lgritz commented 9 years ago

Currently, when autotile is on, it will autotile all untiled files.

Just use a big tile size (say, 256) and see what happens. That will still break big images up into manageable chunks. Lower res files will still get tiled, but only a few tiles, so there's very little extra work.

MrKepzie commented 9 years ago

Alright I've uploaded my unit-test on dropbox (without the initial image sequence) https://www.dropbox.com/s/575ym82ro7sahiw/oiio-unit-test.tar.gz?dl=0

There is a Producer thread (main thread) that requests X frames to be read
There are std::hardware_concurrency() parallel consumers trying to read all frames concurrently.

With all commits related to the pull request applied to oiio:

The assert in ImageCacheImpl::check_max_mem triggers (seems normal as you say)
You can see that on the Image Sequence provided in the first unit test, the process size goes over 15GiB of RAM whereas the ImageCacheImpl::m_mem_used reports only 3Gb and 222Mb of memory used (so there are somewhere more than 11Gb floating around).

I've tried setting autotile to 256: the assert does not trigger anymore but the process size goes up to 23.71Gb of RAM taken and hitting swap (see attached screenshot).

lgritz commented 9 years ago

OK, I'll take a look at this. Thanks for the test case!

MrKepzie commented 9 years ago

After a bit of profiling, it seems that most memory that is not released is allocated by IlmImf. So it might be the ImageInput objects or even ImageCacheFile not released somewhere. In my unit test if you set a breakpoint on the return 0; line you can see that there is still plenty of memory allocated even after that the OIIO cache is destroyed

lgritz commented 9 years ago

One thing I noticed:

In ProducerThread::decodeFrame(const std::string& frame, std::vector<float>& pixelData)...

std::size_t nbElementsNeeded = spec.nchannels * spec.full_height * spec.full_width * sizeof(float);
if (pixelData.size() != nbElementsNeeded) {
    pixelData.resize(nbElementsNeeded);
}

It seems to me that if it's a vector of floats, you don't need the *sizeof(float) when computing how many elements to allocate. The image buffers are supposed to be 384MB, and you're allocating 4x too much on your side, that alone will result in a waste of over 1 GB -- PER THREAD.

That's not all that's going on here. It's a very tricky usage case, and I'll explain why in a later update, I'm still poking around a bit.

lgritz commented 9 years ago

OK, I ran some tests on my end.

Your test originally died after running out of memory limit on my 12 core hyperthread machine (it allocated 24 threads as hardware_concurrency).

I fixed it to make 16 threads so I wouldn't swap or crash. Rendered using a max RSS of around 42 GB.

After fixing the bug I mentioned in the last comment (where you allocated too much memory), it ran in around 25 GB.

I added a print of the RSS size after the stats, so I could see that it gets pretty close to the max fairly early on, so I hesitate to think it's any kind of "leak" per se. Once it hits the max, it is fairly steady even while continuing to read more and more files.

You have a very odd usage pattern -- you are doing a get_pixels of the full (huge) image exactly ONCE (and by one thread only), and then never needing the file again. I don't know if this is typical of your eventual presumed usage in your app, but it's not what ImageCache was really designed for. The basic problem is that you have all these threads in flight, and each one has the buffer it's allocated on your side (that the get_pixels call copies into), plus the representation inside the ImageCache, and plus each thread may be holding onto whatever the last tile it needed was (even if it's been kicked out of the main cache). Remember, IC thinks you might ask for any pixel of any file at any time, and is not assuming that after a single get_pixels that you're done with that file forever.

So knowing that the file will not be used again once we do our get_pixels, I added a call to

_imp->cache->invalidate (ustring(frame));

right after get_pixels(). This will serve to close the file, free the underlying ImageInput, and release any tiles associated with that file, since we know it won't be needed again. That brings us from 25 to 19 GB.

Meanwhile, the IC stats do indicate that not much is held in the main cache and only nthreads ImageInputs are still in play at any time. My next suspicion turned to the IlmImf (EXR-reading library) itself.

$ oiiotool -v -frames 1-100 ../unit-test/spaceship/spaceship.#.exr -tile 64.64 -o spaceship-tif/spaceship.#.exr

That copies all your exr images to TIFF. Would you believe that reduced memory to 10 GB, so I think it's fair to say that the IlmImf library must have a lot of memory overhead, probably keeping around a read-chunk-sized buffer or two for each thread in its pool? And your buffers are BIG!

Next, I did a similar trick to turn them into tiled TIFF files (64x64 tiles), and that brings the memory use down to 6.3 GB. Tiled EXR files resulted in 6.5 GB when using the "invalidate" trick I outlined above, and 7.8 GB when not invalidating after use.

What's the theoretical best case? After fixing your bug, we have nthreads * 384 MB for your buffers -- for nthreads == 16, that means 6 GB right there. We're shockingly close to that when using ImageCache with tiled files in combination with the invalidate() trick. When we don't use invalidate() when we know we're done with files, there's another 1.5G when using exr files, that's the IlmImf overhead that probably can't be fixed and you'd almost certainly experience the same problem if you used IlmImf directly with no OIIO.

So here's my verdict:

There are three big allocators of memory: your code, ImageCache, and buffers internal to IlmImf.
Your code had a bug that was allocating 4x what it needed (and even the 1x is considerable: nthreads x 384 MB).
ImageCache is sub-optimal for the specific use case of each file being used for exactly one get_pixels call and never used again. Strategically calling cache->invalidate(filename) after the get_pixels() can cut the ImageCache allocation by about 25%.
IlmImf uses a thread pool and each thread seems to allocate and hold proportional to the biggest single read it's asked to do. The libtiff seems to have considerably less overhead (though also does not do its decompression multithreaded as OpenEXR does), using TIFF instead can cut you by another 2x.
Using tiled files, with either format, can get you very close to minimal overhead, with the vast majority of held memory being what you allocate on your side.
None of this is a "leak" in the sense that there appear to be no dangling pointers, no forgotten memory, and it is not growing without bound as you increase the number of files to be read.

MrKepzie commented 9 years ago

We applied the following step:

No caching during playback
If actively editing a frame and if multiple concurrent read would occurs (such as when zooming or panning the viewer), then we cache this frame and invalidate the previous one that has been kept so far

It seems to be much better now.

I am also in the process of making Natron be able to write multi passes files and was wondering if OIIO can open an already existing file and just append a single pass (e.g Z depth) to it.

From the OIIO doc it appears to be:

open(filename, nSubImages, specs[])

then call open() on all subimage (except the first one)

But what happens for an already existing file ?

lgritz commented 9 years ago

Just to be super clear, ImageCache is meant to solve the following problem:

Field requests for bits and pieces of image data spanning potentially thousands of images totalling up to 1 TB or more of pixel data. Often the requests are for small sections or even individual pixels, generally not entire images in one shot.
Pixel access the image set is somewhat coherent overall, but has a clear element of random access. In particular, after an access of pixels of one image, the application isn't sure if or when those pixels (or other sections of that image) will be needed again. An image will tend to get accessed many times somewhat close together in time, and if pixels are accessed there is a good chance that neighboring pixels will be accessed soon. But there is not a predictable ordering of access.
Relieve the application of keeping track of limits on memory or open file handles, and try to handle the arbitration of which tiles get freed and which files closed, in order to minimize I/O (remembering that it may well be needed again).

It's meant to be a good performance compromise for these circumstances, but definitely is not intended to be the best performance case for other access patterns, such as reading one image after another coherently and never needing it again.

lgritz commented 9 years ago

When you open a file, it deletes the old one. You can then write multiple "subimages" (what OpenEXR calls "parts"). But you can't append a single subimage to an existing file.

It's not clear to me that any of the underlying image format libraries (such as libIlmImf for OpenEXR) are capable of "appending" to an existing file. If they can, and this is an important feature, I'm certainly happy to provide a way to request that behavior in the OIIO APIs. But if the underlying libraries can't do it, there's no point.

lgritz commented 9 years ago

If you are satisfied that there is not a memory leak in ImageCache, let's close (since the GH issue itself is actually tied to a feature that has been implemented and is ready to merge). We can always open a new issue, or simply discuss on the oiio-dev mail list, for any remaining unresolved topics from earlier in this thread.

MrKepzie commented 9 years ago

Sure let's close this, we will integrate this in Natron once it gets released but for now it works as intended. Regarding the appending feature request, I'll double-check but I think that OpenEXR is capable of doing so hence making it easy to just add passes. If it can support it I will open another feature request. Anyway thanks for your support Larry

lgritz commented 9 years ago

If it's supported by OpenEXR, I'll be happy to add support for it on the OIIO side. Check into the OpenEXR issue and let me know.

MrKepzie commented 9 years ago

In fact it seems to be impossible to do (I didn’t scan all OpenEXR source code though) and all data are pre-declared in the header of the file. I was also wondering what is the correct way to write multiple passes to a file ?

A: use a packed buffer containing all channels, e.g: R/G/B/A/layer.R/layer.G/layer.B … and do a single open() and write_image B: use sub images : this will produce images that cannot be opened on applications with openexr < 2 C: ?

On 20 Oct 2015, at 19:30, Larry Gritz notifications@github.com wrote:

If it's supported by OpenEXR, I'll be happy to add support for it on the OIIO side. Check into the OpenEXR issue and let me know.

— Reply to this email directly or view it on GitHub https://github.com/OpenImageIO/oiio/issues/1123#issuecomment-149640702.

lgritz commented 9 years ago

Those are the only options I'm aware of, if the passes must be part of the same physical file.

MrKepzie commented 8 years ago

@lgritz I noticed that the following function

ImageInput::read_tile(int x, int y, int z, TypeDesc format, void *data,
                       stride_t xstride, stride_t ystride, stride_t astride)

does not handle channel ranges such as what was added to the ImageCache API. However the ImageCacheFile::read_tiles function correctly takes in parameter chbegin and chend but makes a call to the aforementioned function thus stripping off the parameters.

Was this intended ?

lgritz commented 8 years ago

"Intended", hmm. I would say that it was a historical artifact.

First there was read_tile, no channel range. Then later we wanted to support reading multiple tiles at once, so read_tiles was added. Then later we decided we wanted channel ranges. I guess I could have added a channel-range version of both read_tile and read_tiles, but I dreaded having a succession of doublings of the methods every time a new feature was needed. So I only added the channel range to read_tiles, and left read_tile so as to avoid breaking compatibility or confusing the simple case. I figured that if you needed channel ranges, you were already an advanced user, so you could just use the full read_tiles() that had all the options.

So think of it like this: there is one simple tile-reading function, read_tile, which handles the common case of one tile, all channels. Then there is the fully general read_tiles, which can handle one tile or many, all channels or just a subset.

A similar breakdown exists for scanlines: the simple read_scanline() is just for one scanline and doesn't support channel ranges, but a more general read_scanlines() supports any number of scanlines and channel ranges.

So yeah, in some sense it was by design. But incremental design, if you know what I mean -- I was trying to make a balanced decision as the API was growing. If I'd known at the very beginning that multiple scanlines/tiles and channel ranges were needed, maybe the organization would have been different. Maybe there never would have been read_scanline and read_tile at all, and only the fully general versions would have been present. But that's not how history of the software unfolded.

MrKepzie commented 8 years ago

I understand. Since the cache use the simple version of the function it makes it really slow when reading tiled EXRs with a lot of layers. I'll try to see if switching the function call from the ImageCacheFile::read_tiles can be enough to fix this

MrKepzie commented 8 years ago

Actually it seems much more complicated, it would require adding a new version of read_native_tile for necessary plug-ins (mainly OpenEXR) and tweak a little the ImageInput::read_tile function (and it's overloaded in TiFFInput)

AcademySoftwareFoundation / OpenImageIO

ImageCache: On-demand layer reading for multi-layered files #1123