AcademySoftwareFoundation / openexr

The OpenEXR project provides the specification and reference implementation of the EXR file format, the professional-grade image storage format of the motion picture industry.
http://www.openexr.com/
BSD 3-Clause "New" or "Revised" License
1.63k stars 614 forks source link

Incorrect Values When Reading DWA Compressed Layer With OpenEXRCore #1682

Closed danieldresser-ie closed 6 months ago

danieldresser-ie commented 6 months ago

Tested in OpenImageIO-2.5.8.0 with OpenEXR-3.2.3 and OpenEXR-3.1.11 on Linux.

The simplest repro I've found is to create a simple test image with DWA compression and the channel names with a layer prefix, like this:

> bin/oiiotool --pattern constant 5x5 3 -d half -mulc 0.3,0.6,0.9 -chnames albedo.R,albedo.G,albedo.B --attrib compression dwaa -o sample.exr

If I then print the values in the image with openexr:core disabled in OIIO, I get the correct pixel values:

> bin/oiiotool -oiioattrib openexr:core 0 --printstats sample.exr
   5 x    5, 3 channel, float openexr
    Stats Min: 0.300049 0.600098 0.899902 (float)
    Stats Max: 0.300049 0.600098 0.899902 (float)
    Stats Avg: 0.300049 0.600098 0.899902 (float)

But if I don't disable openexr:core, I get completely wrong pixel values:

> bin/oiiotool --printstats sample.exr
   5 x    5, 3 channel, float openexr
    Stats Min: -0.008499 0.006969 0.544922 (float)
    Stats Max: -0.008499 0.006969 0.544922 (float)
    Stats Avg: -0.008499 0.006969 0.544922 (float)

This sounds similar to this bug: https://github.com/AcademySoftwareFoundation/openexr/pull/1591

But the crucial differences are: this affects both fp16 and fp32 images, it only affects images where channels are prefixed with a layer name, and this bug is not fixed in OpenEXR-3.2.3. A random observation in case it's useful - I noticed that this does not happen to a 4x4 fp16 test image - maybe DWA compression is effectively disabled if the image is too small to be worth compressing?

Note that I've only tested this in oiiotool, and in the application Gaffer, which uses OpenEXR through OpenImageIO. I've tested in oiiotool to rule out the possibility that Gaffer could be doing anything wrong, but I haven't ruled out the possibility that the problem lies in OpenImageIO rather than OpenEXR ... it feels very likely to be an issue with OpenEXRCore similiar to that previous one though.

( A slight aside, but I notice that the fix for that previous bug has been released in 3.2.3, but there haven't been any recent releases of 3.1 - are there likely to be backports of this sort of bug fix, or will we have to wait until VFXPlatform2024 for them? )

kdt3rd commented 6 months ago

This also seems to work without a prefix, (i.e. just saving as R,G,B instead of albedo.{R,G,B}, which says the bug is in the channel name classifier logic for DWA. Will investigate more

johnhaddon commented 6 months ago

Thanks for looking at this so quickly @kdt3rd. We've updated to OpenEXRCore for Gaffer 1.4, and in conjunction with some threading improvements we've made, we're seeing a 3-4x speedup in image loading. So we're definitely fans. Gaffer 1.4 is currently in beta, with the goal being to make an official release in the first half of April. Do you think this could be fixable in that timeframe, including being backported to the 3.1.x branch? We could obviously fall back to the old non-core code path as a last resort, but now we've seen the shiny new performance we're very reluctant to let it go :)

kdt3rd commented 6 months ago

Will check with the others, but don't see why it can't be backpatched, it is a one-line fix. I am not sure if there have been other fixes since (I feel like so, but maybe we already back-patched those). Glad you're seeing benefit!

johnhaddon commented 6 months ago

Oh, that simple! Thanks for the quick fix! Worst case, we can just apply that patch ourselves when we build OpenEXR.

johnhaddon commented 6 months ago

It looks like we would also need https://github.com/AcademySoftwareFoundation/openexr/pull/1591 to be back-ported to the 3.1.x branch if we're to rely on OpenEXRCore. Does OpenEXR have a policy regarding which versions are patched? Although it is 2024, we're sticking to VFXPlatform 2023 for Gaffer 1.4 because in practice that serves our users better (they're using other apps which also aren't 2024 compatible).

cary-ilm commented 6 months ago

Stated policy is here: https://github.com/AcademySoftwareFoundation/openexr/blob/main/SECURITY.md#supported-versions

These are simple fixes, we'll back-port them to 3.1, hopefully in the next few days.

johnhaddon commented 6 months ago

Thanks @cary-ilm - that's great. In the meantime I'm back-patching myself so we can confirm that it fixes the original problem in the wild.

johnhaddon commented 6 months ago

In the meantime I'm back-patching myself so we can confirm that it fixes the original problem in the wild.

Can confirm that this in conjunction with https://github.com/AcademySoftwareFoundation/openexr/pull/1591 do fix the original problem in Gaffer. Thanks once again!

cary-ilm commented 6 months ago

This fix is now released in v3.2.4 and backported to v3.1.13.

johnhaddon commented 6 months ago

Thanks @cary-ilm!