Open MarijnS95 opened 1 year ago
Will need to dissect if this is caused by the pointer refactor or purely the extra ALU overhead.
The pointer refactor and additional `if` checks only seem to account for ±5ms; the rest is purely `pow()` overhead. And that makes sense: I remarked above that it "is more efficient to do it within the downsampling algorithm that already runs over all the pixels", but that algorithm visits, and applies gamma correction to, every pixel many times over as the kernel slides, so doing the correction there ends up much more inefficient than a single pass.
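To make the cost concrete, here is a minimal sketch of the sRGB transfer functions involved (the exact piecewise curve, not the 2.2 approximation); the function names are illustrative, not this crate's API. Every call hits `powf()`, which is what dominates when it runs once per kernel tap instead of once per pixel:

```rust
/// sRGB electro-optical transfer function: encoded value -> linear light.
fn srgb_to_linear(c: f32) -> f32 {
    if c <= 0.04045 {
        c / 12.92
    } else {
        // Transcendental call: this is the `pow()` overhead in question.
        ((c + 0.055) / 1.055).powf(2.4)
    }
}

/// Inverse transfer function: linear light -> sRGB-encoded value.
fn linear_to_srgb(c: f32) -> f32 {
    if c <= 0.003_130_8 {
        c * 12.92
    } else {
        1.055 * c.powf(1.0 / 2.4) - 0.055
    }
}
```

A kernel of width K touches each source pixel up to K times per output pixel, so calling these inside the kernel multiplies the `powf()` count by roughly K compared to linearizing each pixel exactly once up front.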
The alternatives are either preprocessing the input (though that doesn't seem to be much faster either... :confused:) or simply documenting that the input and output are treated as linear and letting the user deal with the conversion (if they need to). (We should still keep this as a separate step so that our bench and test do the correct thing.)
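For the "preprocess the input" alternative, a common trick for 8-bit input is a 256-entry lookup table built once, so linearization costs one array index per pixel instead of a `powf()` per kernel tap. A minimal sketch (names are hypothetical, not the crate's API):

```rust
/// Build the sRGB -> linear table once; 256 `powf()` calls total.
fn build_srgb_to_linear_lut() -> [f32; 256] {
    let mut lut = [0.0_f32; 256];
    for (i, entry) in lut.iter_mut().enumerate() {
        let c = i as f32 / 255.0;
        *entry = if c <= 0.04045 {
            c / 12.92
        } else {
            ((c + 0.055) / 1.055).powf(2.4)
        };
    }
    lut
}

/// Linearize an 8-bit sRGB buffer in one pass before downsampling.
fn linearize(pixels: &[u8], lut: &[f32; 256]) -> Vec<f32> {
    pixels.iter().map(|&p| lut[p as usize]).collect()
}
```

This still costs an extra pass over the input (and a float intermediate buffer), which may be why preprocessing did not measure much faster here either.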
PR #48 seems to have added `unorm` and `snorm` texture formats in addition to the already existing `Srgb(a)` format. It is as yet unused, but that will be the right enum to key the request for (de)linearization off of going forward!
In that sense I do think this is more of a bug now, in that we expose the `Srgb(a)` vs `Rgb(a)` unorm/snorm capability to the caller, but don't act on it.
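Keying the behaviour off the format enum could look roughly like the sketch below. The variant and method names here are illustrative placeholders, not the definitions PR #48 actually added:

```rust
/// Hypothetical stand-in for the texture format enum from PR #48.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Format {
    Srgba,
    RgbaUnorm,
    RgbaSnorm,
}

impl Format {
    /// Only sRGB-encoded formats need the transfer function applied
    /// before filtering (and re-applied after); unorm/snorm data is
    /// taken as already linear.
    fn needs_gamma_handling(self) -> bool {
        matches!(self, Format::Srgba)
    }
}
```

With that, the downsampler can branch once per image rather than exposing an unused capability to the caller.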
Fixes #25
This crate can't assume that the input and output are linear, nor did it correct for that in the `test` example, where the output from `stb_image` is clearly nonlinear (it doesn't state this in the "docs", but it is visible from not linearizing JPG and PNG inputs and from applying a gamma of 1/2.2 when converting HDR to LDR). While we could require users to pre-correct for this and return linear output to them, it is more efficient to do it within the downsampling algorithm that already runs over all the pixels, and (more importantly!) requiring these parameters in the input forces the caller to think about it.

Unfortunately this has a massive performance regression of 150% (±40ms to ±107ms):
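As a rough illustration of the HDR-to-LDR behaviour described above, the conversion amounts to applying a plain 1/2.2 gamma (not the piecewise sRGB curve) before quantizing to 8 bits. This sketch is my reading of that behaviour, not `stb_image`'s actual code:

```rust
/// Encode a linear HDR channel value to an 8-bit LDR value using a
/// simple power-law gamma, as `stb_image` does with gamma = 2.2.
fn hdr_to_ldr(v: f32, gamma: f32) -> u8 {
    let encoded = v.max(0.0).powf(1.0 / gamma).min(1.0);
    // Round to nearest 8-bit code.
    (encoded * 255.0 + 0.5) as u8
}
```

The 8-bit output is therefore nonlinear, which is why feeding it to a filter that assumes linear input skews the result.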