darktable-org / darktable

darktable is an open source photography workflow application and raw developer
https://www.darktable.org
GNU General Public License v3.0
8.91k stars 1.1k forks source link

Fix the hash function in the grain module #16798

Closed victoryforce closed 2 weeks ago

victoryforce commented 2 weeks ago

The starting hash should be 5381, not 0.

jenshannoschwalm commented 2 weeks ago

We have this

#define DT_INITHASH 5381
typedef uint64_t dt_hash_t;
static inline dt_hash_t dt_hash(dt_hash_t hash, const void *data, const size_t size)

in darktable.h for this :-)

victoryforce commented 2 weeks ago

@jenshannoschwalm Yes, we could use our function, I'm aware of it. It is, however, designed for general data, so in this case we will have to calculate the length of the string, which, although it takes a negligible time, is illogical and not beautiful code. It might be better to add a function from this module to darktable.h as, say, dt_hash_string.

On the other hand, there is a GLib function g_str_hash, which is very similar to the one defined here in grain.c, but implements the initial algorithm with addition (so called DJBX33A), while we at darktable use a variant with bitwise XOR (DJBX33X).

Not sure which option is better to choose here.

ralfbrown commented 2 weeks ago

Frankly, it doesn't actually matter - the resulting hash value is used as a constant(!) offset on the X coordinate when generating the noise. Really all it does is ensure that two different images with identical sizes and iop settings don't get identical grain.

victoryforce commented 2 weeks ago

Really all it does is ensure that two different images with identical sizes and iop settings don't get identical grain.

@ralfbrown Yes of course! As far as I understand, this is done to avoid identical grain in the case when the user processes a series of shots to combine them into a video. In this situation, a stationary grain would look extremely unnatural.

This function hashes image file names, which in the case of a series of shots will very often differ by one character from the previous file. For the most natural looking noise, our hash function should have a good level of avalanche effect. A hash with a good avalanche level can dissipate the statistical patterns of the inputs into larger structures of the output, thus generating high levels of disorder and preventing clustering problems.

I haven't experimented, but I'm guessing that a wrong djb implementation (with a zero starting hash) might have worse avalanche quality...

BTW, perhaps we should consider a different hash algorithm to use in this module. See for example Performance of the most common non-cryptographic hash functions document.

ralfbrown commented 2 weeks ago

Changing the initalization value doesn't affect the avalanche effect - that depends on how each additional byte is mixed into the hash. For our sequence of consecutive filenames, what will affect the avalanche is which character differs from one image to the next. So it would be better to hash from the end of the filename rather than from the beginning, giving the changed characters more time to diffuse through the hash value.

victoryforce commented 2 weeks ago

Changing the initalization value doesn't affect the avalanche effect - that depends on how each additional byte is mixed into the hash.

Yes, indeed, my guess was too quick and baseless (but the fix was warranted anyway).

For our sequence of consecutive filenames, what will affect the avalanche is which character differs from one image to the next. So it would be better to hash from the end of the filename rather than from the beginning, giving the changed characters more time to diffuse through the hash value.

Well, I should have thought of that myself :). Thanks, I implemented your advice.