Cyan4973 / xxHash

Extremely fast non-cryptographic hash algorithm

http://www.xxhash.com/

Other

9.23k stars 781 forks source link

XXH3 Keyed Variant long secrets #361

Closed koraa closed 4 years ago

koraa commented 4 years ago

Hi,

thanks again for creating XXH3! Very impressive!

I just finally realized that the keyed variant of XXH3 requires a 192 byte secret (I always assumed it would be 192 bits). Should we make this point extra clear in the doc since 192 bit is a common key size? I'll be happy to open a PR…

192 byte also seems a bit excessive for many use cases; in terms of entropy a much shorter key length should suffice for a non cryptographic hash function…For my use case I am probably going to use chacha20 to expand a 32 byte key, or maybe SHAKE to support other key lengths, but maybe you have a better idea?

Could a specialized KDF be added to xxhash to support shorter key lengths natively?

easyaspi314 commented 4 years ago

Maybe we should make XXH3_initCustomSecret public.

Cyan4973 commented 4 years ago

The minimum size for a secret is 136 bytes, but indeed, that's still a too large a length for the described use case.

At the other extreme, it's possible to initialize using a seed, which must be exactly 8-bytes long.

Therefore, there is a gap between those two variants, which do not allow initialization with a 32-byte key. This is a capability XXH3 doesn't have yet, but indeed, it could be added.

When it comes to very large inputs, there's no problem : it's certainly possible to generate a "large enough" secret from a too small one, and could be added easily.

But for API completeness, it doesn't seem right to offer this option only for "long" inputs. One also has to consider "small inputs" (<= 240 bytes). And now it becomes a more complex story.

A simple generic solution would always generate the full internal secret, and work from there. This would work fine, but there is a "startup cost" for each hash, when generating the internal secret, and this fixed cost becomes non trivial as input length becomes small.

For example, presuming the input is just 5 bytes long, and the custom secret is 32-bytes long, the naive initialization method would still generate a full internal secret (likely 192-bytes long) before starting calculation of the hash on the 5-bytes input, adding some non-negligible latency.

Since only a portion of the secret will be used, an alternative could be to generate only the portion of the secret that will be actually used for hash calculation. For example, for a 5-byte input, it is only 16-bytes long. However, this would add branches at the beginning, potentially harming performance more than a straightforward initialization operating on a stable length.

One possibility to speed up small-input scenarios could be to create an hybrid, using only the first 8-bytes of the custom secret as a seed when calculating "small" hashes. This way, hopefully, the outcome would be closer to the _withSeed() variant. However, there are downsides : any byte of secret beyond the 8th one is ignored, hence doesn't influence the calculated hash. This considerably reduces the entropy of the custom secret. Also, on a more minor note, one can notice that the custom secret must still be at least 8-bytes long (although I guess it could be possible to create yet another variant for very short secrets), and resulting speed is probably slightly slower than the "pure" _withSeed() variant, though likely by a small margin (to be measured). But the most important problem is the first one, when the entropy of the 32-byte secret is reduced to its first 8-bytes.

Finally, another possibility is to expose a public "secret generator". This one would take as input a secret of any length, and convert it into a secret of suitable length, which can be directly consumed by the _withSecret() variant, without additional conversion (like today). The downsides are :

This is another public function to invoke before using the hash function, making it slightly more complex.
It is generating a secret, which means it also introduces memory management topics. Who's in charge of allocation ? Don't forget to free() after usage.
The generator is only meant to be used for "small" custom secrets, therefore it must "fail" for large custom secrets, which should be consumed directly, without preparation step. Not a big deal, as it can be well documented, just an API issue to keep in mind.

Maybe the "secret generator" could be an optional optimization step, which would help speed (specifically for batch-processed "small" inputs) by "sharing" the initialization stage, but not be required.

In the end, it may look like a small feature to add, but there are several questions to nail right to preserve API completeness and make it happen.

koraa commented 4 years ago

Thanks for your quick, comprehensive responses! Don't have much time today so I'll take the time to write a proper reply next week!

First of all, I agree this is not an easy feature; generating a long secret from a long one is easy but making it perform well & while not diminishing security properties certainly is not…

At the other extreme, it's possible to initialize using a seed, which must be exactly 8-bytes long.

What are the security properties of the seeded variant vs the keyed variant? The key derivation step seems relatively basic…alternately adding and subtracting the key from the long secret?

My particular use case btw is evaluating XXH3 as a hash function for rust hash tables; this use case has some specific properties: I can generate the sequence once and use it many times, needs at least some collision attack resistance, it will always use streaming mode (so I suspect a lot of the short input optimizations may be lost?)

easyaspi314 commented 4 years ago

Streaming does have measurable overhead for short hashes.

As for a KDF, how about doing something like XXH3_initCustomSecret, but like this:

#define XXH_PRIVATE_API
#include "xxhash.h"
#include <stdlib.h>

static xxh_u64 XXH3_circularReadLE64(const xxh_u8 *buf, size_t offset, size_t size)
{
    offset %= size;
    if (XXH_likely(offset + 8 < size)) {
        return XXH_readLE64(&buf[offset]);
    } else {
        /* slow but does the job. */
        xxh_u64 ret = 0;
        size_t i;
        for (i = 0; i < sizeof(xxh_u64); i++) {
            ret |= buf[(offset + i) % size] << (8 * i);
        }
        return ret;
    }
}

void *XXH3_generateCustomSecret(const xxh_u8* seed, size_t seedLen)
{
    xxh_u8 *customSecret = XXH_alignedMalloc(XXH_SECRET_DEFAULT_SIZE, 16);

    size_t const nbRounds = XXH_SECRET_DEFAULT_SIZE / 16;
    size_t i,j;
    if (customSecret == NULL) return NULL;
    XXH_memcpy(customSecret, kSecret, XXH_SECRET_DEFAULT_SIZE);

    if (seed == NULL || seedLen == 0) {
        return customSecret;
    }

    for (j = 0; j <= seedLen / XXH_SECRET_DEFAULT_SIZE; j++) {
        for (i=0; i < nbRounds; i++) {
            xxh_u64 lo = XXH_readLE64(customSecret + 16*i)     + XXH3_circularReadLE64(seed + (XXH_SECRET_DEFAULT_SIZE*j), 16*i + (seed[(2*i)%8]), seedLen % XXH_SECRET_DEFAULT_SIZE);
            xxh_u64 hi = XXH_readLE64(customSecret + 16*i + 8) - XXH3_circularReadLE64(seed + (XXH_SECRET_DEFAULT_SIZE*j), 16*i+8 + (seed[(2*i+1)%8]), seedLen % XXH_SECRET_DEFAULT_SIZE);
            XXH_writeLE64(customSecret + 16*i,     lo);
            XXH_writeLE64(customSecret + 16*i + 8, hi);
        }
    }
    return customSecret;
}

void XXH3_freeCustomSecret(void *secret)
{
    XXH_alignedFree(secret);
}

//.....
#include <stdio.h>

static void print_buf(const xxh_u8 *ptr, size_t len)
{
    for (size_t i = 0; i < len / 16; i++) {
        for (size_t j = 0; j < 16; j++) {
            printf("%02x, ", *ptr++);
        }
        printf("\n");
    }
    if (len % 16) {
        for (size_t j = 0; j < len % 16; j++) {
            printf("%02x, ", *ptr++);
        }
        printf("\n");
    }
}
#ifndef TMP_BUF_SIZE
#define TMP_BUF_SIZE 32
#endif
int main(void)
{
     xxh_u8 tmp[TMP_BUF_SIZE] = { 0 };
     FILE *f = fopen("/dev/urandom", "rb");
     fread(tmp, sizeof(tmp), 1, f);
     fclose(f);
     printf("in:\n");
     print_buf(tmp, sizeof(tmp));
     printf("out:\n");
     xxh_u8 *ptr = XXH3_generateCustomSecret(tmp, sizeof(tmp));
     print_buf(ptr, XXH_SECRET_DEFAULT_SIZE);
     XXH3_freeCustomSecret(ptr);
}

Just a rough 15 minute idea, but we could do something better.

Cyan4973 commented 4 years ago

My particular use case btw is evaluating XXH3 as a hash function for rust hash tables

I presume it means that a substantial portion of inputs will be "small" (as in < 240 bytes).

needs at least some collision attack resistance,

Ok, so collision resistance is an important property.

What are the security properties of the seeded variant vs the keyed variant? The key derivation step seems relatively basic

Indeed, the seeded variant is designed to be fast, first and foremost. So, while it alters the internal secret in a predictable way, it does so using a simple sequence.

If you're looking for an obfuscation strategy which makes it more difficult to rewind to the original "short secret", we could consider a strong strategy, with little regards for its speed. In which case, there is no shortage of technique that could be used to digest the "short secret" and transform it into a suitably longer one

The only issue is that such a preparation stage now has to be offline, otherwise it would cost too much to perform for each hash.

As a consequence, it also calls into question if there is still a need to accept "short secrets" in the regular interface (like XXH3_64bits_withSecret()), since such a scenario would require a much faster initialization delay, and therefore cannot support the same initialization strength. If both entry points are allowed, a hash based on a small secret would produce a different result than a hash based on the long secret generated from said short secret by the generator (still follow me?:p). I presume this could be well documented, and the distinction between the 2 scenarios could be made quite clear, but a risk of confusion will likely remain.

Alternatively, if the regular interface doesn't accept short secrets, and if the generator only accepts short secrets (the ones that the regular interface refuses, hence with len < XXH3_SECRET_SIZE_MIN bytes), then results can't be directly compared, which reduces risks of confusion.

That being said, there are additional advantages to a clearly separated generator. Such a generator would not be restricted by the kind of input it can accept, it would always generate a fixed 192 bytes secret from any input, including longer ones. This may seem overkill, but I can see a use case for that. Presuming, for example, that someone wants to use a text entry as its custom secret, or any other low entropy source available around, it should be pretty clear that its distribution of 1 and 0 bits is not random enough, and it will impact the quality of the produced hash. In contrast, the secret generator would ensure that the generated secret is always random enough, whatever the input (even silly ones like a bunch of zeroes), essentially guaranteeing the quality of generated hash. The larger purpose therefore is to generate a "high entropy" secret from any source, including short ones.

koraa commented 4 years ago

I've finally been able to take a closer look at the keyed version of XXH3; I am not done yet done forming an opinion, but here are some observations.

struct XXH3_state_s {
   XXH_ALIGN_MEMBER(64, XXH64_hash_t acc[8]);
  /* used to store a custom secret generated from the seed. Makes state larger.
   * Design might change */
   XXH_ALIGN_MEMBER(64, unsigned char customSecret[XXH3_SECRET_DEFAULT_SIZE]);
   XXH_ALIGN_MEMBER(64, unsigned char buffer[XXH3_INTERNALBUFFER_SIZE]);
   XXH32_hash_t bufferedSize;
   XXH32_hash_t nbStripesPerBlock;
   XXH32_hash_t nbStripesSoFar;
   XXH32_hash_t secretLimit;
   XXH32_hash_t reserved32;
   XXH32_hash_t reserved32_2;
   XXH64_hash_t totalLen;
   XXH64_hash_t seed;
   XXH64_hash_t reserved64;
   /* note: there is some padding after due to alignment on 64 bytes */
   const unsigned char* secret;
};   /* typedef'd to XXH3_state_t */

As far as I understood this, customSecret contains the secret for the seeded variant, while secret points to the secret used (to the default one for the default variant and to the external one for the external variant). This seems kind of odd to me; it makes XXH3_state_s non-movable for the seeded variant (but both other variants) and means the secret could be modified while XXH3_state_s exists…

Non immovability is a bit of an issue for C++ and rust integration. I think at least that could be solved by setting "secret=NULL" when the secret is in customSecret.

Streaming does have measurable overhead for short hashes.

OK, but there is a fixed setup phase for both 64 and 128 bit variants with seeded+streaming? You always need to copy the default secret and derive the seed, even for very short hashes.

Then I started playing around with different secrets and look at this:

// c++ src/test.cc --std=c++17 -Wall -Wextra -Wpedantic -o src/test -Ivendor/xxhash && src/test
#define XXH_STATIC_LINKING_ONLY 1
#define XXH_INLINE_ALL 1
#include <cassert>
#include <cstdint>
#include <utility>
#include <string>
#include <vector>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <filesystem>
#include <xxhash.h>

namespace fs = std::filesystem;

void pt(uint8_t val) {
  std::cout << std::hex << std::setfill('0') << std::setw(2) << ((int)val);
}

void pt(uint32_t val) {
  std::cout << std::hex << std::setfill('0') << std::setw(8) << val;
}

void pt(uint64_t val) {
  std::cout << std::hex << std::setfill('0') << std::setw(16) << val;
}

void pt(const XXH128_hash_t &val) {
  std::cout
    << std::hex << std::setfill('0')
    << std::setw(16) << val.high64
    << std::setw(16) << val.low64;
}

int main() {

  uint8_t data[] = { 0x00, 0x00 };

  uint8_t key[192];
  std::fill(std::begin(key), std::end(key), 0);

  for (size_t idx=0; idx < std::size(key); idx++) {
    for (size_t bit=0; bit < 8; bit++) {
      uint8_t &v = key[std::size(key) - idx - 1];
      v = (v << 1) | 1;
      pt((uint8_t)idx);
      std::cout << ":" << bit << " ";
      pt(XXH3_128bits_withSecret(std::data(data), std::size(data), std::data(key), std::size(key)));
      std::cout << "\n";
    }
  }

  return 0;
}

Not going to post the output, but it looks like, for short inputs XXH3_64/128 are ignoring the anything but the first 64/128 bits of the key; for a key this is not good, the expectation for a key IMO is that all bits of the key are integrated in the hash; I recommend also taking a look at some of the hashes; e.g. xxh3_64 yields 6ddd9d6c6cdec134 for key=zero192, data=zero2. It might just be me, but this looks anything but random. As you said, XXH3 relies on the secret entropy a lot…

This is perfectly fine, as long as the key (maybe call it something different, how about "entropy") was generated from a decent source of randomness…be it the system CSPRNG or a PRNG…

Have you considered using some sort of online key schedule; I could imagine generating the long seed on the fly using a vectorized xorshiro PRNG could be quite fast and yield a lot better results? It could also be faster for the seeded/streaming variant (no setup) and might even be fastor for other variants because no cache access would be needed.

koraa commented 4 years ago

I presume it means that a substantial portion of inputs will be "small" (as in < 240 bytes).

Probably, yes.

easyaspi314 commented 4 years ago

Not going to post the output, but it looks like, for short inputs XXH3_64/128 are ignoring the anything but the first 64/128 bits of the key

If every byte of the secret affected XXH3's output, performance will plummet.

We actually choose different parts of the secret on short keys (and mix two samples) to make up for this.

Before, short keys used exclusively the first few bytes.

Also, your "modern C++" is not modern enough! Unacceptable.

```cpp // c++ src/test.cc --std=c++17 -Wall -Wextra -// c++ src/test.cc --std=c++17 -Wall -Wextra -Wpedantic -o src/test -Ivendor/xxhash && src/test #define XXH_STATIC_LINKING_ONLY 1 #define XXH_INLINE_ALL 1 #include #include #include #include #include #include #include #include #include #include #include #include namespace fs = std::filesystem; template void pt(T &&val) { std::cout << std::hex << std::setfill('0') << std::setw(sizeof(T) * 2); if constexpr (std::is_integral_v && sizeof(T) < sizeof(std::uint32_t)) { std::cout << static_cast(val); } else { std::cout << val; } } template <> void pt(XXH128_hash_t &&val) { std::cout << std::hex << std::setfill('0') << std::setw(16) << val.high64 << std::setw(16) << val.low64; } auto main() -> int { std::array data{ 0x00, 0x00 }; std::array key; std::fill(std::begin(key), std::end(key), 0); for (std::size_t idx{0}; idx < std::size(key); idx++) { for (std::size_t bit{0}; bit < 8; bit++) { std::uint8_t &v = key[std::size(key) - idx - 1]; v = (v << 1) | 1; pt(static_cast(idx)); std::cout << ":" << bit << " "; pt(XXH3_128bits_withSecret(std::data(data), std::size(data), std::data(key), std::size(key))); std::cout << std::endl; } } } ```

koraa commented 4 years ago

If every byte of the secret affected XXH3's output, performance will plummet.

We actually choose different parts of the secret on short keys (and mix two samples) to make up for this.

Before, short keys used exclusively the first few bytes.

Totally makes sense, I am just pointing out that this is one of the expectations that comes with having a keyed hash function; it makes total sense from a performance perspective to do it this way; I do think this introduces the need to be clearer in the API design & documentation though…

Also, your "modern C++" is not modern enough! Unacceptable.

Can't argue with that. sadkitten.gif

EDIT I think XXH3_copyState should also update the internal pointer; otherwise it can lead to a use after free…

easyaspi314 commented 4 years ago

I do think this introduces the need to be clearer in the API design & documentation though…

Hmm, maybe we should just replace the comments with /* chopped liver */. Clearly, nobody reads them, and it will save space.

Yann, should I open a PR? :smirk:

https://github.com/Cyan4973/xxHash/blob/d8bed9a54d6bcc6f7165b61647d4b066e1ae17eb/xxh3.h#L671-L686

easyaspi314 commented 4 years ago

EDIT I think XXH3_copyState should also update the internal pointer; otherwise it can lead to a use after free…

I mean I thought we made it pretty clear... https://github.com/Cyan4973/xxHash/blob/d8bed9a54d6bcc6f7165b61647d4b066e1ae17eb/xxhash.h#L600-L605

koraa commented 4 years ago

https://github.com/Cyan4973/xxHash/blob/d8bed9a54d6bcc6f7165b61647d4b066e1ae17eb/xxhash.h#L600-L605

This affects the seeded variant; not the variant with a secret.

// pseudocode
a = XXH3_createState();
XXH3_64bits_reset_withSeed(a, 0xabcd);
b = XXH3_createState();
XXH3_copyState(b, a);
XXH3_freeState(a);
XXH3_update(b, ...);

Unless I have missed something, this will cause a use after free, because secret in b points at customSecret in a (violating the requirement you quoted above).

easyaspi314 commented 4 years ago

We don't malloc or free the secret pointer, and you can see that by how our pointer is const unsigned char *.

void free(void *ptr);

https://github.com/Cyan4973/xxHash/blob/d8bed9a54d6bcc6f7165b61647d4b066e1ae17eb/xxhash.h#L572

This simplifies things and avoids unnecessary comments:

XXH_errorcode XXH3_64bits_reset(XXH3_state_t *state)
{
    ...
    state->secret = kSecret;
    ...
}
XXH_errorcode XXH3_64bits_reset_withSeed(XXH3_state_t *state, XXH64_hash_t seed)
{
    ...
    // XXH3_state_t::customSecret is a fixed size array
    XXH3_initCustomSecret(state->customSecret, seed);
    state->secret = state->customSecret;
    ...
}
XXH_errorcode XXH3_64bits_reset_withSecret(XXH3_state_t *state, const void *customSecret, size_t secretSize)
{
    ...
    state->secret = customSecret;
    ...
}

It also allows the state to be 100% stack based and used the exact same way as XXH32_state_t and XXH64_state_t.

Cyan4973 commented 4 years ago

for short inputs XXH3_64/128 are ignoring the anything but the first 64/128 bits of the key

That's correct. For short inputs, only portions of the secret are used.

the expectation for a key IMO is that all bits of the key are integrated in the hash

This situation can be solved with a secret generator. This one can make sure that any bit of the custom seed will contribute to all results, for any input of any length, by drastically altering the generated secret. Note that, during the hash calculation, only a portion of the secret will be used, it's just that all portions of the secret are sensible to any bit in the custom seed.

XXH3 relies on the secret entropy a lot

It totally relies on it. The secret must have a high entropy. This situation can also be solved via the secret generator, as any custom seed, even ridiculously trivial (like a bunch of zeroes), will be turned into a high entropy secret.

it makes XXH3_state_s non-movable for the seeded variant

That's a good point. This design could be changed, using your suggested convention that secret == NULL means "use customSecret instead, but it will add a branch at the beginning of each invocation, in order to select the right pointer. In XXH3, we try to avoid branches, as they impact performance, especially on short inputs. That being said, the streaming variant already adds quite some overhead, so an additional pointer comparison branch may not be that sensible.

I could imagine generating the long seed on the fly using a vectorized xorshiro PRNG could be quite fast and yield a lot better results? It could also be faster for the seeded/streaming variant (no setup) and might even be fastor for other variants because no cache access would be needed.

That's more or less the intended design for the secret generator, although the details can vary, and what I had in mind is not specifically the "xorshiro PRNG". Also, my expectation is that the secret generator, while not being especially "slow", will not be fast enough to consider its delay negligible, especially in combination with small inputs. So a "faster" performance wasn't on the list.

It could be that the implementation details matter, and the "generator" I had in mind could be meaningfully improved. Maybe that's a discussion we could have once a first design is accessible and can be commented.

I think XXH3_copyState should also update the internal pointer; otherwise it can lead to a use after free…

Let's have a look, it could be a bug.

easyaspi314 commented 4 years ago

OH, now I see the use after free issue. Yeah, major bug. xxh3 use after free

Maybe we should add a set of flags in one of the reserved fields.

That would also allow us to prevent mixing XXH3_64B and XXH3_128B, but that would require a behavior change (although we could rule the existing behavior as UB).

koraa commented 4 years ago

That's more or less the intended design for the secret generator, although the details can vary, and what I had in mind is not specifically the "xorshiro PRNG". Also, my expectation is that the secret generator, while not being especially "slow", will not be fast enough to consider its delay negligible, especially in combination with small inputs. So a "faster" performance wasn't on the list.

Great! Thank you, that's awesome!

Cyan4973 commented 4 years ago

366 tries to solve the `XXH3_copyState()` bug, when copying a state initialized with a `seed`.

It also makes XXH3_state_t movable, meaning that a trivial memcpy() results in a valid state preserving initial properties.

easyaspi314 commented 4 years ago

As for a secret generator, why not do a variant of wyrand?

We already have the mult+fold routine, so it wouldn't need much effort to implement.

/* A variant of wyrand */
xxh_u64 counter = 0;
size_t i;
for (i = 0; i < XXH_SECRET_SIZE_MIN / 8; i++)
{
    xxh_u64 seed_tmp = XXH_circularReadLE64(seedBuf, i * 8, seedBufSize);
    counter += XXH_readLE64(kSecret + i * 8);

    XXH_writeLE64(
        customSecret + 8 * i,
        XXH3_mult64to128_fold64(counter ^ seed_val, counter)
    );
}

Cyan4973 commented 4 years ago

Here is a proposed prototype for the requested Key Derivation Function :

/*
 * XXH3_generateSecret():
 * Take as input a custom seed of any length and any content,
 * generate from it a high-quality secret of length XXH3_SECRET_DEFAULT_SIZE
 * into already allocated buffer secretBuffer.
 * The generated secret can then be used with any `*_withSecret()` variant.
 * customSeed can be anything, even a "low entropy" source, such as a bunch of zeroes.
 * It can also have any size (even < XXH3_SECRET_SIZE_MIN).
 */
#define XXH3_SECRET_DEFAULT_SIZE 192
XXH3_PUBLIC_API void XXH_generateSecret(void* secretBuffer, const void* customSeed, size_t customSeedSize);

In this proposal, the generated secret always has a fixed size of XXH3_SECRET_DEFAULT_SIZE bytes. As a consequence, the function is guaranteed to always work, provided that secretBuffer is already allocated.

A more advanced proposal would allow generation of secrets of any size, making secretSize a parameter of the function. In which case though, secretSize must respect a minimum of XXH3_SECRET_SIZE_MIN, meaning that the function can fail if this condition is not respected, and must return an error code (which must then be checked by the user). The implementation side is also quite a bit more complex.

This feels like a more advanced capability, that could be implemented in a future version.

A few words on the expected properties of this function :

Any bit swap in the custom seed will change the entire secret produced, therefore all bits of the custom seed contribute to all hashes, even those which only use a portion of the secret.
The secret generation is a bit more involved than the existing seed method, and that's why it's designed to be invoked offline, using its own dedicated function. That being said, there is still considerable leverage in selecting an appropriate speed / randomization trade-off. I guess this discussion will be more clearly illustrated once there is an actual implementation to discuss about.

easyaspi314 commented 4 years ago

Well what if we generate an XXH3/XXH64 hash from the custom secret, and use that as the seed for a PRNG function?

Cyan4973 commented 4 years ago

Well what if we generate an XXH3/XXH64 hash from the custom secret, and use that as the seed for a PRNG function?

Indeed, there will be a little bit of that. An equivalent method will be used to ensure that any bit of the custom seed will contribute to the entire secret.

That being said, if it was all there is, then one could do it right now, using existing functions.

However, one of the issues mentioned earlier in this thread is that the seed derivation function is very light, meaning it's still possible to infer some knowledge from the secret even without knowing the seed. For example, let's call key1 == SeededSecret[0-7], and key3 == SeededSecret[16-23]. Then, we can infer the euclidian distance between key1 and key3, because SeededSecret[0-7] = defaultSecret[0-7] + seed and SeededSecret[16-23] = defaultSecret[16-23] + seed, therefore key3 - key1 == defaultSecret[16-23] - defaultSecret[0-7], aka a constant. It's unclear how this knowledge could be leveraged, but there is no denying that some level of information can still get inferred knowing that the algorithm used is the _withSeed() variant.

Now, it's not necessarily horrible, because xxHash is a non-cryptographic hash, so it's not expected to offer a large level of protection anyway. Nonetheless, I'm guessing that users of the _withSecret() variant are looking for a level of obfuscation which is superior to the _withSeed() variant, therefore, employing the exact same "light" derivation wouldn't do it.

So, I will propose a derivation function which is a bit more involved that the seed derivation one. How much more though will be the debatable question.

Cyan4973 commented 4 years ago

In branch kdf, there is a first version of the Key Derivation Function. This is the simplest version I could think off, I expect it to be relatively fast, while providing the qualities discussed in this thread :

Works with any customSeed, of any length, including very short ones
Produces high entropy secrets from any content, even pathological ones
All bits from the seed impact all bits of the produced secret, and therefore impact all produced hash values for any input length.

Here is how it works :

It produces a 128-bit scrambler pattern from the custom seed, using XXH128(). This ensures that all bits from the customSeed contribute.
For each 128-bit segment, combine : (defaultSecret ^ customSeed) + scrambler

Since the secret is always 192 bytes, the part of customSeed which is directly combined is only the first 192 bytes. The other bytes still contribute to the final secret through the scrambler. When the customSeed is smaller than 192 bytes, it's applied repetitively, by segments of 16 bytes (so remaining bytes are just padded with 0).

Compared to "just applying the scrambler", this is a bit better, as it avoids generating a secret which is merely defaultSecret + constant . That being said, any time customSeedSize <= 16 bytes, this is effectively what happens (well, ^ then + is slightly different, but close enough). Also the variability between segments still depends on the raw content of customSeed, so if its content is pathological (like a lot of space characters), the variability becomes closer to defaultSecret + constant.

If this situation is deemed not good enough, it's possible to consider a stronger variant, where the scrambler is updated at every segment. For example, by doing scramblerN2 = XXH128(scramblerN1). The scrambler sequence is effectively limited to 128-bit of entropy, but that should be good enough for a 64 or 128 bit hash. An alternative is to hash the whole customSeed with just a different seed for each segment, but if customSeed is very long, then it would end up hashing a long input 12 times.

easyaspi314 commented 4 years ago

Why are you using XXH128_hash_t like that? It's kinda weird.

Cyan4973 commented 4 years ago

any better solution ?

Cyan4973 commented 4 years ago

I went ahead, and updated the generator to a second version, where the 128-bit scrambler is updated every 16 bytes. This removes the possibility that separated segments of the secret might be related by a certain delta when the custom seed is pathological (too small, or full of zero). Incidentally, it also simplifies the code, because there is no need anymore to combine the scrambler with kDefaultSecret nor rawCustomSeedContent. All randomness qualities are guaranteed by XXH128() alone.

On the downside, one can think of :

It's a bit longer to generate, because XXH128() is invoked 12 times instead of 1.
- that being said, all invocations after the first one use as input the previous 16-bytes segment, so the calculation cost remains controlled, it does not inflate with larger customSeed.
- it's also partly compensated by the removal of the combination stage.
The entropy of customSeed is effectively reduced to 128-bit, whatever the original content.
- However, that sounds reasonable for a 64 or 128 bit non cryptographic hash
If, knowing that a secret has been produced with generateSecret(), an unauthorized actor can extract or guess the first 16-bytes of the secret, it can determine the rest of the secret by iterating XXH128() the same way.
- That would have been a problem in earlier versions, where the bijectivity of XXH128() for 16-bytes inputs could have been exploited to rebuild the first 16-bytes of the secret (assuming an actor has access to both the input and the full return value of the hash function), but that's no longer the case : now, such an actor can only extract a combination of secret[32-47] ^ secret[48-55], which is not trivially exploitable.
- Also, as XXH128() is bijective for 128-bit inputs, there is no space reduction after each round. So one cannot hope that the last segment of the generated secret belongs to a reduced list of candidates.

I believe that's reasonable in term of quality. The only missing capability is to generate a secret of custom size (right now, the generator only generate secrets of size XXH_SECRET_DEFAULT_SIZE, aka 192 bytes), but as mentioned previously, this could be done in a later advanced variant. The new code is actually even easier to update in order to generate custom secret sizes.

As always, this is opened to discussion.

koraa commented 4 years ago

I think reducing the entropy of the secret to 128 bits is perfectly acceptable for a 128 bits key…

Using scrambler_n+1 = XXH128(scrambler_n) basically implements a PRNG; why not use an existing PRNG for the expansion stage? xorshiro is quite fast to implement…If this is not an option, using a 64 bit counter (scrambler_n = XXH128(data=scrambler_0, seed=n)) could be easier to parallelize in the feature and might even enable the generator to utilize instruction parallelism now…

EDIT 1 Isn't the byte representation of XXH128 endianess dependent? How about mandating LE conversion for the KDF (BE/canonical would require conversion on LE systems).

EDIT 2 May I suggest adding API documentation?

Derive a secret for use with keyed XXH3 versions. Use this if you need a higher level of security than the one provided by 64bit seed.

The generated secret ALWAYS is XXH_SECRET_DEFAULT_SIZE bytes long.

The functions `XXH3_128bits_withSecret`, `XXH3_64bits_withSecret`, `XXH3_128bits_reset_withSecret` and `XXH3_64bits_reset_withSecret`
each accept a secret parameter. This parameter is very long for implementation reasons so for most applications you should just use
this function to generate the secret.

Supplying NULL as the customSeed copys the default secret into the secret buffer.
Supplying NULL as customSeed and any other value than zero for the customSeedSize is undefined behavior.

Note it feels a bit weird using the term "secret" here; maybe it would be a good idea to abandon the term "secret" for use in a non crypto hash or to clarify the security guarantees/security model the keyed variants actually use. Like what kind of attack at what effort would be acceptable for XXH3_withSecret?

EDIT 3 Have you considered making an API to access the internal secret possible? Could be void reset_withSecretCopy(const void *secret) or even an API to void* reset_unsafeSecretInit() to allow users to manually initialize the secret (for those who like to live dangerously).

Cyan4973 commented 4 years ago

why not use an existing PRNG for the expansion stage? xorshiro is quite fast to implement

This would add yet another code, with associated complexity & maintenance, for the purpose of a single use case. If I was looking for a 64-bit prng, xorshiro being a 64-bit prng would be a reasonable candidate, but even then, XXH3_64bits() would also be pretty good, as it is based on rrmxmx which has been measured by practrand as providing higher quality that xorshiro (which is already pretty good). Anyway, as mentioned, I'm rather looking for a 128-bit prng, because the secret can be used for both XXH3_64bits() and XXH3_128bits(), so 128-bit looks more appropriate for a 128-bit hash generator.

using a 64 bit counter (scrambler_n = XXH128(data=scrambler_0, seed=n)) could be easier to parallelize

Indeed, that's a very good point !

Isn't the byte representation of XXH128 endianess dependent?

Yes it is ! I should have used the canonical representation of XXH128_hash_t, which is endian independent ! To be fixed !

May I suggest adding API documentation?

Yes, these are good suggestions. I'll update the documentation accordingly.

Note it feels a bit weird using the term "secret" here; maybe it would be a good idea to abandon the term "secret" for use in a non crypto hash or to clarify the security guarantees/security model the keyed variants actually use. Like what kind of attack at what effort would be acceptable for XXH3_withSecret?

I can try, though it's not a simple objective. A "proper name" could be "entropyPool", but that's a mouthful, and I'm concerned it would feel too cryptic, misunderstood and perhaps even frighten users.

Regarding security, my next goal, after the generator, is to try to defeat XXH3, in order to identify obvious vulnerability scenarios, allowing collision generations using much less than brute-force attempts. But even that does not offer "cryptographic" qualities, which is a high bar that I don't even know how to reach. And in these days and age, there is no shortage of extremists eager to troll that anything less than absolute perfection is worth zero, making it very difficult to introduce, define and defend intermediate levels. So I'm not sure what the outcome of this effort will be.

Have you considered making an API to access the internal secret possible? Could be void reset_withSecretCopy(const void *secret) or even an API to void* reset_unsafeSecretInit() to allow users to manually initialize the secret (for those who like to live dangerously).

I may not properly understand the question. Isn't that equivalent to XXH3_64bits_reset_withSecret() ?

Thanks for the insightful comments @koraa !

koraa commented 4 years ago

I may not properly understand the question. Isn't that equivalent to XXH3_64bits_reset_withSecret() ?

Sorry, should have been clearer; I meant the internal secret so users can avoid allocating an external buffer if they use a XXH_SECRET_DEFAULT_SIZE sized secret.

Thank you for taking the time to work on this!

Cyan4973 commented 4 years ago

Sorry, should have been clearer; I meant the internal secret so users can avoid allocating an external buffer if they use a XXH_SECRET_DEFAULT_SIZE sized secret.

The default (or internal) secret is meant to be read-only, so it's not mutable, and can't be swapped. This is an important property of the design, with direct consequences on generated code, which tends to be faster on small keys as the relevant portions of the secret are directly embedded into the instructions stream. It also means that it doesn't actually use "memory", in the sense of "allocated RAM". What it does use is code space, but that part could be in ROM, or within protected memory, so it is not directly comparable.

And even that cost can be avoided if needed. The idea is to rely on compiler's Dead Code Elimination. Presuming the library is directly included into target project (as opposed to linking a dynamic library), if functions invoked are limited to XXH3_64bits_withSecret() or XXH3_128bits_withSecret(), the compiler should be clever enough to realize that XXH3_kSecret is never used, and therefore does not need to be present in the generated binary.

Going one step further, if one includes xxhash.h with XXH_INLINE_ALL directive, and then only employs *_withSecret() variants, and then provides its own custom secret as a compile-time constant blob of bytes, then the custom secret will effectively take the place of the default (internal) one, and the same set of optimizations will be applied to the hash functions using the custom secret, as if it was the internal one.

Cyan4973 commented 4 years ago

The KDF has been updated again, following recommendations from this thread (thanks @koraa) :

Fixed endianness issue, by using canonical representation.
- note : there will be a test, later on, which will ensure that the key generation function result in exactly the same outcome on both big and little endian platforms.
All segments are derived from the scrambler, using a different seed, breaking the chain of dependencies, to improve parallelism.
For each segment, the seed is derived from customSeed raw content, by extracting a 8-bytes segment from it, adding some more entropy to the secret (up to 88 bytes). More importantly, since the value of the seed is no longer implied by some static rule, it makes it impossible for an external actor to derive the value of the entire secret from the knowledge of one of its segments, since they can't know the seed value extracted from original customSeed.
The seed is also combined with segmentNb, to introduce some variability in case of a customSeed of poor quality featuring identical segments, such as a large bunch of zeroes. If the seed was only derived from customSeed, it could result in extracting the same seed value multiple times, thus producing the same segment multiple times.
Function documentation has been updated

I believe all these changes make the secret generation more robust.

The only downside I can think of is that the current code is not completely trivial to amend in order to create a potential future variant producing a secret of custom size. That being said, this is an issue for the future (if it ever happens).

koraa commented 4 years ago

The default (or internal) secret is meant to be read-only, so it's not mutable, and can't be swapped. This is an important property of the design, with direct consequences on generated code, which tends to be faster on small keys as the relevant portions of the secret are directly embedded into the instructions stream.

Internal secret – I mean the one in XXH3_state_t:

void XXH3_XXHRS_64bits_reset_withSecretCopy(XXH3_state_t* statePtr, const void* secret)  {
  XXH3_64bits_reset_internal(statePtr, 0, secret, XXH_SECRET_DEFAULT_SIZE);
  memcpy(statePtr->customSecret, secret, XXH_SECRET_DEFAULT_SIZE);
  statePtr->extSecret = NULL;
}

Cyan4973 commented 4 years ago

ah, you want to embed the secret directly into the state, as opposed to keeping it around as a reference

easyaspi314 commented 4 years ago

That…actually makes sense.

koraa commented 4 years ago

Thanks for updating the KDF again!

Is there any particular advantage to expanding the seed/secret into the entropy pool by the way? Unless I have missed something using the unseeded XXH3 variant as a compression function before applying the seed might be a better way?

XXH3_withSeed(data, seed) = ApplySeed(XXH3(data), seed)

apply seed could be another invocation of XXH3:

ApplySeed(hash, seed) = XXH3(hash || seed)

or use XOR to levrage short string support in XXH3

ApplySeed(hash, seed) = XXH3(hash ^ seed)

I imagine there are some better choices for ApplySeed not necessarily using the full XXH3 and that there would be a way to do something special for short strings so this does not cause x2 slow down from calculating XXH3 twice.

Using this function it would be trivially possible to define a hash function taking an arbitrarily long key (by just compressing the key into a 128bit hash; untested code for illustration purposes below; should not require API change).

This scheme could have a number of advantages:

The XXH3 function itself can concentrate on being fast while the security properties would be concentrated in ApplySeed which should be a lot easier to study since it just works with fixed size inputs.
Eliminates a bunch of code (no more secret pool derivation)
Reduces startup latency for seeded/Streaming variant
Reduces memory requirement for XXH3_state_t
As you said – using the default entropy pool should be faster because the values can be inlined.

XXH128_hash_t xxh3_128bits_withSeed128(const uint8_t *dat, size_t sz, XXH128_hash_t seed) {
  XXH128_hash_t dat2;

  if (XXH_likely(sz <= 16)) {
    // (Just for demonstration purposes; padding probably needs correcting)
    memcpy(&dat2, dat, sz);
  } else {
    dat2 = XXH3_128bits(dat, sz);
  }

  dat2.high64 ^= seed.high64;
  dat2.low64 ^= seed.low64;
  return XXH3_128bits(&dat2, sizeof(dat2));
}

XXH128_hash_t xxh3_128bits_withSeed(const uint8_t *dat, size_t sz, uint64_t seed) {
  XXH128_hash_t seed2 { .low64 = seed, .high64 = 0 };
  return xxh3_128bits_withSeed128(dat, sz, seed2);
}

XXH128_hash_t xxh3_128bits_withSecret(const uint8_t *dat, size_t sz, uint8_t secret, uint8_t secretSz) {
  XXH128_hash_t seed = XXH3_128bits(secret, secretSz);
  return xxh3_128bits_withSeed128(dat, sz, seed2);
}

Cyan4973 commented 4 years ago

Is there any particular advantage to expanding the seed/secret into the entropy pool by the way?

Yes, there is. If the seed is only applied at the end, it means that the processing is independent from the seed, with only the final bijective avalanche producing a different result. Therefore, it's possible to generate seed-independent collisions.

Making the seed part of the calculation is supposed to make such objective more difficult to achieve.

koraa commented 4 years ago

Pity, it would make things a lot easier. Didn't think of that; thank you!

Cyan4973 commented 4 years ago

Latest dev branch offers a KDF to generate a secret from any source of entropy, even of bad quality. It probably answers at least part of the question.

Cyan4973 / xxHash

XXH3 Keyed Variant long secrets #361

366 tries to solve the XXH3_copyState() bug, when copying a state initialized with a seed.

366 tries to solve the `XXH3_copyState()` bug, when copying a state initialized with a `seed`.