Closed koraa closed 4 years ago
Maybe we should make XXH3_initCustomSecret
public.
The minimum size for a secret is 136 bytes, but indeed, that's still too large a length for the described use case.
At the other extreme, it's possible to initialize using a seed, which must be exactly 8 bytes long.
Therefore, there is a gap between those two variants, which does not allow initialization with a 32-byte key. This is a capability XXH3 doesn't have yet, but indeed, it could be added.
When it comes to very large inputs, there's no problem : it's certainly possible to generate a "large enough" secret from a too small one, and could be added easily.
But for API completeness, it doesn't seem right to offer this option only for "long" inputs. One also has to consider "small inputs" (<= 240 bytes). And now it becomes a more complex story.
A simple generic solution would always generate the full internal secret, and work from there. This would work fine, but there is a "startup cost" for each hash, when generating the internal secret, and this fixed cost becomes non trivial as input length becomes small.
For example, presuming the input is just 5 bytes long, and the custom secret is 32 bytes long, the naive initialization method would still generate a full internal secret (likely 192 bytes long) before starting calculation of the hash on the 5-byte input, adding some non-negligible latency.
Since only a portion of the secret will be used, an alternative could be to generate only the portion of the secret that will actually be used for hash calculation. For example, for a 5-byte input, only 16 bytes are needed. However, this would add branches at the beginning, potentially harming performance more than a straightforward initialization operating on a stable length.
One possibility to speed up small-input scenarios could be to create a hybrid, using only the first 8 bytes of the custom secret as a seed when calculating "small" hashes. This way, hopefully, the outcome would be closer to the _withSeed() variant. However, there are downsides : any byte of the secret beyond the 8th one is ignored, hence doesn't influence the calculated hash. This considerably reduces the entropy of the custom secret.
Also, on a more minor note, one can notice that the custom secret must still be at least 8 bytes long (although I guess it could be possible to create yet another variant for very short secrets), and the resulting speed is probably slightly slower than the "pure" _withSeed() variant, though likely by a small margin (to be measured).
But the most important problem is the first one: the entropy of the 32-byte secret is reduced to its first 8 bytes.
Finally, another possibility is to expose a public "secret generator". This one would take as input a secret of any length, and convert it into a secret of suitable length, which can be directly consumed by the _withSecret() variant, without additional conversion (like today).
The downsides are : the generator produces a new secret buffer, which means it also introduces memory management topics. Who's in charge of allocation ? Don't forget to free() after usage. Maybe the "secret generator" could be an optional optimization step, which would help speed (specifically for batch-processed "small" inputs) by "sharing" the initialization stage, but not be required.
In the end, it may look like a small feature to add, but there are several questions to nail right to preserve API completeness and make it happen.
Thanks for your quick, comprehensive responses! Don't have much time today so I'll take the time to write a proper reply next week!
First of all, I agree this is not an easy feature; generating a long secret from a short one is easy, but making it perform well while not diminishing security properties certainly is not…
At the other extreme, it's possible to initialize using a seed, which must be exactly 8 bytes long.
What are the security properties of the seeded variant vs the keyed variant? The key derivation step seems relatively basic…alternately adding and subtracting the key from the long secret?
My particular use case btw is evaluating XXH3 as a hash function for rust hash tables; this use case has some specific properties: I can generate the sequence once and use it many times, needs at least some collision attack resistance, it will always use streaming mode (so I suspect a lot of the short input optimizations may be lost?)
Streaming does have measurable overhead for short hashes.
As for a KDF, how about doing something like XXH3_initCustomSecret, but like this:
#define XXH_PRIVATE_API
#include "xxhash.h"
#include <stdlib.h>
static xxh_u64 XXH3_circularReadLE64(const xxh_u8 *buf, size_t offset, size_t size)
{
    offset %= size;
    if (XXH_likely(offset + 8 <= size)) {
        return XXH_readLE64(&buf[offset]);
    } else {
        /* slow but does the job. */
        xxh_u64 ret = 0;
        size_t i;
        for (i = 0; i < sizeof(xxh_u64); i++) {
            /* cast before shifting: a byte promoted to int shifted by up to 56 bits is UB */
            ret |= (xxh_u64)buf[(offset + i) % size] << (8 * i);
        }
        return ret;
    }
}
void *XXH3_generateCustomSecret(const xxh_u8* seed, size_t seedLen)
{
    xxh_u8 *customSecret = XXH_alignedMalloc(XXH_SECRET_DEFAULT_SIZE, 16);
    size_t const nbRounds = XXH_SECRET_DEFAULT_SIZE / 16;
    size_t i, j;
    if (customSecret == NULL) return NULL;
    XXH_memcpy(customSecret, kSecret, XXH_SECRET_DEFAULT_SIZE);
    if (seed == NULL || seedLen == 0) {
        return customSecret;
    }
    /* process the seed in windows of up to XXH_SECRET_DEFAULT_SIZE bytes;
     * rounding the count up avoids a zero-length window (and a division by
     * zero inside XXH3_circularReadLE64) when seedLen is an exact multiple */
    for (j = 0; j < (seedLen + XXH_SECRET_DEFAULT_SIZE - 1) / XXH_SECRET_DEFAULT_SIZE; j++) {
        xxh_u8 const *window = seed + (XXH_SECRET_DEFAULT_SIZE * j);
        size_t const windowLen = (seedLen - XXH_SECRET_DEFAULT_SIZE * j < XXH_SECRET_DEFAULT_SIZE)
                               ? seedLen - XXH_SECRET_DEFAULT_SIZE * j
                               : XXH_SECRET_DEFAULT_SIZE;
        for (i = 0; i < nbRounds; i++) {
            xxh_u64 lo = XXH_readLE64(customSecret + 16*i)
                       + XXH3_circularReadLE64(window, 16*i + (seed[(2*i)%8]), windowLen);
            xxh_u64 hi = XXH_readLE64(customSecret + 16*i + 8)
                       - XXH3_circularReadLE64(window, 16*i + 8 + (seed[(2*i+1)%8]), windowLen);
            XXH_writeLE64(customSecret + 16*i, lo);
            XXH_writeLE64(customSecret + 16*i + 8, hi);
        }
    }
    return customSecret;
}
void XXH3_freeCustomSecret(void *secret)
{
XXH_alignedFree(secret);
}
//.....
#include <stdio.h>
static void print_buf(const xxh_u8 *ptr, size_t len)
{
    for (size_t i = 0; i < len / 16; i++) {
        for (size_t j = 0; j < 16; j++) {
            printf("%02x, ", *ptr++);
        }
        printf("\n");
    }
    if (len % 16) {
        for (size_t j = 0; j < len % 16; j++) {
            printf("%02x, ", *ptr++);
        }
        printf("\n");
    }
}
#ifndef TMP_BUF_SIZE
#define TMP_BUF_SIZE 32
#endif
int main(void)
{
    xxh_u8 tmp[TMP_BUF_SIZE] = { 0 };
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL || fread(tmp, sizeof(tmp), 1, f) != 1) {
        perror("reading /dev/urandom");
        return 1;
    }
    fclose(f);
    printf("in:\n");
    print_buf(tmp, sizeof(tmp));
    printf("out:\n");
    xxh_u8 *ptr = XXH3_generateCustomSecret(tmp, sizeof(tmp));
    if (ptr == NULL) return 1;
    print_buf(ptr, XXH_SECRET_DEFAULT_SIZE);
    XXH3_freeCustomSecret(ptr);
    return 0;
}
Just a rough 15 minute idea, but we could do something better.
My particular use case btw is evaluating XXH3 as a hash function for rust hash tables
I presume it means that a substantial portion of inputs will be "small" (as in < 240 bytes).
needs at least some collision attack resistance,
Ok, so collision resistance is an important property.
What are the security properties of the seeded variant vs the keyed variant? The key derivation step seems relatively basic
Indeed, the seeded variant is designed to be fast, first and foremost. So, while it alters the internal secret in a predictable way, it does so using a simple sequence.
If you're looking for an obfuscation strategy which makes it more difficult to rewind to the original "short secret", we could consider a strong strategy, with little regard for its speed. In which case, there is no shortage of techniques that could be used to digest the "short secret" and transform it into a suitably longer one.
The only issue is that such a preparation stage now has to be offline, otherwise it would cost too much to perform for each hash.
As a consequence, it also calls into question if there is still a need to accept "short secrets" in the regular interface (like XXH3_64bits_withSecret()), since such a scenario would require a much shorter initialization delay, and therefore cannot support the same initialization strength.
If both entry points are allowed, a hash based on a small secret would produce a different result than a hash based on the long secret generated from said short secret by the generator (still follow me?:p). I presume this could be well documented, and the distinction between the 2 scenarios could be made quite clear, but a risk of confusion will likely remain.
Alternatively, if the regular interface doesn't accept short secrets, and if the generator only accepts short secrets (the ones that the regular interface refuses, hence with len < XXH3_SECRET_SIZE_MIN bytes), then results can't be directly compared, which reduces risks of confusion.
That being said, there are additional advantages to a clearly separated generator. Such a generator would not be restricted by the kind of input it can accept, it would always generate a fixed 192 bytes secret from any input, including longer ones. This may seem overkill, but I can see a use case for that. Presuming, for example, that someone wants to use a text entry as its custom secret, or any other low entropy source available around, it should be pretty clear that its distribution of 1 and 0 bits is not random enough, and it will impact the quality of the produced hash. In contrast, the secret generator would ensure that the generated secret is always random enough, whatever the input (even silly ones like a bunch of zeroes), essentially guaranteeing the quality of generated hash. The larger purpose therefore is to generate a "high entropy" secret from any source, including short ones.
I've finally been able to take a closer look at the keyed version of XXH3; I am not yet done forming an opinion, but here are some observations.
struct XXH3_state_s {
    XXH_ALIGN_MEMBER(64, XXH64_hash_t acc[8]);
    /* used to store a custom secret generated from the seed. Makes state larger.
     * Design might change */
    XXH_ALIGN_MEMBER(64, unsigned char customSecret[XXH3_SECRET_DEFAULT_SIZE]);
    XXH_ALIGN_MEMBER(64, unsigned char buffer[XXH3_INTERNALBUFFER_SIZE]);
    XXH32_hash_t bufferedSize;
    XXH32_hash_t nbStripesPerBlock;
    XXH32_hash_t nbStripesSoFar;
    XXH32_hash_t secretLimit;
    XXH32_hash_t reserved32;
    XXH32_hash_t reserved32_2;
    XXH64_hash_t totalLen;
    XXH64_hash_t seed;
    XXH64_hash_t reserved64;
    /* note: there is some padding after due to alignment on 64 bytes */
    const unsigned char* secret;
};  /* typedef'd to XXH3_state_t */
As far as I understood this, customSecret contains the secret for the seeded variant, while secret points to the secret used (to the default one for the default variant and to the external one for the external variant). This seems kind of odd to me; it makes XXH3_state_s non-movable for the seeded variant (but not for the other two variants) and means the secret could be modified while XXH3_state_s exists…
Non-movability is a bit of an issue for C++ and Rust integration. I think at least that could be solved by setting secret = NULL when the secret is in customSecret.
Streaming does have measurable overhead for short hashes.
OK, but there is a fixed setup phase for both 64 and 128 bit variants with seeded+streaming? You always need to copy the default secret and derive the seed, even for very short hashes.
Then I started playing around with different secrets and look at this:
// c++ src/test.cc --std=c++17 -Wall -Wextra -Wpedantic -o src/test -Ivendor/xxhash && src/test
#define XXH_STATIC_LINKING_ONLY 1
#define XXH_INLINE_ALL 1
#include <cassert>
#include <cstdint>
#include <utility>
#include <string>
#include <vector>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <filesystem>
#include <xxhash.h>
namespace fs = std::filesystem;
void pt(uint8_t val) {
std::cout << std::hex << std::setfill('0') << std::setw(2) << ((int)val);
}
void pt(uint32_t val) {
std::cout << std::hex << std::setfill('0') << std::setw(8) << val;
}
void pt(uint64_t val) {
std::cout << std::hex << std::setfill('0') << std::setw(16) << val;
}
void pt(const XXH128_hash_t &val) {
std::cout
<< std::hex << std::setfill('0')
<< std::setw(16) << val.high64
<< std::setw(16) << val.low64;
}
int main() {
    uint8_t data[] = { 0x00, 0x00 };
    uint8_t key[192];
    std::fill(std::begin(key), std::end(key), 0);
    for (size_t idx = 0; idx < std::size(key); idx++) {
        for (size_t bit = 0; bit < 8; bit++) {
            uint8_t &v = key[std::size(key) - idx - 1];
            v = (v << 1) | 1;
            pt((uint8_t)idx);
            std::cout << ":" << bit << " ";
            pt(XXH3_128bits_withSecret(std::data(data), std::size(data), std::data(key), std::size(key)));
            std::cout << "\n";
        }
    }
    return 0;
}
Not going to post the output, but it looks like, for short inputs, XXH3_64/128 ignore anything but the first 64/128 bits of the key; for a key this is not good, the expectation for a key IMO is that all bits of the key are integrated in the hash; I recommend also taking a look at some of the hashes; e.g. xxh3_64 yields 6ddd9d6c6cdec134 for key=zero192, data=zero2. It might just be me, but this looks anything but random. As you said, XXH3 relies on the secret entropy a lot…
This is perfectly fine, as long as the key (maybe call it something different, how about "entropy") was generated from a decent source of randomness…be it the system CSPRNG or a PRNG…
Have you considered using some sort of online key schedule? I could imagine generating the long seed on the fly using a vectorized xorshiro PRNG could be quite fast and yield a lot better results? It could also be faster for the seeded/streaming variant (no setup) and might even be faster for other variants because no cache access would be needed.
I presume it means that a substantial portion of inputs will be "small" (as in < 240 bytes).
Probably, yes.
Not going to post the output, but it looks like, for short inputs, XXH3_64/128 ignore anything but the first 64/128 bits of the key
If every byte of the secret affected XXH3's output, performance would plummet.
We actually choose different parts of the secret on short keys (and mix two samples) to make up for this.
Before, short keys used exclusively the first few bytes.
If every byte of the secret affected XXH3's output, performance would plummet.
We actually choose different parts of the secret on short keys (and mix two samples) to make up for this.
Before, short keys used exclusively the first few bytes.
Totally makes sense, I am just pointing out that this is one of the expectations that comes with having a keyed hash function; it makes total sense from a performance perspective to do it this way; I do think this introduces the need to be clearer in the API design & documentation though…
Also, your "modern C++" is not modern enough! Unacceptable.
Can't argue with that. sadkitten.gif
EDIT I think XXH3_copyState should also update the internal pointer; otherwise it can lead to a use after free…
I do think this introduces the need to be clearer in the API design & documentation though…
Hmm, maybe we should just replace the comments with /* chopped liver */. Clearly, nobody reads them, and it will save space.
Yann, should I open a PR? :smirk:
https://github.com/Cyan4973/xxHash/blob/d8bed9a54d6bcc6f7165b61647d4b066e1ae17eb/xxh3.h#L671-L686
EDIT I think XXH3_copyState should also update the internal pointer; otherwise it can lead to a use after free…
I mean I thought we made it pretty clear... https://github.com/Cyan4973/xxHash/blob/d8bed9a54d6bcc6f7165b61647d4b066e1ae17eb/xxhash.h#L600-L605
https://github.com/Cyan4973/xxHash/blob/d8bed9a54d6bcc6f7165b61647d4b066e1ae17eb/xxhash.h#L600-L605
This affects the seeded variant; not the variant with a secret.
// pseudocode
a = XXH3_createState();
XXH3_64bits_reset_withSeed(a, 0xabcd);
b = XXH3_createState();
XXH3_copyState(b, a);
XXH3_freeState(a);
XXH3_update(b, ...);
Unless I have missed something, this will cause a use after free, because secret in b points at customSecret in a (violating the requirement you quoted above).
We don't malloc or free the secret pointer, and you can see that by how our pointer is a const unsigned char *.
void free(void *ptr);
https://github.com/Cyan4973/xxHash/blob/d8bed9a54d6bcc6f7165b61647d4b066e1ae17eb/xxhash.h#L572
This simplifies things and avoids unnecessary comments:
XXH_errorcode XXH3_64bits_reset(XXH3_state_t *state)
{
...
state->secret = kSecret;
...
}
XXH_errorcode XXH3_64bits_reset_withSeed(XXH3_state_t *state, XXH64_hash_t seed)
{
...
// XXH3_state_t::customSecret is a fixed size array
XXH3_initCustomSecret(state->customSecret, seed);
state->secret = state->customSecret;
...
}
XXH_errorcode XXH3_64bits_reset_withSecret(XXH3_state_t *state, const void *customSecret, size_t secretSize)
{
...
state->secret = customSecret;
...
}
It also allows the state to be 100% stack-based and to be used the exact same way as XXH32_state_t and XXH64_state_t.
for short inputs, XXH3_64/128 ignore anything but the first 64/128 bits of the key
That's correct. For short inputs, only portions of the secret are used.
the expectation for a key IMO is that all bits of the key are integrated in the hash
This situation can be solved with a secret generator. This one can make sure that any bit of the custom seed will contribute to all results, for any input of any length, by drastically altering the generated secret.
Note that, during the hash calculation, only a portion of the secret will be used; it's just that all portions of the secret are sensitive to any bit in the custom seed.
XXH3 relies on the secret entropy a lot
It totally relies on it. The secret must have a high entropy.
This situation can also be solved via the secret generator, as any custom seed, even ridiculously trivial (like a bunch of zeroes), will be turned into a high entropy secret.
it makes XXH3_state_s non-movable for the seeded variant
That's a good point.
This design could be changed, using your suggested convention that secret == NULL means "use customSecret instead", but it will add a branch at the beginning of each invocation, in order to select the right pointer. In XXH3, we try to avoid branches, as they impact performance, especially on short inputs. That being said, the streaming variant already adds quite some overhead, so an additional pointer comparison branch may not be that noticeable.
I could imagine generating the long seed on the fly using a vectorized xorshiro PRNG could be quite fast and yield a lot better results? It could also be faster for the seeded/streaming variant (no setup) and might even be faster for other variants because no cache access would be needed.
That's more or less the intended design for the secret generator, although the details can vary, and what I had in mind is not specifically the "xorshiro PRNG". Also, my expectation is that the secret generator, while not being especially "slow", will not be fast enough to consider its delay negligible, especially in combination with small inputs. So a "faster" performance wasn't on the list.
It could be that the implementation details matter, and the "generator" I had in mind could be meaningfully improved. Maybe that's a discussion we could have once a first design is accessible and can be commented.
I think XXH3_copyState should also update the internal pointer; otherwise it can lead to a use after free…
Let's have a look, it could be a bug.
OH, now I see the use after free issue. Yeah, major bug.
Maybe we should add a set of flags in one of the reserved fields.
That would also allow us to prevent mixing XXH3_64B and XXH3_128B, but that would require a behavior change (although we could rule the existing behavior as UB).
That's more or less the intended design for the secret generator, although the details can vary, and what I had in mind is not specifically the "xorshiro PRNG". Also, my expectation is that the secret generator, while not being especially "slow", will not be fast enough to consider its delay negligible, especially in combination with small inputs. So a "faster" performance wasn't on the list.
Great! Thank you, that's awesome!
This fixes the XXH3_copyState() bug, when copying a state initialized with a seed. It also makes XXH3_state_t movable, meaning that a trivial memcpy() results in a valid state preserving initial properties.
As for a secret generator, why not do a variant of wyrand?
We already have the mult+fold routine, so it wouldn't need much effort to implement.
/* A variant of wyrand */
xxh_u64 counter = 0;
size_t i;
for (i = 0; i < XXH_SECRET_SIZE_MIN / 8; i++)
{
    xxh_u64 seed_tmp = XXH3_circularReadLE64(seedBuf, i * 8, seedBufSize);
    counter += XXH_readLE64(kSecret + i * 8);
    XXH_writeLE64(
        customSecret + 8 * i,
        XXH3_mult64to128_fold64(counter ^ seed_tmp, counter)
    );
}
Here is a proposed prototype for the requested Key Derivation Function :
/*
* XXH3_generateSecret():
* Take as input a custom seed of any length and any content,
* generate from it a high-quality secret of length XXH3_SECRET_DEFAULT_SIZE
* into already allocated buffer secretBuffer.
* The generated secret can then be used with any `*_withSecret()` variant.
* customSeed can be anything, even a "low entropy" source, such as a bunch of zeroes.
* It can also have any size (even < XXH3_SECRET_SIZE_MIN).
*/
#define XXH3_SECRET_DEFAULT_SIZE 192
XXH_PUBLIC_API void XXH3_generateSecret(void* secretBuffer, const void* customSeed, size_t customSeedSize);
In this proposal, the generated secret always has a fixed size of XXH3_SECRET_DEFAULT_SIZE bytes. As a consequence, the function is guaranteed to always work, provided that secretBuffer is already allocated.
A more advanced proposal would allow generation of secrets of any size, making secretSize a parameter of the function. In that case though, secretSize must respect a minimum of XXH3_SECRET_SIZE_MIN, meaning that the function can fail if this condition is not respected, and must return an error code (which must then be checked by the user). The implementation side is also quite a bit more complex.
This feels like a more advanced capability, that could be implemented in a future version.
A few words on the expected properties of this function : it is meant to be more involved than the seed derivation method, and that's why it's designed to be invoked offline, using its own dedicated function. That being said, there is still considerable leverage in selecting an appropriate speed / randomization trade-off. I guess this discussion will be more clearly illustrated once there is an actual implementation to discuss.
Well what if we generate an XXH3/XXH64 hash from the custom secret, and use that as the seed for a PRNG function?
Indeed, there will be a little bit of that. An equivalent method will be used to ensure that any bit of the custom seed will contribute to the entire secret.
That being said, if it was all there is, then one could do it right now, using existing functions.
However, one of the issues mentioned earlier in this thread is that the seed derivation function is very light, meaning it's still possible to infer some knowledge from the secret even without knowing the seed.
For example, let's call key1 == SeededSecret[0-7], and key3 == SeededSecret[16-23].
Then, we can infer the euclidean distance between key1 and key3, because SeededSecret[0-7] = defaultSecret[0-7] + seed and SeededSecret[16-23] = defaultSecret[16-23] + seed, therefore key3 - key1 == defaultSecret[16-23] - defaultSecret[0-7], aka a constant.
It's unclear how this knowledge could be leveraged, but there is no denying that some level of information can still get inferred knowing that the algorithm used is the _withSeed() variant.
Now, it's not necessarily horrible, because xxHash is a non-cryptographic hash, so it's not expected to offer a large level of protection anyway.
Nonetheless, I'm guessing that users of the _withSecret() variant are looking for a level of obfuscation which is superior to the _withSeed() variant; therefore, employing the exact same "light" derivation wouldn't do it.
So, I will propose a derivation function which is a bit more involved than the seed derivation one. How much more though will be the debatable question.
In branch kdf, there is a first version of the Key Derivation Function. This is the simplest version I could think of, and I expect it to be relatively fast, while providing the qualities discussed in this thread : it accepts a customSeed of any length, including very short ones, and all bits of the seed impact all bits of the produced secret, and therefore impact all produced hash values, for any input length.
Here is how it works :
First, a scrambler pattern is generated from the custom seed, using XXH128(). This ensures that all bits from the customSeed contribute. Then, the secret is generated as (defaultSecret ^ customSeed) + scrambler.
Since the secret is always 192 bytes, the part of customSeed which is directly combined is only the first 192 bytes. The other bytes still contribute to the final secret through the scrambler.
When the customSeed is smaller than 192 bytes, it's applied repetitively, by segments of 16 bytes (so remaining bytes are just padded with 0).
Compared to "just applying the scrambler", this is a bit better, as it avoids generating a secret which is merely defaultSecret + constant.
That being said, any time customSeedSize <= 16 bytes, this is effectively what happens (well, ^ then + is slightly different, but close enough).
Also, the variability between segments still depends on the raw content of customSeed, so if its content is pathological (like a lot of space characters), the variability becomes closer to defaultSecret + constant.
If this situation is deemed not good enough, it's possible to consider a stronger variant, where the scrambler is updated at every segment. For example, by doing scramblerN2 = XXH128(scramblerN1). The scrambler sequence is effectively limited to 128 bits of entropy, but that should be good enough for a 64 or 128 bit hash.
An alternative is to hash the whole customSeed with just a different seed for each segment, but if customSeed is very long, then it would end up hashing a long input 12 times.
Why are you using XXH128_hash_t like that? It's kinda weird.
any better solution ?
I went ahead, and updated the generator to a second version, where the 128-bit scrambler is updated every 16 bytes. This removes the possibility that separated segments of the secret might be related by a certain delta when the custom seed is pathological (too small, or full of zero).
Incidentally, it also simplifies the code, because there is no need anymore to combine the scrambler with kDefaultSecret nor rawCustomSeedContent. All randomness qualities are guaranteed by XXH128() alone.
On the downside, one can think of : XXH128() is invoked 12 times instead of 1, and the entropy of customSeed is effectively reduced to 128-bit, whatever the original content. Also, if, after generateSecret(), an unauthorized actor can extract or guess the first 16 bytes of the secret, it can determine the rest of the secret by iterating XXH128() the same way.
On the other hand, a weakness of XXH128() for 16-byte inputs could have been exploited to rebuild the first 16 bytes of the secret (assuming an actor has access to both the input and the full return value of the hash function), but that's no longer the case : now, such an actor can only extract a combination of secret[32-47] ^ secret[48-55], which is not trivially exploitable. And since XXH128() is bijective for 128-bit inputs, there is no space reduction after each round, so one cannot hope that the last segment of the generated secret belongs to a reduced list of candidates.
I believe that's reasonable in terms of quality. The only missing capability is to generate a secret of custom size (right now, the generator only generates secrets of size XXH_SECRET_DEFAULT_SIZE, aka 192 bytes), but as mentioned previously, this could be done in a later advanced variant.
The new code is actually even easier to update in order to generate custom secret sizes.
As always, this is opened to discussion.
I think reducing the entropy of the secret to 128 bits is perfectly acceptable for a 128-bit key…
Using scrambler_n+1 = XXH128(scrambler_n) basically implements a PRNG; why not use an existing PRNG for the expansion stage? xorshiro is quite fast to implement… If this is not an option, using a 64-bit counter (scrambler_n = XXH128(data=scrambler_0, seed=n)) could be easier to parallelize in the future and might even enable the generator to utilize instruction-level parallelism now…
EDIT 1 Isn't the byte representation of XXH128 endianness-dependent? How about mandating LE conversion for the KDF (BE/canonical would require conversion on LE systems)?
EDIT 2 May I suggest adding API documentation?
Derive a secret for use with keyed XXH3 versions. Use this if you need a higher level of security than the one provided by 64bit seed.
The generated secret ALWAYS is XXH_SECRET_DEFAULT_SIZE bytes long.
The functions `XXH3_128bits_withSecret`, `XXH3_64bits_withSecret`, `XXH3_128bits_reset_withSecret` and `XXH3_64bits_reset_withSecret`
each accept a secret parameter. This parameter is very long for implementation reasons so for most applications you should just use
this function to generate the secret.
Supplying NULL as the customSeed copies the default secret into the secret buffer.
Supplying NULL as customSeed and any other value than zero for the customSeedSize is undefined behavior.
Note it feels a bit weird using the term "secret" here; maybe it would be a good idea to abandon the term "secret" for use in a non crypto hash or to clarify the security guarantees/security model the keyed variants actually use. Like what kind of attack at what effort would be acceptable for XXH3_withSecret?
EDIT 3 Have you considered making an API to access the internal secret possible? It could be void reset_withSecretCopy(const void *secret), or even an API void* reset_unsafeSecretInit() to allow users to manually initialize the secret (for those who like to live dangerously).
why not use an existing PRNG for the expansion stage? xorshiro is quite fast to implement
This would add yet more code, with associated complexity & maintenance, for the purpose of a single use case. If I was looking for a 64-bit PRNG, xorshiro being a 64-bit PRNG would be a reasonable candidate, but even then, XXH3_64bits() would also be pretty good, as it is based on rrmxmx, which has been measured by practrand as providing higher quality than xorshiro (which is already pretty good).
Anyway, as mentioned, I'm rather looking for a 128-bit PRNG, because the secret can be used for both XXH3_64bits() and XXH3_128bits(), so 128-bit looks more appropriate for a 128-bit hash generator.
using a 64 bit counter (scrambler_n = XXH128(data=scrambler_0, seed=n)) could be easier to parallelize
Indeed, that's a very good point !
Isn't the byte representation of XXH128 endianess dependent?
Yes it is ! I should have used the canonical representation of XXH128_hash_t, which is endian-independent ! To be fixed !
May I suggest adding API documentation?
Yes, these are good suggestions. I'll update the documentation accordingly.
Note it feels a bit weird using the term "secret" here; maybe it would be a good idea to abandon the term "secret" for use in a non crypto hash or to clarify the security guarantees/security model the keyed variants actually use. Like what kind of attack at what effort would be acceptable for XXH3_withSecret?
I can try, though it's not a simple objective. A "proper name" could be "entropyPool", but that's a mouthful, and I'm concerned it would feel too cryptic, be misunderstood, and perhaps even frighten users.
Regarding security, my next goal, after the generator, is to try to defeat XXH3, in order to identify obvious vulnerability scenarios, allowing collision generation using much less than brute-force attempts.
But even that does not offer "cryptographic" qualities, which is a high bar that I don't even know how to reach. And in this day and age, there is no shortage of extremists eager to troll that anything less than absolute perfection is worth zero, making it very difficult to introduce, define and defend intermediate levels.
So I'm not sure what the outcome of this effort will be.
Have you considered making an API to access the internal secret? Could be `void reset_withSecretCopy(const void *secret)` or even an API `void* reset_unsafeSecretInit()` to allow users to manually initialize the secret (for those who like to live dangerously).
I may not properly understand the question. Isn't that equivalent to `XXH3_64bits_reset_withSecret()` ?
Thanks for the insightful comments @koraa !
> I may not properly understand the question. Isn't that equivalent to `XXH3_64bits_reset_withSecret()` ?
Sorry, should have been clearer; I meant the internal secret so users can avoid allocating an external buffer if they use a XXH_SECRET_DEFAULT_SIZE sized secret.
Thank you for taking the time to work on this!
> Sorry, should have been clearer; I meant the internal secret so users can avoid allocating an external buffer if they use a XXH_SECRET_DEFAULT_SIZE sized secret.
The default (or internal) secret is meant to be read-only, so it's not mutable, and can't be swapped. This is an important property of the design, with direct consequences on generated code, which tends to be faster on small keys as the relevant portions of the `secret` are directly embedded into the instruction stream.
It also means that it doesn't actually use "memory", in the sense of "allocated RAM". What it does use is code space, but that part could be in ROM, or within protected memory, so it is not directly comparable.
And even that cost can be avoided if needed. The idea is to rely on the compiler's Dead Code Elimination. Presuming the library is directly included in the target project (as opposed to linked as a dynamic library), if the functions invoked are limited to `XXH3_64bits_withSecret()` or `XXH3_128bits_withSecret()`, the compiler should be clever enough to realize that `XXH3_kSecret` is never used, and therefore does not need to be present in the generated binary.
Going one step further, if one includes `xxhash.h` with the `XXH_INLINE_ALL` directive, only employs the `*_withSecret()` variants, and provides a custom secret as a compile-time constant blob of bytes, then the custom secret will effectively take the place of the default (internal) one, and the same set of optimizations will be applied to the hash functions using the custom secret, as if it were the internal one.
The KDF has been updated again, following recommendations from this thread (thanks @koraa) :
- Each segment is now generated from the `seed`, breaking the chain of dependencies, to improve parallelism.
- The `seed` is derived from the raw `customSeed` content, by extracting an 8-byte segment from it, adding some more entropy to the `secret` (up to 88 bytes). More importantly, since the value of the `seed` is no longer implied by some static rule, it makes it impossible for an external actor to derive the value of the entire `secret` from the knowledge of one of its segments, since they can't know the `seed` value extracted from the original `customSeed`.
- The `seed` is also combined with `segmentNb`, to introduce some variability in case of a `customSeed` of poor quality featuring identical segments, such as a large bunch of zeroes. If the `seed` was only derived from `customSeed`, it could result in extracting the same `seed` value multiple times, thus producing the same segment multiple times.

I believe all these changes make the `secret` generation more robust.
The only downside I can think of is that the current code is not completely trivial to amend in order to create a potential future variant producing a `secret` of custom size.
That being said, this is an issue for the future (if it ever happens).
> The default (or internal) secret is meant to be read-only, so it's not mutable, and can't be swapped. This is an important property of the design, with direct consequences on generated code, which tends to be faster on small keys as the relevant portions of the `secret` are directly embedded into the instruction stream.
Internal secret – I mean the one in `XXH3_state_t`:

```c
void XXH3_XXHRS_64bits_reset_withSecretCopy(XXH3_state_t* statePtr, const void* secret) {
    XXH3_64bits_reset_internal(statePtr, 0, secret, XXH_SECRET_DEFAULT_SIZE);
    memcpy(statePtr->customSecret, secret, XXH_SECRET_DEFAULT_SIZE);
    statePtr->extSecret = NULL;
}
```
Ah, you want to embed the `secret` directly into the state, as opposed to keeping it around as a reference. That… actually makes sense.
Thanks for updating the KDF again!
Is there any particular advantage to expanding the seed/secret into the entropy pool, by the way? Unless I have missed something, using the unseeded XXH3 variant as a compression function before applying the seed might be a better way:

`XXH3_withSeed(data, seed) = ApplySeed(XXH3(data), seed)`

ApplySeed could be another invocation of XXH3:

`ApplySeed(hash, seed) = XXH3(hash || seed)`

or use XOR to leverage short-string support in XXH3:

`ApplySeed(hash, seed) = XXH3(hash ^ seed)`

I imagine there are better choices for ApplySeed, not necessarily using the full XXH3, and that there would be a way to do something special for short strings so this does not cause a 2x slowdown from calculating XXH3 twice.
Using this function it would be trivially possible to define a hash function taking an arbitrarily long key (by just compressing the key into a 128-bit hash; untested code for illustration purposes below; should not require an API change).
This scheme could have a number of advantages:
- `XXH3_state_t`
```c
XXH128_hash_t xxh3_128bits_withSeed128(const uint8_t *dat, size_t sz, XXH128_hash_t seed) {
    XXH128_hash_t dat2;
    if (XXH_likely(sz <= 16)) {
        // (Just for demonstration purposes; zero-pad short inputs before copying)
        memset(&dat2, 0, sizeof(dat2));
        memcpy(&dat2, dat, sz);
    } else {
        dat2 = XXH3_128bits(dat, sz);
    }
    dat2.high64 ^= seed.high64;
    dat2.low64  ^= seed.low64;
    return XXH3_128bits(&dat2, sizeof(dat2));
}

XXH128_hash_t xxh3_128bits_withSeed(const uint8_t *dat, size_t sz, uint64_t seed) {
    XXH128_hash_t seed2 = { .low64 = seed, .high64 = 0 };
    return xxh3_128bits_withSeed128(dat, sz, seed2);
}

XXH128_hash_t xxh3_128bits_withSecret(const uint8_t *dat, size_t sz, const uint8_t *secret, size_t secretSz) {
    XXH128_hash_t seed = XXH3_128bits(secret, secretSz);
    return xxh3_128bits_withSeed128(dat, sz, seed);
}
```
> Is there any particular advantage to expanding the seed/secret into the entropy pool by the way?
Yes, there is. If the `seed` is only applied at the end, it means that the processing is independent from the `seed`, with only the final bijective avalanche producing a different result. Therefore, it's possible to generate `seed`-independent collisions.
Making the `seed` part of the calculation is supposed to make such an objective more difficult to achieve.
Pity, it would make things a lot easier. Didn't think of that; thank you!
The latest `dev` branch offers a KDF to generate a `secret` from any source of entropy, even of bad quality. It probably answers at least part of the question.
Hi,
thanks again for creating XXH3! Very impressive!
I just finally realized that the keyed variant of XXH3 requires a 192-byte secret (I always assumed it would be 192 bits). Should we make this point extra clear in the doc, since 192 bits is a common key size? I'll be happy to open a PR…
192 bytes also seems a bit excessive for many use cases; in terms of entropy, a much shorter key length should suffice for a non-cryptographic hash function… For my use case I am probably going to use ChaCha20 to expand a 32-byte key, or maybe SHAKE to support other key lengths, but maybe you have a better idea?
Could a specialized KDF be added to xxhash to support shorter key lengths natively?