Closed dhardy closed 6 years ago
I realised SeedableRng::Seed
isn't so simple... on BE using a [u64; N] array requires conversion from byte sequences and to [u32; N] arrays. So options:
u64
arrays, require byte-swapping when reading byte-seq keys and when converting to u32
arrays (e.g. for ChaCha)u32
arrays where preferred by the RNG, byte swap on reading and hash-function finalisationu8
arrays, byte-swap on hash-function finalisation, on writing large-integer seeds, and on reading in RNGsAfaik all CSPRNGs seed with [u8; N]
, presumably all employing algorithms entered into crypto competitions anyways. Also, CSPRNGs should never be seeded with less. If you want to seed a PRNG with less, then either you should use a faster non-cryptographic PRNG, or else you'll do key crazy slow stretching algorithms, like Argon2.
Are there actually situations where performance suggest feeding short seed to a non-cryptographic PRNG? Appears no: If you create few PRNGs then obviously you do not care about the seed length. If you create enough PRNGs that consuming CSPRNG output to seed them all takes too long then consuming even small amounts still takes ages, so you should spawn another PRNG with the desired speed and quality to seed your PRNGs. We support this use case by exposing the ReseedableRng
machinery used by thred_rng()
. I therefore think this whole from_hashable
business sounds misguided.
@burdges are you also saying that from_rng
should only allow seeding from a CSPRNG (CryptoRng
)? I'm tentatively for that change but it does mean thread_rng
needs to be a CSPRNG, though if we go with #53 I guess that's ok.
Slightly confused because I don't talk about key stretching here. Unless you mean you don't like my modifications to SeaHash? I'm reasonably convinced they're fit for purpose; alternatively we need to switch to another hash function with 128 bit or more output.
No. It's true user-space CSPRNGs would usually be seeded with from_rng
, but they might be seeded from another source, like secret from disk or a key exchange.
I'm saying: Afaik, non-cryptographic PRNG should usually be seeded with from_rng
, not a single u64
, because Rng
s are designed to produce the larger quantity of output needed to fill their seed/state quickly.
I'm still confused about what you mean.
On the first account, because seeding a CSPRNG from a known secret from disk or key exchange can use from_seed
, so doesn't need to use from_rng
. What I was saying is that from_rng
could require that the source RNG is a CSPRNG.
On the second account, the point of from_hashable
is that when a user wants to seed a non-crypto PRNG with a fixed seed, we want to make it easy for them to convert whatever seed the user may enter ("my seed", 1, "obayuany-ofu", or whatever other nonsense the user enters) into a "nice" seed with even bit distribution. So from_hashable
will never be used when the source is another RNG.
I got a fair bit of work done tidying this up. Not convinced whether we should use SeaHash but it's mostly okay.
I also ported Google's HighwayHash to Rust. It's much more complex than SeaHash and has some oddities. Still need to test against upstream output and optimise a lot; not sure at the moment whether I wish to do this.
I'm saying this: PRNGs tend towards large internal states. Hash functions tend to be optimized for small outputs. PRNGs are optimized for larger outputs. You should expect seeding using a hash function to be slower than seeding using a PRNG. If from_hashable
does not exist for either performance or quality then should we conclude that it's not meant for production code?
Ah. How then would you propose we condense arbitrary-length user input, potentially with a lot of bias (e.g. text), into a reasonably short key with low bias and good bit propagation (avalanche)? Hash functions seem like a good solution to me, just so long as we pick one capable of at least 128 (preferably 256) bits of output.
We could do something like split the user input into state-sized blocks and XOR together, calling a PRNG state-transition function between each XOR. But that's just inventing our own (non-crypto) hash function.
I found a way to move from_hashable
out of rand-core
, which I think is preferable for most people. It does unfortunately mean users have to use rand::FromHashable;
now, but I guess we can live with that.
Updated: rebased and squashed. I still have the old history locally, but don't see much value.
High time to get this merged :smile:. Moving from_hashable
out of rand-core
seems like a good idea!
How confident are you in the custom finalizers? And is there anything specific I should look at?
I would like it if the RNG tests would only depend on things in rand-core
. Especially if they are to be split out into another crate. Does it make sense to add a test_stdng_construction
test in rand
instead?
Actually, I think I will replace SeaHash with MetroHash; it's better known and reviewed and has similar performance, as well as native 128-bit output and 256-bits of state (IIRC), and a permutation function which should make it relatively easy to output any amount of state. But I didn't get that done yet.
If we go with 128-bit hashes, there is a lot more choice. MetroHash, CityHash, FarmHash, MurmurHash, SpookyHash, maybe more. I think that it is good to look at the performance, but that simplicity is more important. Maybe I am wrong, but it seems to me MetroHash, CityHash and FarmHash (and maybe xxHash) are part of a series of hashes that come and go. It would be nice if we could pick something that does not seem badly outdated over ~5 years. I have a little preference for Murmurhash3 there.
This repro contains a list of hashes with benchmarks: https://github.com/rurban/smhasher
128-bit MetroHash and MurmurHash3 seem about evenly matched for small inputs. I wonder what their performance on x86 is.
That repo benchmarks hash functions by throughput on reasonably large data sizes. We don't care much about that. It also doesn't say much about security (other than some stuff about hash tables which I don't even think is correct).
I agree that simplicity is fairly important; it seems most hash functions innards aren't that complex, but the input/output functions can get complex to handle multiple input sizes, optimise special cases, and some other functionality.
Hmm, this conversation is split between two threads?
That repo benchmarks hash functions by throughput on reasonably large data sizes.
If you click on the name of the hash, there are also the results for hashing 1, 2, 3, 4, etc. bytes. And the last column is useful for us, because that shows if it is statistically ok. We don't really have security considerations here, right?
Hmm, this conversation is split between two threads?
Sorry :smile: Seemed to fit there. Link for others that may read this: https://github.com/dhardy/rand/issues/18#issuecomment-354618223
Edit: PR no longer shows the code thanks to our rebasing of Rand history, but the commit is still available here.
Implement seeding via hash function, as discussed in #18.
This is a very rough implementation.
To discuss, regarding the hash function:
from_hashable
behind a feature gate (hash
orseahash
) until we're more comfortable with the idea — that allows stabilisingrand-core
without this feature, in theory. But hopefully we don't need to do this.AsBytesFixed
impls are needed, e.g. foru64
AsBytesFixed
is the best way of going about this. I'm considering moving the hashing into the trait implementation withtrait Hash<H> { fn hash(state: &mut H, val: Self); }
or similar. This allows specialised implementations of SeaHash.from_hashable
should be more general, e.g. consume a SeaHash struct.)Regarding
SeedableRng
:Seed
an associated type is a compromise;SeedSize
associated constant doesn't appear to work; we do however restrictSeed
to types supported byFinalize
Seed
au64
array instead ofu8
array is certainly more convenient for us; any reason we shouldn't do this?SeedableRng::from_seed
impls for Isaac aren't 100% backwards compatible (old version allowed much larger seeds).Lots to discuss, but I think this is moving forward. (@ticki if you get the time, I'd love your thoughts.)