Reseeding perf - Githubissues

pitdicker commented 6 years ago

I wanted to try inverting the counter in ReseedingRng as discussed in https://github.com/dhardy/rand/issues/59, but it turned out the performance was pretty bad to begin with.

Benchmarks before:

test reseeding_xorshift_bytes ... bench:     559,635 ns/iter (+/- 5,160) = 1829 MB/s
test reseeding_xorshift_u32   ... bench:       4,265 ns/iter (+/- 27) = 937 MB/s
test reseeding_xorshift_u64   ... bench:       4,893 ns/iter (+/- 30) = 1634 MB/s

After:

test reseeding_xorshift_bytes ... bench:     562,940 ns/iter (+/- 980) = 1819 MB/s
test reseeding_xorshift_u32   ... bench:       2,323 ns/iter (+/- 9) = 1721 MB/s
test reseeding_xorshift_u64   ... bench:       2,918 ns/iter (+/- 3) = 2741 MB/s

And plain Xorshift for comparison:

test gen_bytes_xorshift       ... bench:     555,592 ns/iter (+/- 10,734) = 1843 MB/s
test gen_u32_xorshift         ... bench:       1,372 ns/iter (+/- 12) = 2915 MB/s
test gen_u64_xorshift         ... bench:       2,643 ns/iter (+/- 28) = 3026 MB/s

I don't like several parts of the current design of ReseedingRng, but that is for another issue.

pitdicker commented 6 years ago

Not sure what you mean about not liking the way it works though?

Some parts seem not very ergonomic. For example the new function takes an already existing RNG. Why does it need you to initialize the wrapped rng separately? If it knows how to reseed an rng, surely it can also initialize one?

The from_reseeder function does not make much sense to me. The main difference with new is that it comes with a default threshold. The DEFAULT_RESEEDING_THRESHOLD is very small and basically always wrong. The threshold value depends on the paranoia of the application/library, and on the wrapped rng, so I don't thing having a default is a good idea. Also it takes a fixed seed, instead of using the SeedFromRng trait.

And I didn't like the Reseeder trait. Why can't ReseedingRng just use some other Rng, without requiring a custom wrapper. That were my thoughts yesterday though, I may be coming around on the last issue...

pitdicker commented 6 years ago

I have been trying out a wild idea today to reduce the overhead of ThreadRng, which uses ReseedingRng. Because that seems like the commonly used interface, I don't really want it to look bad :-). In rand in the nursery its performance is only 50% of the RNG it wraps. After sprinkling some [inline]'s and removing some wrappers it gets to 80% of the RNG.

Is seems sensible to assume that if you want to reseed an RNG, it will probably be a cryptographic RNG, not some simple one. The variants I have seen so far all generate blocks of results, not one result at a time. One way to reduce the overhead of ReseedingRng is to do the reseed 'bookkeeping' only when a new block of results is generated, not with every next_*.

As a test I have added an RngCore trait:

pub trait RngCore: Sized {
    type Results: AsRef<[u32]>;

    fn init(seed: &[u32]) -> Self;
    fn generate(&mut self, results: &mut Self::Results);
    fn results_empty() -> Self::Results;
}

It just exposes the core algorithm form an RNG, and does not include the buffering etc necessary to implement the Rng trait. It does not really make the implementation of HC-128 less clean, and should also not be to hard too implement for ISAAC and ChaCha.

Because ReseedingRng now has access to the RNG's algorithm, it can only use that part and implement it's own buffering and bookkeeping. This should (not completely true for next_u32 yet) bring the overhead of ReseedingRng down to almost 0%, and bring the performance of ThreadRng within 90% of the wrapped RNG.

Do you think this a direction worth pursuing?

This is my current super ugly, many things comment out, WIP https://github.com/pitdicker/rand/commit/25dfbdddbc13cb3c59c865cdecd529f3d0784781

dhardy commented 6 years ago

This trait requires the implementation to use u32 internally though, right? Can that be made a parameter? Also, I don't much like using a &[u32] seed; maybe you can do something like:

pub trait<T> RngCore: SeedableRng {
    type Results: AsRef<[T]>;

    fn init(seed: &<Self as SeedableRng>::Seed) -> Self;

Not quite sure what I think right now; this would make ReseedingRng only usable for certain classes of RNGs, right? But maybe that's an advantage allowing reseed to combine both current state (or output buffer) with a fresh seed instead of simply replace with a new seed. You're right, the wrapper is not very useful for fast, weak PRNGs.

pitdicker commented 6 years ago

I was already a little proud I got the extra abstraction working, including the Asref slice trick to be generic over arrays. But what you write is much better!

But maybe that's an advantage allowing reseed to combine both current state (or output buffer) with a fresh seed instead of simply replace with a new seed.

It is more that it combines the step to fill a new output buffer with the bookkeeping when to reseed the rng. Checking and managing counters can take almost as much time as filling the output buffer with fresh values, so every little thing we have to do less each step helps.

Not quite sure what I think right now; this would make ReseedingRng only usable for certain classes of RNGs, right?

I think I named it ReseedingBlockRng? I think nothing prevent using the current ReseedingRng with all kinds of RNG's. But this should be a faster alternative for what are basically CryptoRng. But if reseeding other kinds of RNG's does not make much sense, we could remove ReseedingRng.

What do you think about introducing an RngCore trait? In some way I find it a bit ugly because the main motivation is just to increase the performance of a reseeding wrapper. But because of TreadRng this may be worthwhile.

On the other hand IsaacRng, ChaChaRng and Hc128Rng now all have an impl block with the core algorithm, and this trait would make that a bit more formal. Maybe it can even become possible to share all the extra code for impl Rng between these implementations.

dhardy commented 6 years ago

Maybe the trait should be named BlockRng instead? But I think this means CSPRNGs would have to implement:

Rng
CryptoRng
BlockRng
SeedableRng
SeedFromRng

Is that a few too many traits? I'm wondering whether Rng should be implemented automatically for every BlockRng (perhaps via trait extension with default impls). Actually, if init is provided, then SeedableRng and SeedFromRng could be implemented automatically too? I'm not totally sure this will work, but it may simplify CSPRNG implementations.

pitdicker commented 6 years ago

Maybe there are tricks with traits I don't know, but I couldn't see a way to do this automatically. Rng implementations would have a struct that includes the struct from RngCore, an output buffer and counter/index. But a macro could certainly do it.

pitdicker commented 6 years ago

It turns out AsRef<[T]> is not implemented for [u32; 256], so that doesn't work as a bound. @dhardy do you happen to have an idea on how to have a generic Results type, but still be able to do something useful like index into it or getting the length?

pitdicker commented 6 years ago

Found a terrible workaround: use a newtype with AsRef and Deref implementations:

#[derive(Copy, Clone)]
pub struct IsaacArray([u32; RAND_SIZE]);
impl ::core::convert::AsRef<[u32]> for IsaacArray {
    fn as_ref(&self) -> &[u32] {
        &self.0[..]
    }
}
impl ::core::ops::Deref for IsaacArray {
    type Target = [u32; RAND_SIZE];
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
impl ::core::ops::DerefMut for IsaacArray {
    fn deref_mut(&mut self) -> &mut [u32; RAND_SIZE] {
        &mut self.0
    }
}

dhardy commented 6 years ago

Sorry, no, but I guess this is another thing that will be fixed by constant generics eventually.

pitdicker commented 6 years ago

@dhardy I now have this mostly working, and am cleaning up the changes. A lot of code has to move around... Implementing RNG's with the BlockRng trait looks pretty clean now, here for example HC-128. The overhead of ReseedingBlockRng with this method is not really measurable, as hoped.

As trait I now have:

pub trait BlockRng<T>: Sized {
    type Results: AsRef<[T]> + Default;

    fn generate(&mut self, results: &mut Self::Results);
}

But there is one place I am stuck with the traits: https://github.com/pitdicker/rand/blob/blockrng_part2/rand_core/src/lib.rs#L168. I want to add a BlockRngWrapper to implement Rng for a BlockRng. And it needs a separate implementation for those that return [u32] and [u64].

The implementation for [u64] is commented out at the moment, otherwise I get the error:

error[E0119]: conflicting implementations of trait `Rng` for type `BlockRngWrapper<_, _>

Do you know how to fix this?

dhardy commented 6 years ago

Merry Christmas @pitdicker. I had a look, and I think the problem is that it would be possible for some R to implement both BlockRng<u32> and BlockRng<u64> since essentially they are separate traits. Perhaps it would be useful to prevent a type implementing both traits, but I'm not sure if that's possible.

It might be possible to use specialisation somehow by making BlockRng<u32> more general than BlockRng<u64> (i.e. the latter extends the former).

Alternatively you could just use two separate BlockRngWrapper traits.

BTW I think the names would be better like this:

BlockRng → BlockRngCore
ChaCha → ChaChaCore
BlockRngWrapper → BlockRng

pitdicker commented 6 years ago

Thank you! And I like the names you listed better.

essentially they are separate traits

Ah, that explains it. Back to the drawing board then. The trait system must be logical, I should really get a handle on it, but don't know of a good resource...

My hope was to end up with something like: ReseedingBlockRng > BlockRng > ReseedingCore > (ChaChaCore, Isaac64Core, etc.) Two traits would help with implementing RNG's, but not with the reseeding mechanism?

I think these changes are only worth it if the end results is reasonably clean, and am starting to give up. On the other hand the not directly perfect code for ReseedingBlockRng I have now is a win for anything except ISAAC-64.

dhardy commented 6 years ago

@pitdicker the logic you want is essentially this:

pub trait A {}  // BlockRng<u32>
pub trait B {}  // BlockRng<u64>
pub trait T {}  // BlockRngWrapper

impl<X: A> T for X {}
impl<X: B> T for X {}

The compiler won't allow the second impl because if an X were to implement both, that type would have two impls for T. I don't think it's possible to tell the compiler no type can implement both A and B. Alternatively it would be nice to say impl<X: B> T for X where not X: A {}, but I don't think that's possible either.

There's a workaround: specialization allows multiple implementations, so long as one is more specific than the other. But that's not stable yet. Example.

BTW feel free to bring this up on https://internals.rust-lang.org/ but I doubt there will be any rapid progress on it. You may also like N. Matsakis's blog, but it's a bit off-topic here.

Edit: found something related

dhardy commented 6 years ago

So @pitdicker should I merge this PR while the BlockRng thing is left on the side for now? If so you could open a tracking issue for that.

pitdicker commented 6 years ago

Yes, I think merging this is a good idea.

I am not really sure how to proceed with the BlockRng idea. What do you think about implementing it for 32-bit RNG's for now, with the possibility open to extending it to 64 bit once specialisation is ready?

I somewhere messed up this branch but will fix it in a moment.

dhardy commented 6 years ago

So for now 64-bit block RNGs would not use BlockRng? How then will ReseedingRng work on StdRng? I guess if we switch StdRng to HC-128 first that doesn't matter so much.

pitdicker commented 6 years ago

That is the idea :smile:. And else it can still work and be faster with the ReseedingRng from this PR.

Travis seems very busy today...

dhardy commented 6 years ago

Yes it is. I wonder if it's something to do with the new year?

pitdicker commented 6 years ago

O wow, maybe... But ready after all.

dhardy / rand

Reseeding perf #76