dhardy / rand

http://doc.rust-lang.org/rand

Improve ISAAC performance (take 2) #45

Closed. pitdicker closed this 6 years ago.

pitdicker commented 6 years ago

This is a new attempt, replacing https://github.com/dhardy/rand/pull/36. It now fills the output buffer in reverse, so fill_bytes keeps working exactly as before.
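
To illustrate what "filling in reverse" means here, a minimal hypothetical sketch (store_reversed is not the PR's actual code; the real implementation writes the words during the ISAAC round itself): the freshly generated words are stored back-to-front, so a consumer reading the buffer front-to-back sees the same stream as before.

    // Hypothetical sketch: store freshly generated words back-to-front,
    // so that reading results[0], results[1], ... yields the stream in
    // the same order as before, and fill_bytes output is unchanged.
    fn store_reversed(fresh: &[u64], results: &mut [u64; 256]) {
        for (i, &word) in fresh.iter().enumerate() {
            results[255 - i] = word;
        }
    }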

This made it possible to move the code shared between IsaacRng and ChaChaRng into separate rand_core::impls::fill_via_u*_chunks functions. All the unsafe code is contained there.
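
For context, here is a safe-Rust sketch of what a fill_via_u32_chunks-style helper does; the signature is an assumption, and the real helper does the byte copy with unsafe code for speed. It writes as many u32 words into the byte buffer as fit, little-endian, and reports how many words it consumed and how many bytes it filled.

    // Sketch (assumed signature): copy words into `dest` as little-endian
    // bytes and return (words consumed, bytes filled). A partially copied
    // final word still counts as consumed.
    fn fill_via_u32_chunks(src: &[u32], dest: &mut [u8]) -> (usize, usize) {
        let byte_len = dest.len().min(src.len() * 4);
        let num_words = (byte_len + 3) / 4;
        for (i, chunk) in dest[..byte_len].chunks_mut(4).enumerate() {
            // Fixing the output to little-endian keeps the byte stream
            // identical on big-endian platforms.
            let bytes = src[i].to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
        (num_words, byte_len)
    }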

The 32-bit indexing trick that made isaac64::next_u32 faster worked best when reading the results backwards. Now I had to think of something else, and it isn't pretty...

pitdicker commented 6 years ago

Sorry for all the pushes.

I temporarily merged https://github.com/dhardy/rand/pull/44, and the big-endian tests are green (though not on the first try).

dhardy commented 6 years ago

Can you try adding this test for Isaac 64? I calculated the numbers externally; hopefully I got it right.

    #[test]
    fn test_isaac64_true_mixed_values() {
        let seed: &[_] = &[1, 23, 456, 7890, 12345];
        let mut rng1 = Isaac64Rng::from_seed(seed);
        // Sequence is mostly to check correct interaction between 32- and
        // 64-bit value retrieval.
        assert_eq!(rng1.next_u64(), 547121783600835980);
        assert_eq!(rng1.next_u32(), 1058730652);
        assert_eq!(rng1.next_u64(), 3657128015525553718);
        assert_eq!(rng1.next_u64(), 11565188192941196660);
        assert_eq!(rng1.next_u32(), 288449107);
        assert_eq!(rng1.next_u32(), 646103879);
        assert_eq!(rng1.next_u64(), 18020149022502685743);
        assert_eq!(rng1.next_u32(), 3252674613);
        assert_eq!(rng1.next_u64(), 4469761996653280935);
    }

pitdicker commented 6 years ago

It took some time to see the bug... Sharp :-)

I have removed all the ugly indexing stuff and added a half_used bool. It is a little slower for next_u32, and a little faster for next_u64 and fill_bytes.

It is curious to see how adding just one extra operation changes the benchmarks by 5-10%. Using an Option instead of the indexing tricks is almost as slow as simply truncating next_u64.
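
To make the half_used bookkeeping concrete, here is a standalone sketch (hypothetical names and a dummy refill; not the PR's actual code): next_u64 always consumes a whole 64-bit result, while next_u32 first hands out the unused upper half of the previously consumed word before touching a fresh one.

    // Sketch of the half_used scheme. `refill` is a stand-in for the real
    // ISAAC-64 round; the real buffer holds 256 words, not 4.
    struct Sketch {
        results: [u64; 4],
        index: usize,
        half_used: bool, // upper half of results[index - 1] still unused?
    }

    impl Sketch {
        fn refill(&mut self) {
            // Dummy data; the real code runs the ISAAC-64 state update.
            for (i, r) in self.results.iter_mut().enumerate() {
                *r = (i as u64 + 1).wrapping_mul(0x9E37_79B9_7F4A_7C15);
            }
            self.index = 0;
        }

        fn next_u64(&mut self) -> u64 {
            if self.index >= self.results.len() {
                self.refill();
            }
            // Any pending upper half from a previous next_u32 is discarded.
            self.half_used = false;
            let value = self.results[self.index];
            self.index += 1;
            value
        }

        fn next_u32(&mut self) -> u32 {
            if self.half_used {
                // Hand out the upper half of the word consumed last time.
                self.half_used = false;
                return (self.results[self.index - 1] >> 32) as u32;
            }
            if self.index >= self.results.len() {
                self.refill();
            }
            // Consume the lower half now; keep the upper half for later.
            self.half_used = true;
            let value = self.results[self.index] as u32;
            self.index += 1;
            value
        }
    }

    fn main() {
        let mut rng = Sketch { results: [0; 4], index: 4, half_used: false };
        println!("{:#018x}", rng.next_u64());
        println!("{:#010x}", rng.next_u32()); // lower half of word 1
        println!("{:#010x}", rng.next_u32()); // upper half of word 1
    }

The single branch on half_used is the "one extra operation" mentioned above; it is enough to move the benchmarks below by a few percent.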

Before:

    test gen_bytes_isaac    ... bench:   1,091,355 ns/iter (+/- 8,751) = 938 MB/s
    test gen_u32_isaac      ... bench:       4,205 ns/iter (+/- 33) = 951 MB/s
    test gen_u64_isaac      ... bench:       8,096 ns/iter (+/- 64) = 988 MB/s

    test gen_bytes_isaac64  ... bench:     563,099 ns/iter (+/- 2,616) = 1818 MB/s
    test gen_u32_isaac64    ... bench:       4,237 ns/iter (+/- 26) = 944 MB/s
    test gen_u64_isaac64    ... bench:       4,245 ns/iter (+/- 29) = 1884 MB/s

    test gen_bytes_chacha   ... bench:   2,551,234 ns/iter (+/- 20,947) = 401 MB/s
    test gen_u32_chacha     ... bench:      11,476 ns/iter (+/- 819) = 348 MB/s
    test gen_u64_chacha     ... bench:      21,432 ns/iter (+/- 320) = 373 MB/s

After:

    test gen_bytes_isaac    ... bench:     713,938 ns/iter (+/- 4,371) = 1434 MB/s (+53%)
    test gen_u32_isaac      ... bench:       4,078 ns/iter (+/- 19) = 980 MB/s (+3%)
    test gen_u64_isaac      ... bench:       7,152 ns/iter (+/- 41) = 1118 MB/s (+13%)

    test gen_bytes_isaac64  ... bench:     387,755 ns/iter (+/- 2,040) = 2640 MB/s (+45%)
    test gen_u32_isaac64    ... bench:       3,106 ns/iter (+/- 23) = 1287 MB/s (+36%)
    test gen_u64_isaac64    ... bench:       4,143 ns/iter (+/- 17) = 1930 MB/s (+2%)

    test gen_bytes_chacha   ... bench:   2,567,562 ns/iter (+/- 65,782) = 398 MB/s
    test gen_u32_chacha     ... bench:      11,469 ns/iter (+/- 141) = 348 MB/s
    test gen_u64_chacha     ... bench:      21,466 ns/iter (+/- 544) = 372 MB/s
pitdicker commented 6 years ago

It seems our comments have crossed. I also hand-calculated the results. Is the test in the commit OK for you?