nstilt1 closed this 4 months ago
Alright, I think I've added just about all I can think of. The `#[cfg_attr...` in lib.rs and some impls in rng.rs are there because rand/src/std.rs has a

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct StdRng(ChaCha12Rng);
```

There are also some functions in rng.rs that don't use `#[inline]`. I'm not sure if they need it or not. Do you have any questions, suggestions, or feedback? Thanks for your time.
I'd suggest less unsafe code and more of a KISS implementation in an initial PR, and then circling back on some of it in a followup so it can be considered separately.
Alright, switching the safety on.

I was able to use generics to ensure that the method inputs impl'd `Into<{WrapperName}>`, and then called `.into()` internally. Now the inputs should zeroize on drop when the `zeroize` feature is enabled, and end users will not need to use `.into()` on the input parameters. I was also able to remove all of the `*_bytes()` methods I had implemented and added impls for `From<[u8; x]>`.
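The wrapper approach described above can be sketched roughly like this (illustrative only; `KeyWrapper`, `wipe`-style volatile clearing, and `set_key` are my stand-in names, with the volatile loop standing in for `zeroize::Zeroize` since the real impls live behind the `zeroize` feature):

```rust
// A wrapper that scrubs its contents on drop, plus the From impl that lets
// callers pass a plain array without writing `.into()` themselves.
struct KeyWrapper([u8; 32]);

impl From<[u8; 32]> for KeyWrapper {
    fn from(bytes: [u8; 32]) -> Self {
        KeyWrapper(bytes)
    }
}

impl Drop for KeyWrapper {
    fn drop(&mut self) {
        // volatile writes stand in for zeroize so the clear isn't optimized out
        for b in self.0.iter_mut() {
            unsafe { core::ptr::write_volatile(b, 0) };
        }
    }
}

// A method generic over `Into<KeyWrapper>` accepts a `[u8; 32]` directly,
// and the received copy is scrubbed when the wrapper drops.
fn set_key<K: Into<KeyWrapper>>(key: K) -> u8 {
    let key: KeyWrapper = key.into();
    key.0[0]
}
```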
The only final issue I ran into was finding a way to convert a `[u8; 12]` to a `[u8; 16]` (or a `u128` to a `[u8; 12]`) without producing an un-zeroizeable copy, so I added an impl for `From<&mut [u8; 12]>`. It could alternatively be done with an additional wrapper for `[u8; 12]`. Realistically, this impl is unnecessary, but I was using the `set_stream_bytes()` method in a test and it was somewhat more convenient for the test to use a `[u8; 12]`. The `From<&mut [u8; 12]>` impl is both unnecessary and less efficient, and could be removed if you want.

Are there any additional changes that need to be made?
I think that is all I want to change. There is some stuff in rng.rs, particularly the zeroizing structs, that looks a little bit... polluting. I could move them into a different file if needed, or it might be possible to use a generic struct or two. I probably won't be making any more changes until I receive feedback... although I might add a test to make sure that the inputs actually get zeroized on drop. I'm also not sure if the methods with alternate inputs really need a comment about the performance; the difference is probably minimal, and it won't even affect the rate of the output.

Thanks for your time, for this wickedly fast ChaCha20 cipher implementation, and to the other `rand` and `c2-chacha` authors and contributors.
I was able to fix the zeroize test, but the user would need to pass an `&mut [u8; 32]` (or worse, `(&mut [u8; 32]).into()`) for the original to be zeroized. It appears that we can only zeroize the copies we receive :/, but I feel like the way it is called now (which is the way it was before) is acceptable. E.g.:

```rust
let mut seed: [u8; 32] = /* arbitrary value */;
let mut rng = ChaCha20Rng::from_seed(seed);
seed.zeroize();
```
Some questions that may need consideration:

- Do we want to keep `rand_core` in lib.rs? `rand_core` was the only dependency for the random number generator (https://github.com/rust-random/rand/issues/872#issuecomment-575242181). Do we still want to reduce the dependencies for the RNG and/or bypass the `write_keystream_blocks()` method used in `fn generate()`, since the underlying code is almost entirely within `chacha20`?
- Do you have any feedback on the `AlteredState` trait, the `from_seed` method that I modified from `new()` in lib.rs, or some of the zeroizing code?
- Is there a better way to impl `ZeroizeOnDrop` for inputs rather than with wrappers and duplicated spaghetti code to determine if zeroize is enabled? It would be somewhat convenient if it could be done with something like this, within the method:
```rust
impl SeedableRng for $ChaChaXRng {
    type Seed = [u8; 32];

    #[inline]
    fn from_seed(seed: Self::Seed) -> Self {
        #[cfg(feature = "zeroize")]
        seed.zeroize_on_drop();
        let core = $ChaChaXCore::from_seed(seed);
        Self {
            rng: BlockRng::new(core),
        }
    }
}
```
It would just be a little neat if it could be done with a method call that made no wrappers, no copies, didn't change the value's type, and didn't require the variable to be mutable. It seems kind of impossible without some kind of change to Rust though. It might be easier to just make `SeedableRng` allow a `Seed` of type `&[u8; 32]` rather than doing a bunch of under-the-hood zeroizing to make sure that it zeroizes.
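To make the limitation concrete: `from_seed` receives the seed by value, so it can only scrub the copy it was handed, never the caller's original. A minimal sketch (my names; the volatile loop stands in for `zeroize::Zeroize::zeroize`):

```rust
// Stand-in for zeroize: volatile writes so the compiler can't elide the clear.
fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        unsafe { core::ptr::write_volatile(b, 0) };
    }
}

struct DemoRng {
    state: [u8; 32],
}

impl DemoRng {
    fn from_seed(mut seed: [u8; 32]) -> Self {
        let rng = DemoRng { state: seed }; // state initialized from the local copy
        wipe(&mut seed); // scrubs only this local copy; the caller's array is untouched
        rng
    }
}
```

The caller's own array still has to be zeroized at the call site, which is exactly the `seed.zeroize()` pattern shown above.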
I'll go ahead and supply the rationale for `AlteredState`. If we were to use any `unsafe` code to interpret a `[u8]` as a `[u32]`, it could mostly (if not completely, depending on the implementation) be contained within `AlteredState`. That is pretty much the only benefit, and it honestly doesn't have much of a purpose when there is only safe code. Since the methods of `AlteredState` are only used once and there is only safe code, it wouldn't hurt to just copy over the methods to where they are called.
@nstilt1 you seem to have checked in a `.DS_Store` file.

(You can configure an `excludesfile` in ~/.gitconfig to avoid this in the future)
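For reference, a global excludes setup looks roughly like this (the `~/.gitignore_global` path is just a conventional example name, not something the repo mandates):

```ini
# ~/.gitconfig
[core]
    excludesfile = ~/.gitignore_global
```

with `.DS_Store` listed on its own line in `~/.gitignore_global`. Running `git config --global core.excludesfile ~/.gitignore_global` writes the same entry.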
Now that we have the `fill_bytes` method at our fingertips, I will look at `.apply_keystream()` and see if I can reduce some of the overhead that `fill_bytes` has.
I am working on it in a separate branch. Will write some tests using the output of the current implementation.
Regarding the last commit, I noticed an "inconsistency" with the index used by `set_word_pos()` being 4 bits when the full index is 6 bits. I attempted to "fix" that, but then I realized that it was being done that way for alignment purposes. Will try to align it again with minimal changes.
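For readers following the 4-bit vs 6-bit distinction: one 64-byte ChaCha block holds 16 u32 words (a 4-bit intra-block index), while a 4-block buffer holds 64 words (a 6-bit index). A hypothetical illustration of splitting a word position either way (names and constants are mine, not the crate's):

```rust
const WORDS_PER_BLOCK: u128 = 16; // one 64-byte block = 16 u32 words -> 4-bit index
const WORDS_PER_BUFFER: u128 = 64; // four buffered blocks = 64 words -> 6-bit index

// Split a keystream word position into (block counter, 4-bit index).
fn split_block(word_pos: u128) -> (u128, u32) {
    (word_pos / WORDS_PER_BLOCK, (word_pos % WORDS_PER_BLOCK) as u32)
}

// Split the same position into (buffer counter, 6-bit index).
fn split_buffer(word_pos: u128) -> (u128, u32) {
    (word_pos / WORDS_PER_BUFFER, (word_pos % WORDS_PER_BUFFER) as u32)
}
```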
Something I probably should've mentioned much earlier, but it would be helpful if the implementation didn't use the `cipher` API so that `cipher` could be made an optional dependency (again, as it was in `chacha20` v0.8).

That would greatly reduce the number of crates needed for the `rand_core`/`rand_chacha`-replacement use case.
> Something I probably should've mentioned much earlier, but it would be helpful if the implementation didn't use the `cipher` API so that `cipher` could be made an optional dependency (again, as it was in `chacha20` v0.8). That would greatly reduce the number of crates needed for the `rand_core`/`rand_chacha`-replacement use case.
I can give that a try, and I'll take a gander at v0.8. I'll let you know if I run into any problems.
@tarcieri Do you want the code for the ChaCha cipher to be pretty much exactly the same as it was in v0.8.1, including the code in src/backend/ and src/backend.rs? I just don't want to un-fix anything that may have been fixed in the current version, and if you don't want it completely reverted, would we still need to add src/backends/autodetect.rs and to have the ChaCha `Core` struct in there?
Alternatively, the `process_with_backend()` method might be able to be converted to accept inputs that don't depend on `cipher`, or possibly to take no inputs and just be a generate method that calls `gen_par_ks_blocks()` where possible, though that might not be desirable if there are use cases where people wouldn't want the blocks generated in parallel. Also, as for the 32- vs 64-bit counter, it would be easier to get that functionality by reverting instead of reworking `process_with_backend()`, since the previous version already supports it.
In the meantime, I will be transforming `process_with_backend()` into a method called `generate()` that overwrites the 256-byte `ChaChaCore` buffer using the backend, then moving some things around based on v0.8.1 to separate the dependent code. Let me know if there is something else you would prefer.

Here's a link to v0.8.1 in case you don't have it handy: https://github.com/RustCrypto/stream-ciphers/blob/338c078d731692fba3b8256e45de2c3e334d46d8/chacha20/src/backend/autodetect.rs
Okay... so I was able to get the RNG to work using `generate()`, and I have 2 different versions in different branches. The `optional_cipher` branch fills a `[u32; 64]`, and the `unsafe_fill_bytes` branch passes a pointer to `generate()` so it can directly write the results to the input slices of `fill_bytes()`. Both of these have about the same performance at about 0.994-1.001 cpb, so either one could be used for the RNG. The primary benefit of the pointer is that it works on a `[u8; 256]` as well as a `[u32; 64]`, and it works on a `&[u8]`. And on paper, it should be faster for `fill_bytes()`... but it doesn't appear to be.
The drawback of both of these generate methods is that without the `process_with_backend()` method, all of the `StreamCipher`, `StreamCipherCoreWrapper`, `StreamCipherSeek`, etc. methods would need to be reimplemented using a macro, and the original performance of the cipher on avx2 was 0.93-0.96 cpb, while the current performance is about 1.01-1.2 cpb.
I think a valid solution for the cipher's performance would be to conditionally compile the `process_with_backend()` and `generate()` methods, along with the backend methods they use, so that it will hopefully be the same as before. Will give it a try, but it might look a bit messier than just having one method for both implementations to generate values. Will attempt to use macros so that each method is only written once in src/backends/.

Edit: It does not seem to be possible to write a macro that can make a function with varying arguments as well as a varying block of code in the function. Will just conditionally compile the `inner` functions and their associated methods for now.
Besides the currently failing checks, the main issues remaining are:

- ~~`StreamCipherCore`'s `remaining_blocks()` method returns an `Option<usize>`. `usize` is 4 bytes on 32-bit machines, and with a 64-bit counter this method could fail to produce an accurate value, or it could throw an error, or maybe it just won't compile. I don't knowingly use a 32-bit computer, so I don't know exactly what would happen.~~ `stream_core.rs` simply states to return `None` when the value is too large to fit in a `usize`.
- lib.rs, regarding the cipher-dependent code: I think `generate` and `process_with_backend` should probably remain in lib.rs due to the use of the cpuid types. I'm not entirely sure how to use those outside of lib.rs, where they seem to be defined. The Rounds structs/trait could also be moved into their own file like they were previously.
- ~~Something that could be done regarding alignment on 64-bit machines: a `[u32; 64]` buffer for 32-bit machines and a `[u64; 32]` buffer for 64-bit machines for the RNG. I don't know if that would help with performance any, especially if we were to use a pointer for `generate()`. But I feel like this might require minimal changes. I can take a look once all of the aforementioned issues are addressed or accepted.~~
- ~~`fill_bytes()` skips the buffer for most of the fill.~~
- ~~Something that could be interesting (although not necessary) would be a way for users to define their own ChaCha variants using exported macros, such as ChaCha22Rng, or perhaps using combinations with Legacy and IETF in the macro call. It wouldn't take much, although there would need to be a unique name input for the Rounds struct when it is defined, since it does not appear to be possible to concatenate names for structs in a macro. We could also gate the macros behind a feature.~~
> XChaCha since it uses an 8-byte Nonce internally

XChaCha is based on the IETF construction internally, unless you're referring to a different XChaCha.
> Something that could be interesting (although not necessary), would be a way for users to define their own ChaCha variants using exported macros, such as ChaCha22Rng

There are already far too many variants of ChaCha. We don't need any more.

ChaCha20 already used too much diffusion and is needlessly slow because of it. It would've been better standardized as ChaCha12. We certainly don't need any variants that run slower while providing no additional practical security value.
> XChaCha since it uses an 8-byte Nonce internally
>
> XChaCha is based on the IETF construction internally, unless you're referring to a different XChaCha.
In variants.rs, I have a `Variant` trait that specifies the starting index of the nonce that is fed into `ChaChaCore<R, V>::state: [u32; 16]`. Currently, the starting index of the nonce is 14 for `XChaChaVariant`, while `Ietf` uses 13 as the start index. `XChaCha` passes the original tests with the start index at 14 and fails tests at index 13, while the `Ietf` variant passes tests using 13 as the start index. There could be something wrong with the tests, and I also noticed that one of the test vector links for XChaCha returns a 404 and is not indexed by the Wayback Machine. Some further evidence involves the XChaCha initialization code, which uses `XChaChaNonce[16..]` of a 24-byte array for the nonce that it passes to `ChaChaCore<R, V>`.
https://github.com/RustCrypto/stream-ciphers/blob/3a53d4172a4aa6cf05cfbe455ea851d8072f05d9/chacha20/tests/mod.rs#L97C9-L97C87 The direct link in question: https://tools.ietf.org/id/draft-arciszewski-xchacha-03.html#rfc.appendix.A.3.2
I just think that if XChaCha were to use the IETF construction, it would pass a 12-byte nonce to `ChaChaCore<R, V>::state[13..16]` instead of an 8-byte nonce to `...[14..16]`. This isn't exactly a problem, but it is a minor discrepancy.
I will double-check this other link for the paper, but I briefly looked at it and it does specify that it uses the IETF variant with a 96-bit nonce rather than the 64-bit nonce. It should have some test vectors to compare with. https://datatracker.ietf.org/doc/html/draft-arciszewski-xchacha-03
Here's a working link:
https://datatracker.ietf.org/doc/html/draft-arciszewski-xchacha-03
Note that what we're building upon uses the IETF's ChaCha20 (96-bit nonce), not Bernstein's ChaCha20 (64-bit nonce).
I found the documentation that causes this. I'm sure you were already aware of this lol, but I'm not entirely sure why it doesn't use the full 12 bytes by extending the XChaCha nonce to 28 bytes (or something). Perhaps it came from XSalsa, and maybe to be compatible with a 64-bit counter... but I don't know why it uses a "96-bit nonce" if 32 of those bits are 0s. I don't know. It seems to be fairly secure the way it is, but those bits could be prime bits of real estate for extending the nonce just a little bit more.
> [2. AEAD_XChaCha20_Poly1305](https://datatracker.ietf.org/doc/html/draft-arciszewski-xchacha-03#section-2)
>
> XChaCha20-Poly1305 is a variant of the ChaCha20-Poly1305 AEAD construction as defined in [RFC7539](https://datatracker.ietf.org/doc/html/rfc7539) that uses a 192-bit nonce instead of a 96-bit nonce.
>
> The algorithm for XChaCha20-Poly1305 is as follows:
>
> 1. Calculate a subkey from the first 16 bytes of the nonce and the key, using HChaCha20 ([Section 2.2](https://datatracker.ietf.org/doc/html/draft-arciszewski-xchacha-03#section-2.2)).
> 2. Use the subkey and remaining 8 bytes of the nonce (prefixed with 4 NUL bytes) with AEAD_CHACHA20_POLY1305 from [RFC7539](https://datatracker.ietf.org/doc/html/rfc7539) as normal. The definition for XChaCha20 is given in [Section 2.3](https://datatracker.ietf.org/doc/html/draft-arciszewski-xchacha-03#section-2.3).
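The quoted nonce handling can be sketched as a plain byte-splitting step (illustrative only; `split_xnonce` is my name, and the HChaCha20 call that consumes the first half is not shown):

```rust
// Split a 24-byte XChaCha nonce per the draft: the first 16 bytes feed
// HChaCha20 (not shown), and the last 8 bytes become an IETF 96-bit nonce
// with a 4-NUL-byte prefix.
fn split_xnonce(xnonce: &[u8; 24]) -> ([u8; 16], [u8; 12]) {
    let mut hchacha_input = [0u8; 16];
    hchacha_input.copy_from_slice(&xnonce[..16]);

    let mut ietf_nonce = [0u8; 12]; // first 4 bytes stay zero (the NUL prefix)
    ietf_nonce[4..].copy_from_slice(&xnonce[16..]);
    (hchacha_input, ietf_nonce)
}
```

This is why only 64 of the 96 IETF nonce bits carry XChaCha nonce material: the remaining 32 bits are the fixed NUL prefix.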
I'll see about getting the legacy counter back up to 64 bits, and I'll make a pull request for `cipher` afterwards to change the `remaining_blocks()` return type to an `Option<u64>`, if that is acceptable. I don't know if any stream ciphers could have more than 2^64 remaining blocks, but that is the current 64-bit limit of `usize`. There are 131 reverse dependencies for `cipher`, so if changing that return type isn't acceptable, it might also be possible to conditionally compile the counter type based on a 32-bit or 64-bit `target_pointer_width`. Having a 32-bit counter on 32-bit systems would still behave differently though, which would probably be unacceptable if we wanted to offer consistency/compatibility across different platforms.
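For context on the `usize` concern: a 64-bit remaining-block count can still be reported through an `Option<usize>` by returning `None` whenever the value doesn't fit the platform's pointer width, which is the behavior `stream_core.rs` documents. A minimal sketch (the function name is mine):

```rust
// Convert a 64-bit remaining-block count to Option<usize>.
// On 64-bit targets this always succeeds; on 32-bit targets, values
// above u32::MAX map to None instead of silently truncating.
fn remaining_blocks(count: u64) -> Option<usize> {
    usize::try_from(count).ok()
}
```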
Regarding the last commit... I wasn't trying to be salty or anything... I'm a little OCD and my code was technically incorrect. I don't mind which version of XChaChaVariant that the library uses, but if my own code (with the nonce start index at 14) confused me, it could confuse the next new contributor. I'm okay with either version, but I suppose a comment could have sufficed.
Also, I recant the 32-bit `usize` issue. I didn't see the comment in `stream_core.rs` about it.
I've begun working on the counter in my `counter_support` branch, and I'm having some issues. I don't have much experience with SIMD, but I have absolutely zero experience with NEON, and I don't see any tooltips in VS Code for it. If I'm still stuck on this after another week or so, then I might need some help. Or if the 64-bit counter isn't a priority, that is okay with me as well.
It would probably be good to split the changes to bring back the `rand_core` feature and the changes to support a 64-bit counter into separate PRs.
(I should also note the 64-bit counter is something of a niche use case where it would be nice to avoid undue complexity elsewhere)
Okay. Some of the checks that are failing seem to have started failing after I split the library into the various features, and I don't know what to do about them, or whether the current features are the ones that you want present. I am unsure how to proceed with those failing checks, particularly `cargo build --target thumbv7em-none-eabi`.

Also, I've still got to replace the `impl_zeroize_on_drop` and `impl_zeroize_from` macros from rng.rs, since they are only used once.
Looks like a doctest failure. Surprised `minimal-versions` is the only thing catching it.
That last "fix" got it to pass on my end, but should those docs just go in chacha.rs? I am unfamiliar with the inner workings of Rust docs.
3f82000 is fine though we usually use a slightly different pattern for that: https://github.com/RustCrypto/formats/blob/961e756/pem-rfc7468/src/lib.rs#L23-L24
I believe it is mostly ready. Some possible things that are missing:

- `pub type ChaCha20LegacyCore = ChaChaCore<R20, Legacy>;`
- `XChaCha8Core`, `XChaCha12Core`, `XChaCha20Core`. Or we could expose the Rounds structs as well as the variants.rs structs.

`criterion-cycles-per-byte` is supposed to work on aarch64 with Linux, so I can boot up my Raspberry Pi 4 and try to bench it later today. Since the benchmarks will be different on aarch64-linux and aarch64-macOS, I'll see about putting them in:

```
benches/
  bench_results/
    aarch64-linux/
    aarch64-macOS/
    x86_64-linux/
```

Let me know if any other changes need to be made.
That should be all of the instructions for getting cpb counts on a Raspberry Pi 4b.
I still don't know why `fill_bytes()` is slower than `apply_keystream()` on x86_64. It uses basically the same underlying code, where `apply_keystream()` does some extra steps. I've tried using a `[u32; 64]` instead of a `*mut u8`, but the performance was roughly the same at about 0.99 cpb. I also benched its branch again just now to be sure, and it currently runs at 1.05 cpb.
I've just removed the duplicated `gen_par_ks_blocks()` in avx2.rs/neon.rs/sse2.rs so that those methods now just pass a pointer to the RNG's pointer version. Somehow, it was a hair faster on macOS. I figure it would be easier to maintain if there was as little duplicated code as possible. The only remaining duplicated code in those files would be the `inner` and `rng_inner` methods, but they seem small enough to be okay.
Classic big-endian vs little-endian error :(

Not sure how the RNG tests passed, unless they need to be in chacha20/tests.
There is now minimal duplicated code in the individual backend files. One thing I noticed regarding `generate()`, though, is that none of the pointer methods in the backends guarantees that it is writing to a destination with 256 bytes to spare; it is up to the programmer to determine that there are 256 bytes to spare before calling the method. The soft.rs function's inner code was also completely within an unsafe block. My questions regarding `generate()`:

1. Should soft.rs's `rng_gen_ks_blocks()` and `write_ks_block` be `unsafe fn`s?
2. Should `generate()` be an `unsafe fn` if all of the branches were to lead to `unsafe` function calls?
3. Is the first assert necessary?

```rust
assert!(!dest_ptr.is_null());
```

This is my first time using pointers in Rust, but if the pointers are all created internally without any code that explicitly creates a null pointer, then it might not be possible for a null pointer to get passed to `generate()`. ChatGPT suggested it, but I've noticed that ChatGPT consistently suggests unnecessary and unoptimized code.
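One common pattern for this situation (a sketch, not the PR's code; the function name and the placeholder byte pattern are mine) is to make the pointer-taking function an `unsafe fn`, document the caller's obligations in a `# Safety` section, and downgrade internal sanity checks to `debug_assert!` so they cost nothing in release builds:

```rust
/// Writes 256 bytes of (placeholder) output to `dest_ptr`.
///
/// # Safety
/// `dest_ptr` must be non-null, valid for writes of at least 256 bytes,
/// and the region must not overlap any live reference.
unsafe fn generate_into(dest_ptr: *mut u8) {
    debug_assert!(!dest_ptr.is_null()); // checked in debug builds, free in release
    for i in 0..256 {
        // placeholder byte pattern standing in for real keystream output
        dest_ptr.add(i).write(i as u8);
    }
}
```

Usage: `unsafe { generate_into(buf.as_mut_ptr()) }` on a `[u8; 256]`. Since the contract is in the safety docs rather than a runtime check, internally-constructed pointers pay no cost for a guarantee they already uphold.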
I suppose I could mark it unsafe and add safety notes similar to `cipher`'s safety notes. Will update the comment in variants.rs as well, but the variants.rs file will be more important once the 64-bit counter has been added. After that, I don't think I have anything left to change until I receive feedback. Feel free to take it from here; just trying to minimize the remaining work to be done.
hmm... looks like commits with backticks omit the text between them. I removed a nasty-looking bit op that discarded the last 8 bits, since I literally bit-shifted right by 8 immediately afterwards. But... I mean, bit operations aren't exactly necessary here; it just so happens that every number in this library is a power of 2... it's kind of tempting.
> looks like commits with backticks omit the text between them

That's likely your shell evaluating the contents of the backticks
I've seen this mentioned in multiple issues: https://github.com/rust-random/rand/issues/1261#issuecomment-1341891257

> The current design of BlockRngCore (and the proposed generic_const_exprs-based version) is not ideal. One issue for which I don't have a good solution is variability of block size dependent on runtime target feature detection. For example, ChaCha has different optimal block sizes dependent on whether hardware has SIMD extensions. Right now we simply use the largest supported block size, which is a bit suboptimal.
I saw some previous versions of this library with each backend having its own Core. An alternative solution that would (almost) be a cover-all would be to optimize `fill_bytes()` a little bit further by using a `[u32; 16]` buffer to minimize how much needs to be written to the buffer, how much needs to be copied from the buffer at the beginning of `fill_bytes()`, and how much would need to be generated for the buffer with the tail. Then instead of

```rust
// Calculate how many 256-byte chunks are remaining to write to.
let num_chunks = (dest_len - dest_pos) >> 8;
unsafe {
    let mut chunk_ptr = dest.as_mut_ptr();
    chunk_ptr = chunk_ptr.add(dest_pos);
    for _chunk in 0..num_chunks {
        self.core.generate(chunk_ptr);
        chunk_ptr = chunk_ptr.add(256);
    }
}
```

we could delegate this piece of `fill_bytes()` to the backend:

```rust
// Calculate how many 64-byte blocks are remaining to write to.
let num_blocks = (dest_len - dest_pos) >> 6;
unsafe {
    let mut write_ptr = dest.as_mut_ptr();
    write_ptr = write_ptr.add(dest_pos);
    self.core.generate(write_ptr, num_blocks, self.buffer.0.as_mut_ptr() as *mut u8);
}
```
Then in the backend, it would need to determine how many times it needs to run to write `num_blocks` blocks to the pointer. For the backends that can generate 4 blocks in parallel, it seems that we'll need to be a little careful with the counter, since both `gen_ks_block()` and `gen_par_ks_blocks()` seem to generate 4 blocks.
Benefits of this approach:

- `fill_bytes()` would be a little bit more optimized for both SIMD and non-SIMD machines
- it could be benched in stream-ciphers/benches

The main downsides of this approach are:

- it only helps `fill_bytes()`, not `next_u32()`, etc.
- `gen_ks_block()` and `gen_par_ks_blocks()` seem to generate 4 blocks every time they run, making this slightly tricky

It could be possible to also pass a pointer to the buffer as well. Maybe a signature like:

```rust
/// If dest_ptr is null, or if num_blocks is 0, it could only fill the buffer
unsafe fn generate(&mut self, dest_ptr: *mut u8, num_blocks: usize, buffer_ptr: *mut u8);
```

There are definitely some logistics to work out, but it might work.
I suppose I could bring that issue up in `rand`. In the `small_buffer` branch I modified `write_par_ks_blocks()` to take `num_blocks` (where `num_blocks <= 4`) as an input, along with some macros that can extract either 1 or 2 blocks to minimize code duplication.

I think this might bring some undue complexity though. Will look into trying variable buffer sizes before I commit to that; that would have required some significant changes to backends.
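A sketch of the `num_blocks <= 4` idea (illustrative only; the real backends produce the batch with SIMD rather than this placeholder fill, and the function name mirrors but is not the crate's code):

```rust
const BLOCK_SIZE: usize = 64; // one ChaCha block in bytes
const PAR_BLOCKS: usize = 4; // blocks generated per parallel batch

// Stand-in for a parallel keystream batch: 4 blocks are always produced,
// but only `num_blocks` of them are copied out to `dest`.
fn write_par_ks_blocks(dest: &mut [u8], num_blocks: usize) {
    assert!(num_blocks <= PAR_BLOCKS);
    assert!(dest.len() >= num_blocks * BLOCK_SIZE);
    let mut batch = [0u8; PAR_BLOCKS * BLOCK_SIZE];
    for (i, b) in batch.iter_mut().enumerate() {
        *b = (i / BLOCK_SIZE) as u8; // placeholder: block n is filled with byte n
    }
    dest[..num_blocks * BLOCK_SIZE].copy_from_slice(&batch[..num_blocks * BLOCK_SIZE]);
}
```

The counter subtlety mentioned above is that the batch always advances the keystream by 4 blocks, so if only 1 or 2 are consumed, the counter must be rewound (or advanced by `num_blocks` instead of 4) to keep seeking consistent.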
With this improvement, `fill_bytes()` is about as fast as `apply_keystream()`, if not a little bit faster. I believe the performance boost for avx2 is because `backend::new()` is only called once when `rng_inner` is called, and while it is running, it writes `num_blocks` 64-byte blocks to the dest_ptr instead of just writing 4 blocks to the dest_ptr.
Sorry for breaking away from KISS... I just noticed that there was a lot of duplicate code, and I was only able to remove it once I realized how pointers can be used. Would you like me to switch to a commit before I switched to pointers, or is it okay to try to fix the remaining broken stuff with the RNG? The main remaining issue that I notice is to use `cpufeatures` for the `BUFFER_SIZE` instead of the cfg statements I added, and possibly using `cpufeatures` for `generate` in rng.rs, if it should go in rng.rs, but I don't know how `cpufeatures` needs to be handled.
I went ahead and put master on the `pointers` branch and speed-ran a KISS implementation branch that behaves itself so far. It might fail a few checks at first, but it is simple, and I included most of the changes that I have recently made, such as some variable name changes, comments, and the LegacyCore and XChaChaCore. I tend to get carried away with things.
Will try to resolve failing checks later tonight, but I've got a backup branch.
Seems like you have some tough challenges regarding being able to abstract between `cipher` and `rand_core`. It's also a large PR, which makes it hard to review.

To ensure the soundness of the unsafe code, it might be good to run some static analyzers on it, e.g. `cargo miri` and/or Rudra.
Running `cargo miri test --features cipher,legacy,xchacha,rand_core` with `--cfg chacha20_force_soft`:

```console
$ cargo miri test --features cipher,legacy,xchacha,rand_core
Preparing a sysroot for Miri (target: x86_64-unknown-linux-gnu)... done
warning: some crates are on edition 2021 which defaults to `resolver = "2"`, but virtual workspaces default to `resolver = "1"`
note: to keep the current resolver, specify `workspace.resolver = "1"` in the workspace root's manifest
note: to use the edition 2021 resolver, specify `workspace.resolver = "2"` in the workspace root's manifest
   Compiling libc v0.2.149
   Compiling getrandom v0.2.10
   Compiling rand_core v0.6.4
   Compiling rand_chacha v0.3.1
   Compiling chacha20 v0.9.1 (.../stream-ciphers-rng/chacha20)
    Finished test [optimized + debuginfo] target(s) in 0.89s
     Running unittests src/lib.rs (.../stream-ciphers-rng/target/miri/x86_64-unknown-linux-gnu/debug/deps/chacha20-40bb95b3512b497d)
warning: Miri does not support optimizations. If you have enabled optimizations by selecting a Cargo profile (such as --release) which changes other profile settings such as whether debug assertions and overflow checks are enabled, those settings are still applied.

running 16 tests
test rng::tests::test_chacha_clone_streams ... ok
test rng::tests::test_chacha_construction ... ok
test rng::tests::test_chacha_multiple_blocks ... ok
test rng::tests::test_chacha_nonce ... ok
test rng::tests::test_chacha_true_bytes ... ok
test rng::tests::test_chacha_true_values_a ... ok
test rng::tests::test_chacha_true_values_b ... ok
test rng::tests::test_chacha_true_values_c ... ok
test rng::tests::test_chacha_word_pos_wrap_exact ... ok
test rng::tests::test_chacha_word_pos_wrap_excess ... ok
test rng::tests::test_chacha_word_pos_zero ... ok
test rng::tests::test_rng_output ... ok
test rng::tests::test_set_and_get_equivalence ... ok
test rng::tests::test_set_word_pos ... ok
test rng::tests::test_wrapping_add ... ok
test xchacha::hchacha20_tests::test_vector ... ok

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running tests/mod.rs (/stream-ciphers-rng/target/miri/x86_64-unknown-linux-gnu/debug/deps/mod-5e9658f96257a96c)
warning: Miri does not support optimizations. If you have enabled optimizations by selecting a Cargo profile (such as --release) which changes other profile settings such as whether debug assertions and overflow checks are enabled, those settings are still applied.

running 12 tests
test chacha20_core ... ok
test chacha20_seek ... ok
test chacha20legacy_seek ... ok
test chacha20test::chacha20_encryption ... ok
test chacha20test::chacha20_keystream ... ok
test legacy::chacha20_legacy_core ... ok
test legacy::chacha20_legacy_seek ... ok
test legacy::chacha20_offsets ... ignored
test xchacha20::xchacha20_encryption ... ok
test xchacha20::xchacha20_keystream ... ok
test xchacha20::xchacha20_seek ... ok
test xchacha20_seek ... ok

test result: ok. 11 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out

   Doc-tests chacha20

running 4 tests
test chacha20/src/lib.rs - (line 44) ... ok
test chacha20/src/rng.rs - rng::ChaCha20Rng (line 542) ... ok
test chacha20/src/rng.rs - rng::ChaCha8Rng (line 538) ... ok
test chacha20/src/rng.rs - rng::ChaCha12Rng (line 540) ... ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.11s
```
Running `cargo miri test --features cipher,legacy,xchacha,rand_core` with `--cfg chacha20_force_avx2 -C target-feature=+avx2` returns an error due to using avx2. Miri doesn't seem to support avx2.

Running `cargo miri test --features cipher,legacy,xchacha,rand_core` with `--cfg chacha20_force_sse2 -C target-feature=+sse2` returns an error due to using sse2. Miri doesn't seem to support sse2.
I am currently working on setting up Rudra. Shouldn't be much longer before I've got results.
Rudra results are in. Not sure if I need to run it with each rustflag, but they were all the same.
```console
chacha20$ cargo rudra --features cipher,legacy,rand_core,xchacha,zeroize
2023-12-23 17:04:13.084630 |INFO | [rudra-progress] Running cargo rudra
2023-12-23 17:04:13.390622 |INFO | [rudra-progress] Running rudra for target lib:chacha20
2023-12-23 17:04:14.158353 |INFO | [rudra-progress] Rudra started
2023-12-23 17:04:14.158504 |INFO | [rudra-progress] SendSyncVariance analysis started
2023-12-23 17:04:14.158519 |INFO | [rudra-progress] SendSyncVariance analysis finished
2023-12-23 17:04:14.158522 |INFO | [rudra-progress] UnsafeDataflow analysis started
2023-12-23 17:04:14.170619 |INFO | [rudra-progress] UnsafeDataflow analysis finished
2023-12-23 17:04:14.170650 |INFO | [rudra-progress] Rudra finished
2023-12-23 17:04:14.399805 |WARN | [cargo_rudra] Target test:mod is not supported
2023-12-23 17:04:14.399829 |WARN | [cargo_rudra] Target bench:mod is not supported
2023-12-23 17:04:14.399834 |INFO | [rudra-progress] cargo rudra finished
```
FYI, #338 bumps the crate to v0.10.0-pre, but also includes some other breaking API changes.
Rebasing on it shouldn't be too hard, I hope!
I've made `set_block_pos` and `get_block_pos` as per some feedback in a recent rand issue, #1369. I feel like those methods should be on the RNG core rather than the RNG, but if they were to live on the RNG core, then the RNG's index won't be changed or reset when the block_pos is set. That shouldn't be an issue as long as it is documented, but if the index should be reset, then the method might need to belong to the RNG.

Some questions I have:

1. Should the `$ChaChaXRngCore` be a wrapper around `ChaChaCore` to prevent `cipher` methods from being available to the end user?
2. Should the public `set_block_pos` and `get_block_pos` methods be on the RNG and reset the index, or on the core and not reset the index?
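The wrapper idea in question 1 can be sketched as a plain newtype (all names here are hypothetical, not the crate's): the wrapper re-exports only the RNG-facing methods, so cipher-side methods on the inner core are unreachable through it.

```rust
// Hypothetical inner core with both cipher- and RNG-facing methods.
struct ChaChaCore {
    block_pos: u32,
}

impl ChaChaCore {
    fn apply_keystream_stub(&self) {} // cipher-side method we want to hide
    fn block_pos(&self) -> u32 {
        self.block_pos
    }
}

// The RNG-facing newtype: only block-pos accessors are exposed, so
// `apply_keystream_stub` is not part of the wrapper's public surface.
struct ChaChaRngCore(ChaChaCore);

impl ChaChaRngCore {
    fn get_block_pos(&self) -> u32 {
        self.0.block_pos()
    }
    fn set_block_pos(&mut self, pos: u32) {
        self.0.block_pos = pos;
    }
}
```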
I've now got a working version where the backends seem to be safely zeroizing when the `ChaChaCore` is dropped, but it depends on the PR for `zeroize`.
@nstilt1 can you rebase? This is looking mostly good and I'd like to get it merged.
Sure thing. I apologize for the delay. I uh... I was mis-using ChaCha in several ways. I should have it rebased by tomorrow.
*block_pos. I moved set/get_block_pos to the RNG rather than its core, where it wasn't able to change the RNG's index after trying to change the block pos
This looks very close now, though I'm not sure what's up with `rust-toolchain.toml.save` or the seemingly unrelated and empty `salsa20/benches/mod.rs`.
I borrowed some code from version 0.8.1 of `chacha20`, as well as `rand_chacha`, and was able to get it to compile and pass tests. The main issues in my code that I identified were:

- a `[u8; 12]` instead of a `u128` for set/get_stream(). It used to have a `u64` when it had a 64-bit counter. `set_stream()` takes the lower 96 bits of a `u128` input. It now has support for using a `[u8; 12]` with `set/get_stream_bytes()`
- a `ParBlock<LesserBlock>` as a temporary output buffer prior to copying it into `results` in `fn generate()`. It will probably leave a copy of itself in memory after use. It might be possible to change the `BlockRngResults` type to `ParBlock<LesserBlock>`
- `AsRef<[u32]> for BlockRngResults` and `AsMut<[u32]> for BlockRngResults`
- calling `set_word_pos()` with an input that wasn't divisible by 16; `get_word_pos()` would round the input down to the nearest multiple of 16, as in `rand_chacha`.

I've also added `ZeroizeOnDrop` for the `BlockRngResults`.

https://github.com/rust-random/rand/issues/934