dhardy / rand

http://doc.rust-lang.org/rand
Other
2 stars 2 forks source link

To provide (or not) specific PRNGs? #58

Open arthurprs opened 7 years ago

arthurprs commented 7 years ago

On https://github.com/rust-lang/rfcs/pull/2106#issuecomment-346610377 I argued that rand should not provide any specific/named rngs (so no ChaChaRng, XoroShiro, etc.) but instead provide a very small set of generators (OsRng, StdRng, etc.) in addition to the trait machinery, etc..

Consider just WeakRng for example (StdRng might be enough moving forward), there were almost half a dozen proposals to improve it. One can argue that the initial choice wasn't the best but it's bound to happen again no matter the upfront effort.

In the event that rand keeps providing specific Rngs, proposals with merits will eventually come and the crate will be tied to the older variant until the next major version (w/ a higher friction upgrade). Until that happens the maintainers have a few choices:

  1. Reject the PR (for reason X, even if its a win) and ask the author to push it to crates.io instead
  2. Accept the PR adding generator G to rand
  3. Accept the PR adding generator G to rand and mark the other(s) deprecated

Note that 2 and 3 are likely duplicating something from crates.io, but with a blessing.

Advantages

Disadvantages

Non concerns

dhardy commented 7 years ago

On the whole I agree, but I think in this case it would be worth providing official companion crate(s) with named implementations (e.g. per RNG or family like rand-chacha or collections like rand-small-rng). This is mostly about quality assurance (code review and trust that updates are unlikely to cause major breakage).

burdges commented 7 years ago

I'm old fashioned but imho if an implementation gets used, then it should be usable to avoid increasing code size. I've voiced doubts about the crate separation but maybe rand-chacha, rand-isaac, etc. could all exist in the rand repository and depend on rand-core, and rand depends on all these crates.

arthurprs commented 7 years ago

If all backing implementations get their own blessed crate it has the advantages of both approaches with the minor drawback of having more crates to maintain (they will require little to no maintenance though).

pitdicker commented 6 years ago

After diving into small RNG's there turn out to be many different kinds. I only explored a couple of them, that seem to be the current 'generation'. Not many have the combination of good period, state size, statistical quality, and performance you would expect for a generally usable RNG.

For users it may be hard to pick the right one without some guidance. Especially what was common wisdom 20 years ago is far from the best choice today.

For cryptographic RNG's the situation is similar, although I have not dived all that deep into that. I have found quite a few RNG's where some people, and sometimes even the author, claim it is cryptographically secure. But they are unable to supply some sort of proof, because they don't know the tricks people working in cryptography use. And then there are cryptographic RNG's that were consider good, but are not really trusted anymore.

So it seems important to have some sort of guidance. I think one crate collecting good RNG's could provide that.

And this is not nice of me to say, but I am not all that impressed by the quality of several implementations on crates.io... That is because of all the discussion we have had here around periods, initialization, bad seeds, error handling, traits to implement (or not to implement) etc. and what I expect around documentation.

A third reason why a seperate crate that focuses on RNG's can be an advantage, is that it can explore some options to confirm/improve quality. Like:

So I see an advantage in one crate containing multiple good RNG's, outside of rand. How I currently see what such a crate could include:

good statistical quality, predictable for adversaries

good statistical quality, unpredictable (CryptoRng)

And maybe other categories of interest:

entropy source

huge period RNG's

low(er) quality but very fast or very small

rand would keep the wrappers StdRng and WeakRng to select a standard CryptoRng and small, fast RNG. But I don't like the names, especially I would like to see WeakRng be renamed to SmallRng.

Some alternative approaches, and why I think they would work less good:

Include all RNG's in rand @arthurprs argued against this well.

Have one crate per RNG, or multiple collection crates You still have the guidance / documentation problem.

dhardy commented 6 years ago

Sounds like good advice @pitdicker. It leaves two related questions:

  1. If SmallRng and CryptoRng are to be exposed in rand, should rand depend on the rand-rngs crate (or whatever it's called) or copy the code?
  2. Do users have to depend on the rand-rngs crate directly if they want a reproducible RNG (something which won't be changed in the future)? (Probably yes.)

Also, I personally think the external RNGs OsRng and JitterRng should either stay in rand, or move to another crate (e.g. rand-ext or rand-entropy) depending on rand-core only. I've already seen argument that they stay in rand unless there's really a demand for them to move.

pitdicker commented 6 years ago

For (1) I don't think it matters much. Even when the code is copied, with a little discipline it should not be hard to keep in sync. But if we make sure a rand-rngs crate does not use extra dependencies, I don't see a problem in depending on it. Seems cleaner to me. For (2) I would say yes. That is the promise of the original post.

Also, I personally think the external RNGs OsRng and JitterRng should either stay in rand, or move to another crate (e.g. rand-ext or rand-entropy) depending on rand-core only.

Yes, they don't really fall under this discussion. We are not going to swap OsRng out for something else I think :smile:. Okay, creating a crate with purely PRNG's seems better to me.

pitdicker commented 6 years ago

@dhardy Can we set up a plan or something? cc LinearZoetrope

Currently we have 5 things that touch RNG's going on:

If we have a separate crate, it seems to me it could work both on Rand 0.3 and this repository with a feature flag. Does this order make sense?

I don't really have time do do much for the next week though.

pitdicker commented 6 years ago

And what would be a good place for the repository? Also a sub-directory here, like rand-core?

dhardy commented 6 years ago

I understood BlockRng has nothing to do with small PRNGs since it's only for some CSPRNGs which make a whole block of numbers at once?

When you think we're ready I suggest opening a new PR against upstream to add a new small RNG (as something like SmallRng) and deprecate whatever will go (XorShiftRng and weak_rng I suppose). I think better not to make rand depend on your crate now (we can still discuss this later).

The small RNGs crate: I think the best strategy would be to add it inside the rand repo as something like rand-small and publish from there. Is there much point making it compatible with rand 0.3? I don't know personally. We'll probably have 0.5 soon...

209 I was about to look at; probably we can merge now.

@LinearZoetrope's #189 can be rebased after that. If you still want to discuss which PRNGs should be serializable you can add more comments there or open an issue somewhere, but it's mostly independent (other than knowing what else may change).

pitdicker commented 6 years ago

I understood BlockRng has nothing to do with small PRNGs since it's only for some CSPRNGs which make a whole block of numbers at once?

Yes. But it will change serialisation of ISAAC (and if we add it ChaCha)

I think better not to make rand depend on your crate now (we can still discuss this later).

Ok. Depending on a crate or copying the code is just an implementation detail when we only expose wrappers.

The small RNGs crate: I think the best strategy would be to add it inside the rand repo as something like rand-small and publish from there.

I don't know if we are completely speaking about the same thing. The small-rngs crate I made is something I wouldn't mind to throw away after https://github.com/dhardy/rand/issues/60 was done, except that it is useful to see the code of the tested RNG's.

When splitting out RNG's as discussed here I would start fresh and only copy a few good ones from there and from rand.

Is there much point making it compatible with rand 0.3? I don't know personally. We'll probably have 0.5 soon...

If we want 0.3 to be the stable branch for a while, than I think it makes sense so support it, with something like a rand_0.3_compatibility feature. I don't think it is much work.

209 I was about to look at; probably we can merge now.

It looks good and you put quite some effort into it. But would it not also be duplicate work? It is breaking backward compatibility (seems good and necessary), but that would break again when we only expose wrapper types.

@LinearZoetrope's #189 can be rebased after that. If you still want to discuss which PRNGs should be serializable you can add more comments there or open an issue somewhere, but it's mostly independent (other than knowing what else may change).

It would mean adding serialisation to rand, only to later remove it when it is no longer possible to use specific RNG's because only wrappers are exposed.

dhardy commented 6 years ago

When splitting out RNG's as discussed here I would start fresh and only copy a few good ones from there and from rand.

Okay, that's fine. (I'm not sure about including outdated classics like Mersenne Twister; possibly unnecessary since that already has its own crate.)

If we want 0.3 to be the stable branch for a while

Do we? I mean, 0.4.1 is a stable release (as much as pre-1.0 releases ever are); breaking changes must wait for 0.5 now. And where will we be when this rand-small crate is ready anyway?

It is breaking backward compatibility (seems good and necessary), but that would break again when we only expose wrapper types.

I've tried to keep breaking changes minimal; only people depending on actual values output or misusing features (e.g. Copy support) will be affected now.

It would mean adding serialisation to rand, only to later remove it when it is no longer possible to use specific RNG's because only wrappers are exposed.

True, but it gives people who need serialization an option sooner, and without too much breakage later (presumably only importing from a different repo instead, unless they decide to switch PRNG).