dhardy / rand

http://doc.rust-lang.org/rand

Testing of distributions #72

Closed dhardy closed 6 years ago

dhardy commented 6 years ago

Continuing from #69

We should add some testing for distributions. Unfortunately since distribution samples are random, this is inherently difficult.

@pitdicker suggests using 256 buckets but notes that this could only pick up very large errors. In my experience we'd need somewhere in the range of 10,000 to 1,000,000 samples to make such a test reasonably useful, and even then it wouldn't be 100% reliable (meaning we'd probably want to fix the seed, rather than let CI fail randomly from time to time).
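For illustration, a bucket test along these lines could be sketched as follows. This is a pure-stdlib sketch: `Lcg` is a hypothetical stand-in for a reproducible rand RNG (not rand's API), and the 5-sigma tolerance is an arbitrary choice that, as noted above, only catches gross errors:

```rust
// Sketch of the bucket approach: sample a uniform byte distribution into
// 256 buckets and check each count against the expected mean.
struct Lcg(u64);

impl Lcg {
    // Knuth's MMIX LCG constants; the high bits are the most uniform.
    fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

fn bucket_test(seed: u64, n: usize) {
    let mut rng = Lcg(seed); // fixed seed, so CI cannot fail at random
    let mut buckets = [0u32; 256];
    for _ in 0..n {
        buckets[(rng.next_u32() >> 24) as usize] += 1; // top byte -> bucket
    }
    let expected = n as f64 / 256.0;
    let sigma = (expected * (1.0 - 1.0 / 256.0)).sqrt();
    for (i, &count) in buckets.iter().enumerate() {
        assert!(
            (count as f64 - expected).abs() < 5.0 * sigma,
            "bucket {}: {} samples, expected ~{}",
            i, count, expected
        );
    }
}

fn main() {
    bucket_test(0xDEAD_BEEF, 1_000_000);
    println!("bucket test passed");
}
```

With 1,000,000 samples the expected count per bucket is only ~3906, so deviations smaller than a few hundred samples per bucket pass unnoticed, which illustrates why this approach can only catch large errors.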

In my opinion, that isn't the way we should go about it; instead we should do two things:

  1. Have an external test suite for "supervised testing" (this could be as simple as plotting a histogram of results via gnuplot)
  2. Add "black box" unit tests (my terminology) which simply try a few fixed inputs and check that they produce the same results as last time (i.e. the test tells you when a code change affects output, but not whether the output still follows the distribution)
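Point 2 could look something like the following sketch: a fixed seed plus hard-coded expected outputs recorded from a known-good run. `TinyLcg` is a deliberately toy generator (hypothetical, not rand's API) chosen so the expected values can be verified by hand as `state = (state * 5 + 3) mod 16`:

```rust
// A toy generator standing in for a real reproducible RNG.
struct TinyLcg(u8);

impl TinyLcg {
    fn next(&mut self) -> u8 {
        self.0 = (self.0.wrapping_mul(5).wrapping_add(3)) % 16;
        self.0
    }
}

fn main() {
    let mut rng = TinyLcg(1);
    // Values recorded from the current implementation. Any code change that
    // alters the output stream makes this fail, flagging a (possibly
    // intended) value-breaking change -- without saying whether the output
    // still follows the intended distribution.
    for &e in &[8, 11, 10, 5, 12] {
        assert_eq!(rng.next(), e);
    }
    println!("value-stability test passed");
}
```

The same pattern applies to a distribution: seed the RNG, sample the distribution a few times, and compare against committed literals.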
dhardy commented 6 years ago

This brings up another point: should distributions be reproducible?

We've already decided we want reproducible generators (meaning that a fixed seed produces a fixed output sequence, even if the library code is updated). Do we want the same for distributions?

At a minimum I think changing distribution output should be considered a breaking change post 1.0 release.

Black box tests are better than bucket tests here since they also test reproducibility of results.

Lokathor commented 6 years ago

Distributions should absolutely be reproducible any time they're run with a reproducible generator, yes.

Anything else means that they're somehow getting their own entropy into the system instead of only using the entropy provided by the generator, which is clearly the wrong thing to do.

pitdicker commented 6 years ago

The question about reproducibility is more: are we allowed to change the algorithm?

We already have several changes that make distributions return completely different results:

I can see some changes that would be good:

Then there are some open questions:

Strictly speaking any change should probably be considered a breaking change.

We may be able to relax that a little if we know how the distributions are used, though. Are applications that require them to be reproducible across versions common or not? Are they used in games, libraries etc.? Is anyone now trying to do simulations in Rust? If you require reproducibility, is it not reasonable to pin crates critical to the algorithm to specific versions?

Together with the testing this issue is about, and the extra distributions we probably want, it seems there is quite some work left to do. Not sure there is any thought yet about a 1.0. If the goal is API stability, this can all be done in the future. If we want to present 1.0 as something we have reasonably good confidence in, we should at least have answers to these questions.

Lokathor commented 6 years ago

Ah, I misread the question somewhat.

No, I don't think that exact results should be relied upon not to change between versions for a particular distribution. New versions must be free to adopt newer methods and techniques without that being considered breaking. The contract should be that valid results are produced (e.g. a distribution over 0.0 to 1.0 must never produce 1.2), but the exact mapping from input to output, and the number of generator values consumed, are much better left as open as possible.
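That contract (valid range, regardless of algorithm) is itself testable. A minimal sketch, where the u32-to-f64 conversion and the LCG are hypothetical stand-ins rather than rand's actual implementation:

```rust
// Contract test: however the sampling algorithm changes, a [0, 1)
// distribution must never leave its range.
fn u32_to_unit_f64(x: u32) -> f64 {
    // Divide by 2^32 so the result is always in [0, 1).
    x as f64 / 4294967296.0
}

fn main() {
    let mut state: u64 = 12345;
    for _ in 0..100_000 {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let v = u32_to_unit_f64((state >> 32) as u32);
        assert!((0.0..1.0).contains(&v), "sample {} escaped [0, 1)", v);
    }
    println!("range contract holds");
}
```

Unlike a value-stability test, this one keeps passing across algorithm changes, so it enforces exactly the weaker contract proposed above.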

dhardy commented 6 years ago

Reproducibility could be useful for some simulations but I don't think it's critical for much. I suggest we only require a bump in the minor version number if an algorithm produces different results (i.e. after 1.0, patch releases must produce identical results from all distributions, given fixed input).

tspiteri commented 6 years ago

While I agree that changing generators should be considered a breaking change, and while I consider reproducibility in simulations very important, I agree that changing how results are computed in distributions should not be considered a breaking change. When I'm doing simulations and want to be able to reproduce a run (for example if I get a strange result), I always save (a) the starting seed and (b) the git revision of my simulation code, which includes Cargo.lock. That handles both changes in the code on my side and any non-breaking changes in dependencies.

dhardy commented 6 years ago

Respectfully, I disagree. Post 1.0, requiring a bump to the minor version number (1.0 → 1.1) would be enough, but having changes in distribution output with any new release would be a pain in some circumstances.

Of course without tests, that's difficult to enforce.

tspiteri commented 6 years ago

I agree with that; I was only trying to say that changing distribution output should not require a major bump. Maybe I exaggerated a bit 😄. But yes, you're right.

dhardy commented 6 years ago

Moved: https://github.com/rust-lang-nursery/rand/issues/357