dhardy / rand

http://doc.rust-lang.org/rand
Other
2 stars 2 forks source link

Fate of the `Rand` trait #83

Closed dhardy closed 6 years ago

dhardy commented 6 years ago

For a while now, I've been planning to replace Rand with some distribution (Default in this branch) for most purposes. The two are roughly equivalent; e.g. the one-tuple (T,) implementation boils down to:

// old Rand
impl<T: Rand> Rand for (T,) {
    #[inline]
    fn rand<R: Rng>(rng: &mut R) -> (T,) {
        (rng.gen(),)
    }
}

// new Disrtibution (`Default` name may change)
impl<T> Distribution<(T,)> for Default where Default: Distribution<T> {
    #[inline]
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> (T,) {
        (rng.gen(),)
    }
}

Why, you ask?

  1. Type-specific value generation now all comes under the Distribution trait, giving more uniform usage (rng.sample(Default) or rng.sample(Range::new(2, 11))) and making generic code easier
  2. Rand::rand does not support dynamic dispatch due to implicit Sized constraint; fixing this would break all Rand implementations, so we have no reason not to introduce a replacement

But, then, what about people using code like let v: (u32, u64) = Rand::rand(rng)?
This is what this issue is about; we could:

  1. Remove Rand completely, forcing users to update (not that hard in many cases; rng.gen() will still work)
  2. Introduce a wrapper trait; this eliminates breakage for users (and in some cases even implementations of Rand), but leaves duplicate functionality
  3. Introduce the above wrapper, with deprecation warnings, and remove before 1.0

Is the last option the best?

There is also rand_derive; we can probably introduce an alternative based on Distribution, but I don't know if we should? Apparently rand_derive has no dependent crates.

burdges commented 6 years ago

I'd call it Uniform or similar, not Default.

pitdicker commented 6 years ago

If we remove Rand, we are also killing rand_derive because it implements that trait?

I was wondering about rand_derive, and if there was some kind of reason it wouldn't show up with reverse dependencies on crates.io even while it was used. But maybe it doesn't work well?

rand can already take care of the basic types like integers, floats, char and bool. And also tuples, arrays with less then 32 elements (hacky i.m.o.) and Option. The places where you would use rand_derive are newtypes, struct's and maybe unions (I don't believe it works on larger arrays).

pitdicker commented 6 years ago

Thinking about trait names, Rand makes sense to me for all kinds of types, and the rand method is contains uses the most valuable method name in this library.

Distribution mostly makes sense for floats, and with a stretch for all primitive types. But the name does not feel to me like it fits all sorts of structs.

pitdicker commented 6 years ago

Can we say Rand, Rng, SeedableRng, StdRng and tread_rng are the big things users of rand depend on?

Personally I like option 3 best. With the exception of naming. But I am just thinking out loud...

If we end up removing Rand, and if we could pick a new name instead of Distribution, Rand would be a really nice new name. I think it even makes sense to rename it that way. When rand_core and rand_rngs are split off something like half the code left in rand will be implementing Distribution, enough to make sense for it to use 'our most valuable name' 😄. But then option 3 is unavailable.

Is a deprecation period something we really need? It is not like crates are forced to update. Or are types and traits from rand part of the public API of crates? Hmm, with Rand that may be the case.

A quick look on github: https://github.com/search?p=1&q=%22impl+Rand+for%22&type=Code&utf8=%E2%9C%93. Many, many are just old copies of librand. Starting with the results from page 7 we get into trouble...

dhardy commented 6 years ago

Personally I like the name Distribution; it describes the purpose exactly. Rand is a bit unclear just from the name.

I'd call it Uniform or similar, not Default.

Finally, a suggestion I like!

burdges commented 6 years ago

Of course Uniform is a distribution name, not a sample space name. Range is a sample space, so RangeFull might be the corresponding name there, except it lack a type.

In general, there is an issue with not all distributions making sense for all data types, which messes up some deriving I fear. I think Rand already encountered this.

In this vein, we cannot turn a RangeArgument<T> into a sample space directly because RangeFull lacks a type, so ..::<u64> make no sense, etc., so this fails:

trait SampleSpace {  type Domain;  }
impl<T,S> SampleSpace for S where S: RangeArgument<T> {  type Domain = T;  }  // Fails
trait Distribution<S: SampleSpace> { .. }

Instead, we need an obnoxious PhantomData in Uniform like

trait Distribution<T> {
    /// Samples are independent assuming independent `Rng` samples
    const INDEPENDENT: bool = true;
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> T;
}

struct Uniform<T, S>(PhantomData<T>,S);

/// Attempt to infer the type of a `RangeArgument` from context.
fn uniform<T,S: RangeArgument<T>>(S) -> Uniform<T,S> { Uniform<T, S>(PhantomData<T>, S) }

impl<T,S> Distribution<T> for Uniform<T,S> where S: RangeArgument<T> {
    default fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> T { .. }
}
dhardy commented 6 years ago

All this to allow the following?

let x = rng.sample(Uniform<u32, RangeFull>);

This should extend to something like rng.sample(Uniform<u32, _>::new(1..7)) (specifying type and sample space), but I'm not quite sure whether we can do that (perhaps, using a trick like the new Range code to store parameters).

I'm not convinced it's worth it though, because

  1. Aside from restricted range with uniform sampling over ordered values, I think most distributions will use the full or an implied range (sample space)
  2. For many distributions there is an implied sampling range anyway (e.g. exponential distribution over all positive numbers), so the two concepts are not really independent

(Of course one can have things like log-normal with restricted range, but that is effectively two different distributions, with a choice of clamping values down to the restricted range or re-sampling.)

If the distribution (e.g. Uniform) doesn't have a sample-space parameter, then it doesn't need a T parameter either. I've seen that done, but overall prefer the code we have in my rand branch.

burdges commented 6 years ago

I see. If rng.sample(..) is going to work, then uniform should be the default and the return type must be inferred from context, so this maybe:

pub trait Distribution<T> {
    /// Samples are independent assuming independent `Rng` samples
    const INDEPENDENT: bool = true;
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> T;
}

impl<T,S> Distribution<T> for S where S: RangeArgument<T> {
    default fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> T {
        .. maybe destructure and use self.start() and self.end() ..
    }
}

I take it you'll have a symmetric method for Rng too?

trait Rng .. {
    ...
    fn sample<T, D: Distribution<T>>(&mut self, dist: &D) -> T { dist.sample(rng) }
}
dhardy commented 6 years ago

Rust doesn't support default parameters so we have rng.gen() for the default (existing method) and rng.sample(distr) for the configurable version.

Yes, it requires a symmetric sample method, but in an extension trait to allow rand-core to be separate (so trait Sample: Rng).

dhardy commented 6 years ago

Oh, why do you keep adding Distribution::INDEPENDENT? Just as documentation? Since sample takes &self not &mut self it's hard to support samples which are not independent.

Yes, this is why rand 0.3 has both IndependentSample and Sample, but I don't think it's worth having both, personally (as @GrahamDennis noted, one is a random distribution and the other is a random process).

burdges commented 6 years ago

Yes, Distribution::INDEPENDENT is dumb. I doubt anyone would use interior mutability in a Distribution anyways.

I strongly dislike this push for senseless crate separation: An extension trait is a clear harm. Separate docs are a clear harm. What value does a separate crate bring?

dhardy commented 6 years ago

@burdges I'm not really convinced myself, though it's come up several times, e.g. here.

dhardy commented 6 years ago

I see rand_derive now has one reverse dependency: glitter (glit crate).

I'm not sure, but we could go with option 2 and keep Rand around forever in some form; I think that would be enough for rand_derive.

dhardy commented 6 years ago

After having had a look at quickcheck's Arbitrary trait, which is a parallel of Rand but with an added shrink method, I do not think rand_derive is useful for this feature. Adding deriving for Arbitrary (including a shrink implementation) might make sense, but that I'll leave to @BurntSushi.

The example mentioned above derive's Rand for some types then uses that to implement Arbitrary, but without implementing shrink (e.g. Expression impl). @BurntSushi do you agree that it would make more sense for this example to implement Arbitrary directly, and perhaps make use of a "derive Arbitrary" macro instead (e.g. for Name)?

I ask because I'm very tempted to mark rand_derive and Rand obsolete (replacing the latter with a Uniform distribution, but not replacing the former), and this use for quickcheck appears to be the main use-case for both.

I guess https://github.com/rust-lang-nursery/rand/pull/148 is related.

vks commented 6 years ago

I ask because I'm very tempted to mark rand_derive and Rand obsolete (replacing the latter with a Uniform distribution, but not replacing the former), and this use for quickcheck appears to be the main use-case for both.

Sounds good to me! I think rand_derive is only useful for types exclusively composed of integers.

pitdicker commented 6 years ago

Also 👍 from me. I think we can see quickcheck's s Arbitrary trait as similar and better for the purpose of testing.

BurntSushi commented 6 years ago

@dhardy Apologies, I'm only just now seeing your ping. I'm afraid I don't really have enough context to understand the question you're asking me. Can you briefly summarize what input you need from me?

pitdicker commented 6 years ago

I see rand_derive now has one reverse dependency: glitter (glit crate).

I'm not sure, but we could go with option 2 and keep Rand around forever in some form; I think that would be enough for rand_derive.

Nobody is forced to update, and even when glitter wants to update it probably does not need a big change? It only uses #[derive(Rand)] for an enum. It can just generate a number in in the range 0..17 and map that to the enum values.

dhardy commented 6 years ago

@BurntSushi Arbitrary is a bit like Rand: it provides a way to create values of type self, given a helper type (Gen or Rng). So it is possible to use Rand and #[derive(Rand)] to (partially) automatically implement the arbitrary generation for structs (etc.) composed of sub-types which already support Rand. Have you considered adding #[derive(Arbitrary)] support along the same lines?

That's partly besides the point; I just discovered that glitter uses rand_derive to implement Arbitrary — but as @pitdicker says it's not a big deal if we drop support for rand_derive.

BurntSushi commented 6 years ago

@dhardy I don't think I've ever used derive(Rand) personally, and while derive(Arbitrary) is something I've heard people say they want, I've never given much thought to it myself and have no plans to work on it. The hardest bit about it is coming up with the shrinker aspect of Arbitrary. It's not clear there is even a way to automatically derive that.

dhardy commented 6 years ago

Rand has been deprecated and Uniform implemented: https://github.com/rust-lang-nursery/rand/pull/256 Also, rand-derive has been deprecated (even though GH says not merged): https://github.com/rust-lang-nursery/rand/pull/263