Closed dhardy closed 6 years ago
I'm looking at bringing over some of the changes to distributions code to upstream
:tada:. Also, thanks for opening this issue for a fresh look and discussion. Anyone we should cc?
In https://github.com/rust-lang/rfcs/pull/2106#discussion_r136091362 I suggested that Basic
may be a good alternative to Default
, so it does not clash with std::default::Default
.
Dropping Uniform
in favor of Default
(or Basic
) seems like a nice simplification for integers. FullRange
is a nice find as alternative, it fits with the Range
distribution. But I don't think code using this is thinking in terms of ranges, and it may feel less natural there.
For the uniform distributions of floats we now have:
Uniform01
: [0, 1)Open01
: (0, 1)Closed01
: [0, 1]I don't think the names are perfect. Uniform01
and the alternative HalfOpen01
both have disadvantages. How do you know which side is open? [0, 1) vs (0, 1].
For the Open01
distribution I can see real uses. In some situations you never want to generate 0.0, for example when the value is to be used for division or log. It uses simple rejection sampling on top of Uniform01
.
Personally I am not sure if we need both the Uniform01
and Closed01
distribution. We can guarantee that every value has an uniform chance of occurring. But the chance of generating 1.0 is one in 2^53 for f64
(and 1 in 2^22 for f32
). Does that really merit two different distributions?
There is an other not-all-that-meaningful difference between Uniform01
and Closed01
: all numbers that straddle an exponent boundary have a tiny chance of occurring to often. 0.5 will occur every 1 in 2^23 results too often, 0.25 every 1 in 2^24, 0.125 every 2^25 (for f32
, and I wrote this quickly, exponents could be off by 1). The problem is that there is just no better distribution possible for a closed-open range. Closed01
has a distribution as perfect as technically possible, but it also is slightly slower.
I think renaming Uniform01
to Default
(or Basic
) is a good solution, leaving only Open01
and Closed01
as slower but sometimes useful alternatives for floats.
(adjustments to things like
Exp1
are needed to support dynamic dispatch(Rng+?Sized)
Removing the Sized
constraint is a desired change, but you probably want to wait with it so all the changes that effect RNG's are bundled in one release?
I don't really understand the consequences around dynamic dispatch. I should just try it and see what brakes, but would you be willing to explain it?
In https://github.com/dhardy/rand/issues/72#issuecomment-353937015 I made a list of changes / extra tests we may want to add to the distributions, besides something like an external test suite. There is not really anything in there that should happen before the merge into upstream though.
Something I would really like to see is to have the floating point range implementation be exact, and not sometimes crossing the boundaries. Now it is a bit strange that we can say: the precision of the results of Range
has improved much, but it is still imprecise around the boundaries. I still believe it must be simple, but haven't found the trick yet.
Dynamic dispatch: lots of OO languages use a vtable at the beginning of an object, for example an Java object of theHashMap
class is in fact a fat object with a pointer to the HashMap
virtual table as well as the internal map fields. That way if you cast the object to an AbstractMap
then call size()
, the run-time finds HashMap::size
. Rust works a bit differently. There isn't a Map
trait because it requires Higher Kinded Types, but for example String
implements Write
. This String
object does not have a vtable, nor does a &mut String
reference to it, but if you want to cast that to &mut Write
, the CPU needs to know how to access the write_str
function implementation for String
. The way that works is that a &mut String
is a fat pointer, containing one pointer to the vtable and another to the String
object. So what's ?Sized
got to do with it? &mut Write
is by default Sized
, implying the compiler knows the size of the pointed object, which is only possible if the compiler knows (at compile time) the type of the underlying object. If it doesn't, you need &mut Write + ?Sized
.
Removing the Sized constraint is a desired change, but you probably want to wait with it so all the changes that effect RNG's are bundled in one release?
The last few commits re-implement some types as distributions to let me switch them. This still needs Uniform01
or equivalent however.
Why do it now? Changing Distribution::sample
from sample<R: Rng>
to sample<R: Rng + ?Sized>
is a breaking change: all implementations must be updated. So I'd rather get it right first time.
In #72 (comment) I made a list of changes ...
I'm not looking at changing the algorithms now, just the infrastructure. I might let you make the algorithms PRs ;-)
Hmm, rand::distributions::Default
or rand::distributions::Basic
... I still prefer the former, but not certain. Another option might be Standard
, but that name is probably even more confusing than Default
; obviously Normal
is inappropriate (thanks Gauss).
Removing Uniform
and Uniform01
may make sense, though they don't cost us much (aside from confusing users).
@pitdicker maybe I can annoy you even more by putting implementations in src/distributions/float.rs
and src/distributions/integer.rs
. Then I think we also do not need src/utils.rs
. Okay?
At least, the compiler doesn't complain, though my text editor highlights both Default
and float
incorrectly. Not important; the module names will probably never be exposed to users anyway.
Thanks for the explanation around ?Sized
. I mostly get the idea behind how it works, but have some trouble yet so see all the consequences for the API... Give me some time and otherwise I will trust you on it.
I might let you make the algorithms PRs ;-)
Ok, it is high time I start working on upstream š.
maybe I can annoy you even more by putting implementations in
src/distributions/float.rs
andsrc/distributions/integer.rs
. Then I think we also do not needsrc/utils.rs
. Okay?
Just two weeks ago I was thinking something like that would be nice, but more with the range code in mind. Could the RangeImp
parts also be part of the new float.rs
and integer.rs
files? And what is left be moved to src/distributions/mod.rs
?
I hadn't thought about moving the range code, but I guess that could work, if there's not too much to it.
The distributions top-level module currently has Distribution
, some Rand
bits, some "weighted" distribution code I don't know what to do with, and the ziggurat
function. Maybe we should try moving the last two things out if we're adding Default
/Basic
and Range
.
I believe we settled this, though we now use Standard
as the "default" distribution.
Update: we will likely use these distributions:
Uniform
for the default distribution, replacingRand
(including ints, floats with range[0, 1)
, etc.)Original post follows.
I'm looking at bringing over some of the changes to distributions code to upstream, in particular replacing
Sample
andIndependentSample
withDistribution
, but the change-set touches on a few other things (adjustments to things likeExp1
are needed to support dynamic dispatch (Rng+?Sized
) inDistribution::sample
, sinceRand
doesn't support this and changing that would break allRand
implementations everywhere).So, question: is everyone happy with the distribution names we converged on in this branch, that is
Uniform
for integers,Uniform01
andOpen01
etc. for floats,Default
for "a sensible distribution for any type"? This is of course redundant; we could dropUniform
andUniform01
in favour of usingDefault
directly. Possibly there is a better name thanDefault
(but please notRand
)!Wacky ideas:
Uniform
āFull
orFullRange
Uniform01
āHalfOpen01
Of course we don't need any changes. Just asking since it didn't always appear people were happy with these names.