Open devmotion opened 3 years ago
julia> typeof(rand(Float32, Gamma(big(0.5))))
BigFloat
That feels a bit surprising to me -- explicitely requesting one type, but getting a different one.
I mean, I get that the intent is to express "start from rand(::Float32)
, and push forward through Gamma(big(0.5))
", but I can't help reading it as "sample from Gamma(big(0.5))
and give me the result as a Float32
".
The more often I read it the better it looks, but that's something that the uninitiated observer might stumble upon.
Thanks @devmotion , this is an interesting idea. It does limit some things though, for example in many cases it's useful to be able to specify a container type, like an SArray
vs vanilla Array
.
We had already planned this at a basic level, but the idea can be taken even further, as @rfouquet does in RandomExtensions.jl.
Also, calling in this way is enough like eltype
that I'm concerned it will break in similar ways. I think I'd prefer to specify the type, and have that dispatch to lower-level code.
Hey @devmotion , I think I was too dismissive of this idea. It doesn't solve the entire problem, but it does solve part of it, and I'm starting to think the two parts may be orthogonal.
To be clear, the two concerns are
rand
calls, andFor (1), I think we should adopt your suggestion. Thank you :)
For (2), I'm mostly thinking in terms of different Array types, though there's plenty outside of this. Rather than have yet more arguments to pass around, I think we could do this by dispatch. Maybe we have some default containers, and the user can create new methods for specific measures if they like. We'll need to think more about the best way to do this.
My belief is that the "core" random number method should be one that generates integers (say, a uniform random integer in the interval [0, n)
), rather than a method that generates floating-point values.
After all, in a computer, generating floating-point variates (as rand
does, for example) ultimately depends on generating random integers, not the other way around. (Most pseudorandom number generator algorithms in common use, such as Mersenne Twister, PCG, and the xorshift, xoroshiro, and xoshiro families, produce integers, not floating-point numbers.) Moreover, no computer can choose randomly from among all real numbers between two others, as there are infinitely many of them.
The central point of the proposal above is to "pushforward" automatically the type of random samples from a lower level to the arbitrarily complex structure of the samples from a distribution of interest. I still believe that it would be beneficial if one would not fix the type of samples of interest a priori, both for users and internally. However, for this proposal relies on the user being able to specify the desired type of the samples from the lower-level rand
calls in Random
such as rand(Float64)
(the default), rand(Float32)
, randn(Float32)
etc. So even if these procedures in Julia base are implemented by generating random integers it is important to set the type of the floating point samples.
Additionally, clearly this proposal does not cover the second (unrelated but probably also important) issue mentioned by @cscherrer: it only helps to infer the type of individual samples but does not give you control over the type of collections of samples.
I think see what you mean @peteroupc. Julia's Base
wraps all of this so you can ask for random numbers of any type. In principle, it could sometimes be useful to delay the conversion to floating point, instead rewriting some operations in terms of ints.
I think we need to defer this and assume the idiomatic Julia approach. The scope of this work is already very broad, and internals of the random number generation process are mostly an orthogonal concern.
Last week I had some discussion in a PR to Distributions about the design of
rand
(https://github.com/JuliaStats/Distributions.jl/pull/1262#issuecomment-764995042), and I think it applies to MeasureTheory as well.TL;DR: It is problematic to prescribe the type of samples of a distribution, both for users and internally. However, similar to
logdensity
, the type of the samples follows naturally if (1) the type of the basic samples from the RNG obtained withrand
,randn
, orrandexp
, and (2) the parameters of the distribution of interest are known. Thus I propose to not prescribe the type of the samples of the distribution but the type of the basic samples from the RNG with typeT
inrand(::AbstractRNG, ::Type{T}, ::Distribution)
.Partly copied from https://github.com/JuliaStats/Distributions.jl/pull/1262:
Basically every sampling procedure at some point starts sampling with
rand
,randexp
, orrandn
in Random from a uniform distribution in[0, 1)
, the exponential distribution with rate 1 onR_{>0}
, or the standard normal distribution onR
, respectively. The samples from the distribution of interest are obtained by transforming these most basic samples (ie. basically the distribution of interest is a push-forward of the distribution of these basic samples), and the type of the random variates follows automatically from the type of these basic samples and the function that transforms them and involves the parameters of the distribution.My suggestion would be to allow to specify the type of exactly (and only) these base variates, i.e., to change such "basic calls" to
rand(rng, T)
,randn(rng, T)
etc. whereT
can be specified by the user. The type of the random variates of the distribution of interest would then be determined both byT
, the parameters of the distribution, and the transformations needed to turn the basic samples into a sample from the distribution.More concretely, with a default of
T = Float64
one would obtainAn internal detail: Probably one would want to promote
T
to the precision of the parameters to ensure that the precision of the underlying draws is sufficiently high, i.e., with parameters ofBigFloat
the samples should be based on draws of typeBigFloat
in all cases.