JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.
Other
1.1k stars 414 forks source link

Is there a reason `rand(TDist(nu))` does not obey the type of `nu`? #1884

Open Red-Portal opened 1 month ago

Red-Portal commented 1 month ago

Hi, it seems that

nu = 3f0
rand(TDist(nu))

returns a Float64 instead of a Float32. Is this intended behavior? I find it unusual because the rand implementation tries to be type-stable half way there by using one(T) but ends up calling randn(rng) without specifying the type.

devmotion commented 1 month ago

Indeed, this seems to be a bug. randn(rng) should probably be changed to randn(rng, typeof(z)).

Red-Portal commented 1 month ago

Okay let me pick this up.

Red-Portal commented 1 month ago

Hi @devmotion I opened a PR, could you take a look when you have time?

quildtide commented 1 month ago

Has the situation from #1071, #1666, etc. changed?

I mean, personally, I'm not at all against the idea that eltype(d) should equal partype(d) when {eltype(d), partype(d)} <: AbstractFloat.

Although there should probably be a better interface that can generalize across more distributions.

devmotion commented 1 month ago

I'm not at all against the idea that eltype(d) should equal partype(d)

That's absolutely not the idea here. In the case of TDist, we get a sample from ChiSq (of some type, regardless of whether it follows any conventions or not). And IMO regardless of what type it is, it seems wrong to modify/promote its type by a randn call (basically a "Float64 literal").

Red-Portal commented 3 weeks ago

Hi @devmotion , unfortunately, it seems that calls to rand(q, d) still result in the wrong type. I guess this is because eltype(q) returns Float64. Should I make a PR that defines eltype(q) = float(partype(q))?

quildtide commented 3 weeks ago

One option within the current interface, if you need more than one random value, is to call the rand! method on a vector of the type you want, e.g. rand!(Exponential(), zeros(Float32, 20)).

I think there'll still be a double conversion going on (Float32 -> Float64-> Float32) if you call this on the current public version, but I think a side-effect of merging #1879 will cause things to stay Float32 at all times. Not 100% sure on this though; hadn't given this much thought before.

Red-Portal commented 2 weeks ago

Hi @devmotion , could you comment on how to proceed?

devmotion commented 2 weeks ago

Hmmmm that's annoying, that leads directly to all the eltype inconsistencies 😄 Initially - and for basically all distributions apart from Normal - eltype and partype were two completely different things: eltype was supposed to be the type of samples (Float64 for continuous and Int for discrete distributions) whereas partype was the common/promoted type of the parameters.

At some point someone wanted rand(Normal(0f0, 1f0)) to return samples of type Float32, so the definition of eltype for Normal was changed to partype. But this made it completely inconsistent with how it was supposed to be defined, how it's defined for other distributions - and generally it's not even the type of the samples (think of e.g. Normal{Int})!

Sure, we could just propagate this inconsistency a bit more but I'm a bit reluctant. Not because I think that it should return Float64 but because I think this eltype concept is utterly broken. I think we should just stop defining eltype - an "element type" of a distribution is not clearly defined: Is it supposed to be the element type of the struct? The return type of rand? The return type of quantile? The element type of support?

Maybe at least for univariate distributions we could start removing it from the rand pipeline by changing https://github.com/JuliaStats/Distributions.jl/blob/e1340f02b5604cedf02df93b9ae202f3679d9818/src/genericrand.jl#L34-L35 to

    return map(_ -> rand(rng, sampler(s)), CartesianIndices(dims))

(probably with an additional function barrier, similar to rand!) In the default fallback for rand! we call scalar rand in a loop anyway. And defining array rand for those univariate distributions for which it is beneficial (see #1879) doesn't seem too bad.

quildtide commented 2 weeks ago

but because I think this eltype concept is utterly broken. I think we should just stop defining eltype - an "element type" of a distribution is not clearly defined: Is it supposed to be the element type of the struct? The return type of rand? The return type of quantile? The element type of support?

In a perfect world, I believe eltype of a distribution should be the type of the value returned by rand, quantile, minimum, maximum, median, mode, etc.; and that mean and std should return a result of type float(eltype).

It is my view that a Distribution in Distributions.jl is an infinite-length unordered container which defines rand as a way to allow users to obtain values. There's only two discrepancies between this view and the status quo that I'm aware of; size and length in Distributions.jl return the size/length of a single sample, while I believe that those roles should go to another function while these functions should always return Inf on a Distribution.