JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.
Other
1.11k stars 416 forks source link

Type of sample of `Exponential{T}` is not `T` #1902

Open wouterwln opened 1 month ago

wouterwln commented 1 month ago

MWE:

θ = 1.0f0
d = Exponential(θ)
typeof(rand(Exponential(θ))) == typeof(θ)

returns false

Because Exponential sampling is also used when sampling Gamma with shape 1, the problem also occurs there. For Gamma this introduces a type instability, since the return type for sampling for a Gamma depends on whether or not the alpha parameter is equal to 1.0.

bvdmitri commented 1 month ago

This is problematic here

julia> Gamma(1f0, 1f0) |> rand
1.1458127367196822

julia> Gamma(2f0, 1f0) |> rand
1.662515f0
andreasnoack commented 1 month ago

Please search the issue tracker for several similar issues. I think this issue might be redundant.

wouterwln commented 1 month ago

Unfortunately it is not. It was introduced in https://github.com/JuliaStats/Distributions.jl/commit/3946acc6901b5126022ae22564ab9aa7d913ed1e which was committed last month. We found this in our downstream test pipeline, so unfortunately this is a new issue.

andreasnoack commented 1 month ago

Which version of Distributions are you using? On master I'm getting

julia> typeof(rand(Exponential(θ))) == typeof(θ)
true

and that is a consequence of the commit that you link to (which I had forgotten).

wouterwln commented 1 month ago

You are right, on master this seems to be fixed. Reading the discussion in https://github.com/JuliaStats/Distributions.jl/pull/1885 improved my understanding. The issue that is found by our tests is that eltype(Gamma(1.0f0, 1.0f0)) is not Float32 (for reference, for Normal this returns Float32). The story here implies that eltype should return the type of the samples. Is that correct? If this is not the case, the problem is downstream and we have to make an exception downstream.

Edit: in the documentation it says the following:

eltype(::Type{Sampleable})

  The default element type of a sample. This is the type of elements of the
  samples generated by the rand method. However, one can provide an array of
  different element types to store the samples using rand!.
bvdmitri commented 1 month ago

Indeed for Gamma eltype fallbacks to

Base.eltype(::Type{<:Sampleable{F,Continuous}}) where {F} = Float64

and for Normal eltype is defined as

Base.eltype(::Type{Normal{T}}) where {T} = T

in our code base we basically implicitly tested that eltype(distribution) == typeof(rand(distribution)) and it failed. I would suggest to add those tests in Distributions.jl as well because now the implementation for Gamma does not follow the documented behaviour of eltype(::Type{Sampleable}). Specifically this line This is the type of elements of the samples generated by the rand method. which is not for Gamma (and maybe other distributions)

julia> eltype(Gamma(1f0, 1f0)) == typeof(rand(Gamma(1f0, 1f0)))
false

Also this behaviour leads to this inconsistency, where semantically similar code returns different container types

julia> rand(Gamma(1f0, 1f0), 1)
1-element Vector{Float64}:
 1.2745195627212524

julia> [ rand(Gamma(1f0, 1f0)) ]
1-element Vector{Float32}:
 3.234097
andreasnoack commented 1 month ago

in our code base we basically implicitly tested that eltype(distribution) == typeof(rand(distribution))

I think it's best to avoid eltype(::Distribution) completely. Please refer to the many long discussion in the issue tracker. As an example of the issue here consider the case

julia> eltype(Normal(1//1, 1//1))
Rational{Int64}

julia> rand(Normal(1//1, 1//1))
0.3227598879915683

so eltype(::Normal) is just lying to you.

bvdmitri commented 1 month ago

You cannot expect users to refer to long discussion in the issue tracker. The issue tracker is for developers, not for users. Users are looking at the documentation of the package. And the documentation is misleading. If the documentation is lying it must be changed.

bvdmitri commented 1 month ago

As a side note, obtaining the exact sample type without having to call eltype(rand(...)) is indeed a highly useful feature for preallocating containers, as the documentation suggests. However, it seems like this functionality is currently somewhat broken. For example, even with your code:

julia> distribution = Normal(1//1, 1//1)
Normal{Rational{Int64}}(μ=1//1, σ=1//1)

julia> container = zeros(eltype(distribution), 10);

julia> Distributions.rand!(distribution, container)
ERROR: MethodError: no method matching randn(::Random.TaskLocalRNG, ::Type{Rational{Int64}})

This issue makes it challenging to write generic code that preallocates containers for samples from arbitrary distributions, as eltype doesn’t seem to be designed for this purpose and is "lying". Is there an alternative method to handle this?

andreasnoack commented 1 month ago

My point is that this is an open issue and part of the discussion here is just repeating arguments from other issues.

obtaining the exact sample type without having to call eltype(rand(...)) is indeed a highly useful feature for preallocating containers

This is the eternal challenge. There is no easy way to know this type. You can

  1. Compute the elements and promote the container as you go
  2. Ask Julia's type inference (please don't do this)
  3. Try to guess

When you try to write generic code then 3 almost always ends up being wrong at some point. Part of the issue here is that Distributions was original written just for Float64 and Int64 parameters and random variates and then later made generic which invalidated some of the original assumptions. But now I'm repeating explanations already available in other issues.

bvdmitri commented 1 month ago

I understand your point, but the documentation is a primary source of information for most users (and not GitHub issues) and seems to be incorrect in its current form. If the previous original assumptions were invalidated it should be reflected in the documentation too. Now the documentation states that eltype returns the type of the samples, which isn't accurate. Would it be possible to at the very least just update the documentation? This will certainly prevent confusion and the need for repeated explanations in other issues.