JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.
Other
1.1k stars 414 forks source link

Error calculating mean of Truncated Log Normal #709

Open davidzentlermunro opened 6 years ago

davidzentlermunro commented 6 years ago

Any idea why I get the following error with this command:

mean(Truncated(LogNormal(1.0,5.0),0.0,1.0e5))

MethodError: no method matching start(::Distributions.Truncated{Distributions.LogNormal{Float64},Distributions.Continuous})
Closest candidates are:
  start(::SimpleVector) at essentials.jl:258
  start(::Base.MethodList) at reflection.jl:560
  start(::ExponentialBackOff) at error.jl:107
  ...
mean(::Base.#identity, ::Distributions.Truncated{Distributions.LogNormal{Float64},Distributions.Continuous}) at statistics.jl:19
mean(::Distributions.Truncated{Distributions.LogNormal{Float64},Distributions.Continuous}) at statistics.jl:34
include_string(::String, ::String) at loading.jl:522
include_string(::String, ::String, ::Int64) at eval.jl:30
include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N} where N) at eval.jl:34
(::Atom.##100#105{String,Int64,String})() at eval.jl:75
withpath(::Atom.##100#105{String,Int64,String}, ::String) at utils.jl:30
withpath(::Function, ::String) at eval.jl:38
didWriteToREPL(::Atom.##99#104{String,Int64,String}) at repl.jl:129
hideprompt(::Atom.##99#104{String,Int64,String}) at repl.jl:65
macro expansion at eval.jl:73 [inlined]
(::Atom.##98#103{Dict{String,Any}})() at task.jl:80
andreasnoack commented 6 years ago

I don't remember if we have an issue for this but in general, there isn't a closed form solution to the truncated mean so this would have to be computed with numerical integration. We don't supply that as a fallback. Maybe we could.

ararslan commented 6 years ago

We even have QuadGK as a dependency already, so adding a fallback to integration wouldn't even require another dependency.

PaulSoderlind commented 6 years ago

I believe there are closed-form results for all moments of a truncated lognormal, but maybe a statistician can correct me. /Paul S

bdeonovic commented 6 years ago

Correct there are closed form results: https://en.wikipedia.org/wiki/Truncated_normal_distribution

Deduction42 commented 5 years ago

It is pretty easy to implement. I did it myself. The method I used to distinguish the continuous vs discrete distributions is a bit kludgey. There should be a way to determine if the parent distribution is continuous or discrete.

function truncmean( Dist::Truncated )
    F(x) = x*pdf(Dist,x)
    y = 0.0;

    if typeof(Dist) <: ContinuousDistribution #Continuous distribution
        y = quadgk(F, Dist.lower, Dist.upper)[1]

    else #Discrete distriubtion
        x = ceil( Dist.lower )
        q_max = 1 - 1E-9;
        x_max = min( Dist.upper, quantile( Dist.untruncated, q_max)  )

        while x < x_max
            y += F(x)
            x += 1
        end
    end

    return y
end
bdeonovic commented 5 years ago

can't you just check if Dist <: ContinuousDistribution ?

bdeonovic commented 5 years ago
julia> d=Truncated(Normal(0,1), 0, Inf)
Truncated(Normal{Float64}(μ=0.0, σ=1.0), range=(0.0, Inf))

julia> typeof(d) <: ContinuousDistribution
true
Deduction42 commented 5 years ago

Ah, that was the subtype! I was actually wanting to do it that way, but I wrote that code under a tight deadline so I didn't have the chance to thoroughly research the Distributions.jl type system. I updated the code in my previous comment.

davidzentlermunro commented 5 years ago

Has this being incorporated into the package?

PaulSoderlind commented 3 months ago

bump

ararslan commented 3 months ago

There's a convenient derivation here that relates the moments of the truncated log-normal to the moment generating function for the truncated normal: the $n^\text{th}$ moment of the log-normal distribution with parameters $\mu$ and $\sigma$ truncated to the interval $(a, b)$ is the moment generating function for the normal distribution with parameters $\mu$ and $\sigma$ truncated to the interval $(\log{a}, \log{b})$ evaluated at $n$. So with $Y \sim \text{LogNormal}(\mu, \sigma)$ and $X \sim \text{Normal}(\mu, \sigma)$, we should have

$$ M_{X | \log{a} < X < \log{b}}(n) = \exp \left( n \mu + \frac{n^2 \sigma^2}{2} \right) \frac{ \Phi \left( \frac{\log{b} - \mu}{\sigma} - n \sigma \right) - \Phi \left( \frac{\log{a} - \mu}{\sigma} - n \sigma \right) }{ \Phi \left( \frac{\log{b} - \mu}{\sigma} \right) - \Phi \left( \frac{\log{a} - \mu}{\sigma} \right) } $$

and so the mean would be

$$ \mathbb{E}[Y | a < Y < b] = M_{X | \log{a} < X < \log{b}}(1) $$

and the variance

$$ \begin{aligned} \text{Var}(Y | a < Y < b) &= \mathbb{E}[Y^2 | a < Y < b] - \mathbb{E}[Y | a < Y < b]^2 \ &= M{X | \log{a} < X < \log{b}}(2) - M{X | \log{a} < X < \log{b}}(1)^2 \end{aligned} $$

Then I believe we can define

function mgf(d::Truncated{Normal{T}}, t::Real) where {T}
    d0 = d.untruncated
    μ = mean(d0)
    σ = std(d0)
    σt = σ * t
    a = (minimum(d) - μ) / σ - σt
    b = (maximum(d) - μ) / σ - σt
    stdnorm = Normal{T}(zero(T), one(T))
    return exp(μ * t + σt^2 / 2 + logdiffcdf(stdnorm, b, a) - d.logtp)
end

function _truncnorm(d::Truncated{<:LogNormal})
    μ, σ = params(d.untruncated)
    a = d.lower === nothing ? nothing : log(minimum(d))
    b = d.upper === nothing ? nothing : log(maximum(d))
    return truncated(Normal(μ, σ), a, b)
end

mean(d::Truncated{<:LogNormal}) = mgf(_truncnorm(d), 1)

function var(d::Truncated{<:LogNormal})
    tn = _truncnorm(d)
    m1 = mgf(tn, 1)
    m2 = sqrt(mgf(tn, 2))
    return (m2 - m1) * (m2 + m1)
end

and likewise for skewness and kurtosis. Untested but I think that should work.

PaulSoderlind commented 3 months ago

Your first LaTeX equation should probably have $\exp(\mu n ...$ (the n is missing), or? Code seems right.

ararslan commented 3 months ago

Ah yes, thank you! Fixed.