JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.

Feature request: Add a shifted (or 3-parameter) LogNormal #1214

Closed briochemc closed 4 years ago

briochemc commented 4 years ago

I would like to be able to get the logpdf and gradlogpdf of a random variable X with a shifted log-normal distribution, i.e., such that log(X + λ) ~ N(μ,σ). Maybe adding a 3-argument LogNormal could work?

I saw that there have been discussions about generic transformations (e.g., https://github.com/JuliaStats/Distributions.jl/issues/69), but it does not look like that came to fruition.

I also saw that some similar features have been added for, e.g., multivariate normal distributions (https://github.com/JuliaStats/Distributions.jl/pull/1051), but this was not implemented for other distributions, not even univariate normal distributions, AFAIK. (This seems like an oversight to me, but maybe I'm missing something.)

Finally, I guess that some external packages (maybe Bijectors.jl, DistributionsAD.jl, TransformVariables.jl or others), could help, but I have no idea how to use them in my specific use case.

I'm not a stats expert by any means, so I'm not sure this request makes sense 😅 but I have a concrete use case for it and I would love to be able to keep using just Distributions.jl for it. Also, please let me know if there is a workaround or a solution involving other packages!
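
For concreteness, a hand-rolled sketch of the density I'm after in terms of the existing LogNormal (the helper names are just illustrative, not an existing API):

using Distributions

# X such that log(X + λ) ~ Normal(μ, σ), i.e. X = exp(Z) - λ with Z ~ Normal(μ, σ),
# so its density at x is the LogNormal(μ, σ) density evaluated at x + λ
shifted_lognormal_logpdf(x, λ, μ, σ) = logpdf(LogNormal(μ, σ), x + λ)
shifted_lognormal_gradlogpdf(x, λ, μ, σ) = gradlogpdf(LogNormal(μ, σ), x + λ)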

JHonaker commented 4 years ago

The gradient of a shifted distribution is just the gradient of the unshifted distribution evaluated at the shifted point. So you already have the gradient of any shifted distribution, and in fact of any shifted or scaled distribution (or any smooth transformation of one) pretty easily. This follows directly from the chain rule of calculus:

d/dx log f(u(x)) = (log f)'(u(x)) * u'(x)

So, if u(x) = ax + b then:

d/dx log f(ax + b) = a * (log f)'(ax + b)
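
A quick numerical check of this identity (just a sketch, with arbitrary values):

using Distributions

a, b = 2.0, 3.0
d = LogNormal(0.0, 1.0)
u(x) = a * x + b
chain_rule_grad(x) = a * gradlogpdf(d, u(x))  # a * (log f)'(u(x))

# central finite difference of x -> logpdf(d, u(x))
x, h = 1.0, 1e-6
fd = (logpdf(d, u(x + h)) - logpdf(d, u(x - h))) / (2h)
isapprox(fd, chain_rule_grad(x); atol=1e-6)  # true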

briochemc commented 4 years ago

This is not an issue, it is a feature request. I can do the maths; that's not what this is about. I have a complicated-ish pipeline that I feed Distributions into and that calls, e.g., logpdf and gradlogpdf on them, so I'm looking for a way to manipulate Distributions into other, transformed Distributions such that logpdf and gradlogpdf are defined on them.

For example, you can manipulate MvNormals this way:

julia> n = 2
2

julia> μ = randn(n)
2-element Array{Float64,1}:
 -2.1756248203084385
  0.11893918507422431

julia> Σ = (A = randn(n,n); A'*A)
2×2 Array{Float64,2}:
 0.520004  0.78019
 0.78019   2.50689

julia> d = MvNormal(μ, Σ)
FullNormal(
dim: 2
μ: [-2.1756248203084385, 0.11893918507422431]
Σ: [0.520004363513731 0.7801904739218917; 0.7801904739218917 2.506893858621977]
)

julia> rand(n,n) * d + rand(n)
FullNormal(
dim: 2
μ: [-0.08744963479159473, -0.6322712809884787]
Σ: [2.8038858165606957 0.8426689566840974; 0.8426689566840974 0.27433469653672427]
)

(And as you might realize, it is also odd that this has been implemented for MvNormal but not for univariate Normal, but I personally don't need that myself at this stage.)

nignatiadis commented 4 years ago

One thing that is quite close to what you want is LocationScale in Distributions.jl. It only works for ContinuousUnivariateDistribution, however.

Example:

julia> shifted_lognormal = LocationScale(1, 2, LogNormal())
LocationScale{Float64,LogNormal{Float64}}(
μ: 1.0
σ: 2.0
ρ: LogNormal{Float64}(μ=0.0, σ=1.0)
)

julia> gradlogpdf(shifted_lognormal, 4)
-0.46848836936938815

In particular, recall that if X ~ F for a distribution F, then aX + b ~ LocationScale(b, a, F).
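
So the originally requested X with log(X + λ) ~ N(μ, σ) is just a LogNormal(μ, σ) shifted by -λ (a sketch with illustrative values):

using Distributions

λ, μ, σ = 2.0, 0.0, 1.0
shifted = LocationScale(-λ, 1.0, LogNormal(μ, σ))  # X = -λ + Y with Y ~ LogNormal(μ, σ)

logpdf(shifted, 1.0)      # log density of X at 1
gradlogpdf(shifted, 1.0)  # gradient of the log density at 1
rand(shifted, 3)          # three draws of X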

JHonaker commented 4 years ago

Ah, I see. Sorry if it came off a little harsh. I was just trying to help, since it seemed like you were having trouble figuring out what the gradient would be (or so I thought from the talk of autodiff). For location-scale transformations, you have LocationScale, as @nignatiadis says.

As for why you can do it with MvNormal, it's because the parameters of the Normal distribution are exactly the mean and (co)variance of the distribution. Normal distributions are a big exception for a lot of things: the sum of independent Normals is a Normal whose mean and variance are the sums of the means and variances; it's its own conjugate distribution; etc. It's not as straightforward a transformation for every parametric distribution.
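
For instance, an affine map of a Normal can be absorbed back into its parameters, which is not the case for most families (a small sketch; values are arbitrary):

using Distributions

a, b = 2.0, 3.0
n = Normal(1.0, 0.5)
# a*X + b with X ~ n is again Normal, with mean a*1.0 + b and standard deviation a*0.5:
logpdf(Normal(a * 1.0 + b, a * 0.5), 4.0) ≈ logpdf(LocationScale(b, a, n), 4.0)  # true
# but, e.g., Exponential(θ) + b is not an Exponential for b ≠ 0,
# since its support starts at b rather than at 0.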

You can easily write the gradient analytically if you have a closed-form expression for your transformation and its gradient, though.

You could make a struct that wraps a distribution,

using Distributions
import Distributions: gradlogpdf

struct DistOfU # I don’t know what the proper super type is off the top of my head.
    base::Distribution
    u::Function
    uinv::Function
    dudx::Function
end

gradlogpdf(d::DistOfU, x) = gradlogpdf(d.base, d.u(x)) * d.dudx(x)
# other functions that just forward the calls to base for the things you need,
# e.g. rand(d::DistOfU, args...) = d.uinv(rand(d.base, args...))
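
For the shifted log-normal case, using this wrapper might look like (illustrative values; not an existing API):

d = DistOfU(LogNormal(0.0, 1.0), x -> x - 3.0, z -> z + 3.0, x -> 1.0)  # Y = X + 3 with X ~ LogNormal
gradlogpdf(d, 5.0)  # == gradlogpdf(LogNormal(0.0, 1.0), 2.0)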

I actually did something like this today for my own work that needs a lot more granularity of control than Distributions allows. The approach works really well. I used it for a lot of different things that needed to be non-standard or replaced.

It’s a really powerful idea for software in general.

briochemc commented 4 years ago

Ah, I see. Sorry if it came off a little harsh.

No worries, my bruised ego will recover in time 😄

OK great! Thanks to both! Yes, LocationScale looks to be exactly what I need.

Back to the feature request, would it make sense, then, to add these

# for simple transforms of distributions, e.g., d + x, d - x, and so on
Base.:+(d::ContinuousUnivariateDistribution, x::Number) = LocationScale(x, one(x), d)
Base.:+(x::Number, d::ContinuousUnivariateDistribution) = d + x
Base.:*(x::Number, d::ContinuousUnivariateDistribution) = LocationScale(zero(x), x, d)
Base.:*(d::ContinuousUnivariateDistribution, x::Number) = x * d
Base.:-(d::ContinuousUnivariateDistribution, x::Number) = LocationScale(-x, one(x), d)
Base.:-(x::Number, d::ContinuousUnivariateDistribution) = LocationScale(x, -one(x), d)
Base.:/(d::ContinuousUnivariateDistribution, x::Number) = LocationScale(zero(x), inv(x), d)
# for composed transforms to simplify on the fly
Base.:+(d::LocationScale, x::Number) = LocationScale(d.μ + x, d.σ, d.ρ)
Base.:+(x::Number, d::LocationScale) = d + x
Base.:*(x::Number, d::LocationScale) = LocationScale(x * d.μ, x * d.σ, d.ρ)
Base.:*(d::LocationScale, x::Number) = x * d
Base.:-(d::LocationScale, x::Number) = LocationScale(d.μ - x, d.σ, d.ρ)
Base.:-(x::Number, d::LocationScale) = LocationScale(x - d.μ, -d.σ, d.ρ)
Base.:/(d::LocationScale, x::Number) = LocationScale(d.μ / x, d.σ / x, d.ρ)

This would allow for things like

42 * (2 * LogNormal(0.0, 1.0) - 1)

to work out of the box!
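
With these definitions, that expression would simplify step by step into a single LocationScale (just sketching the reduction):

2 * LogNormal(0.0, 1.0)             # LocationScale(0.0, 2.0, LogNormal(0.0, 1.0))
2 * LogNormal(0.0, 1.0) - 1         # LocationScale(-1.0, 2.0, LogNormal(0.0, 1.0))
42 * (2 * LogNormal(0.0, 1.0) - 1)  # LocationScale(-42.0, 84.0, LogNormal(0.0, 1.0))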

briochemc commented 4 years ago

Of course, specialized versions of these operations could be defined for Normal, just like MvNormal already has, since the result is still a Normal.

JHonaker commented 4 years ago

Yea, I think that would be a perfectly reasonable thing to do for your project. I don't know if it's really in line with what Distributions.jl is trying to do though. It's a very generic distribution library, not a library for manipulation of random variables.

That's not to say that something like that wouldn't be useful to anyone. It's just a slight feature creep thing that might be better suited to another new package. I don't know how new to Julia you are, but small packages with a simple core idea are actually really common (and I'd argue preferred) in the Julia ecosystem. Of course we have huge monoliths like Plots.jl and Turing.jl just like Python has Matplotlib and PyMC.

Plus this really only works so simply for LocationScale and Normal distributions. You've already written most of the code you'd need for your specific use case!

P.S.: RE: Your other question from Slack. I'm happy to teach you how to derive the distribution of X/Y when X ~ P and Y ~ Q analytically if you don't know already. It's not usually a straightforward distribution.

briochemc commented 4 years ago

I'm totally fine with making this my own personal code, but I'm not convinced that this isn't a feature for Distributions.jl ([edited because of the post below]). To be precise, I'd say the suggestion I made is not about "manipulating random variables" but about "manipulating distributions", and I offer two counter-arguments:

First, I fail to see any reason why the feature was added for MvNormal but not for Normal. And I think this is why the argument for genericity falls apart when there is such special treatment for MvNormal.

Second, since the LocationScale machinery is already here, and considering the size of Distributions.jl, I don't see the small-package argument carrying much weight against expanding the feature from Normal to any ContinuousUnivariateDistribution either... at the cost of only about ~10 lines of code. I say only ~10 because there are ~7 lines for the Base operations on ContinuousUnivariateDistribution, and the second set of lines for composing LocationScales should probably be converted into a single line specializing LocationScale on LocationScale, e.g., with something like

LocationScale(μ, σ, d::LocationScale) = LocationScale(μ + σ * d.μ, σ * d.σ, d.ρ)

while the Normal case can probably also be made to work with a single line like

LocationScale(μ, σ, d::Normal) = Normal(μ + σ * d.μ, σ * d.σ)

Of course, I probably made a mistake somewhere and some of these operations are not needed to match the current MvNormal precedent (like the - and / ones), but I think this is mostly it right there.
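
As a quick sanity check of these two rules against the current behaviour (a sketch using a nested LocationScale and arbitrary numbers):

using Distributions

inner = LocationScale(1.0, 2.0, LogNormal(0.0, 1.0))  # 1 + 2X with X ~ LogNormal
nested = LocationScale(3.0, 4.0, inner)               # 3 + 4(1 + 2X) = 7 + 8X
flat = LocationScale(3.0 + 4.0 * 1.0, 4.0 * 2.0, LogNormal(0.0, 1.0))
logpdf(nested, 10.0) ≈ logpdf(flat, 10.0)             # true

m = Normal(1.0, 0.5)
logpdf(LocationScale(3.0, 4.0, m), 10.0) ≈ logpdf(Normal(3.0 + 4.0 * 1.0, 4.0 * 0.5), 10.0)  # true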

P.S.: I actually don't need the distribution of X/Y, but thanks!

briochemc commented 4 years ago

Sorry, I actually missed the most important argument here: Implementing this in a third package would be type piracy! So I have to insist, if this functionality is a good idea, it has to be implemented here.

JHonaker commented 4 years ago

I mean, I have no authority over the package at all. I'm just a guy that uses Distributions.jl a lot. I just thought it'd be interesting to have the discussion.

First, I fail to see any reason why the feature was added for MvNormal but not for Normal. And I think this is why the argument for genericity falls apart when there is such special treatment for MvNormal.

Perfectly sound reasoning. That inconsistency is weird, but it's probably just because someone needed it and wrote it.

Second, since the LocationScale machinery is already here, and considering the size of Distributions.jl, ...

Yea, I agree with you. Again, I'm just a random internet person that's not affiliated with Distributions.jl. My initial reservation was because of a misunderstanding about some Calculus. You should code it up and submit a pull request. You've already done most of the work while talking with me. :wink:

Sorry, I actually missed the most important argument here: Implementing this in a third package would be type piracy! So I have to insist, if this functionality is a good idea, it has to be implemented here.

As far as Type Piracy goes, I'd say this falls pretty squarely under the "extending functionality" reasoning that the last paragraph of that link talks about. Also, that's why in my example I recommended wrapping it in a struct of your own to define the behavior.

Also, Why do you keep :-1: ing me, :cry:?

johnczito commented 4 years ago

Stepping back for a sec, if X is lognormal, then this won't be true: log(X + λ) ~ N(μ, σ). But in any case, vanilla LogNormal and LocationScale exactly implement the feature originally requested, right?

JHonaker commented 4 years ago

Correct. The easy closed-form answer is really only a property of Normal. I think the feature he was requesting is the ability to take a LocationScale distribution and add or multiply to get a different LocationScale.

x = LocationScale(0, 1, LogNormal(0, 1))
x + 3 == LocationScale(3, 1, LogNormal(0, 1))

This is actually not implemented.

briochemc commented 4 years ago

Stepping back for a sec, if X is lognormal, then this won't be true: log(X + λ) ~ N(μ, σ).

Isn't that the definition of lognormal, with λ=0?

briochemc commented 4 years ago

As far as Type Piracy goes, I'd say this falls pretty squarely under the "extending functionality" reasoning that the last paragraph of that link talks about. Also, that's why in my example I recommended wrapping it in a struct of your own to define the behavior.

I don't think it falls squarely precisely because the functionality already exists here for MvNormal!

Also, Why do you keep :-1: ing me, :cry:?

Sorry, I try to use them to convey my opinion on the arguments, not to 👎 you personally! On the contrary, BTW, I'm very thankful for you commenting and discussing!

johnczito commented 4 years ago

Isn't that the definition of lognormal, with λ=0?

Sure, but for any other value of λ, the log of a shifted log normal isn't normal.

using Distributions, Gadfly
λ = 1
d = LogNormal(0, 1)
X = rand(d, 2500)
plot(x = log.(X .+ λ), Geom.histogram)
(Screenshot: histogram of log.(X .+ λ) for λ = 1, which is clearly not normal.)

Anyhow, if there's agreement that LogNormal + LocationScale does the trick, then we can close this.

briochemc commented 4 years ago

I don't think there was any confusion about that. The important bit here is that LocationScale applied to a shifted log normal would return a shifted log normal. So that is actually an additional argument to implement a ShiftedLogNormal type on which LocationScale is specialized IMHO.
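
A rough sketch of what such a type could look like (ShiftedLogNormal here is hypothetical, not an existing type, and the affine rule at the end only covers positive scales):

using Distributions

# X = θ + exp(Z) with Z ~ Normal(μ, σ), i.e. log(X - θ) ~ Normal(μ, σ)
struct ShiftedLogNormal{T<:Real} <: ContinuousUnivariateDistribution
    μ::T
    σ::T
    θ::T
end

Distributions.logpdf(d::ShiftedLogNormal, x::Real) = logpdf(LogNormal(d.μ, d.σ), x - d.θ)
Distributions.gradlogpdf(d::ShiftedLogNormal, x::Real) = gradlogpdf(LogNormal(d.μ, d.σ), x - d.θ)

# Closure under positive affine maps: m + s*X = (m + s*θ) + exp(Z + log(s)), so
# LocationScale could be specialized to return another ShiftedLogNormal:
# LocationScale(m, s, d::ShiftedLogNormal) = ShiftedLogNormal(d.μ + log(s), d.σ, m + s * d.θ)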

briochemc commented 4 years ago

Apologies for expanding this thread further, but looking at the code in more detail, all the LocationScale tests are using ρ::Normal, so this could probably be improved by checking that LocationScale does what we expect for other, non-Normal distributions.

johnczito commented 4 years ago

Yeah, this is pretty discursive now. I'll close just because the "headline" feature is provided straightforwardly enough. Talk of generic transformations can continue on the existing issues about that or on Discourse, where I think it has also come up. Certainly, doing this in the near-term makes a lot of sense to me:

LocationScale(μ, σ, d::LocationScale) = LocationScale(μ + σ * d.μ, σ * d.σ, d.ρ)

Extending the LocationScale unit tests to other distributions that are closed under location-scale transformations like Laplace and Logistic would be great. PRs welcome.
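
For reference, the kind of check such tests could make, using the fact that m + s * Laplace(μ, b) is Laplace(m + s*μ, s*b) for s > 0 (a sketch; the numbers are arbitrary, and Logistic works the same way):

using Distributions

m, s = 1.5, 2.0
d1 = LocationScale(m, s, Laplace(0.0, 1.0))
d2 = Laplace(m + s * 0.0, s * 1.0)
all(logpdf(d1, x) ≈ logpdf(d2, x) for x in -5.0:0.5:5.0)  # true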