The gradient of a shifted distribution is just the gradient of the unshifted distribution evaluated at the correspondingly shifted point. So you already have the gradient of any shifted distribution, and in fact you get the gradient of any shifted or scaled distribution pretty easily (or of any smooth transformation, for that matter). This follows directly from the chain rule of calculus: if u(x) = ax + b, then
d/dx [log f(u(x))] = u'(x) · (log f)'(u(x)) = a · (log f)'(ax + b).
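A minimal numerical check of that identity, assuming ForwardDiff.jl is available to compute the left-hand side:

using Distributions, ForwardDiff

f = Normal(0, 1)   # any distribution with a gradlogpdf method would do here
a, b = 2.0, -1.0   # hypothetical affine map u(x) = a*x + b
x = 0.7

# chain rule: d/dx log f(a*x + b) == a * gradlogpdf(f, a*x + b)
ForwardDiff.derivative(t -> logpdf(f, a * t + b), x) ≈ a * gradlogpdf(f, a * x + b)  # true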
This is not an issue, it is a feature request. I can do the maths, it's not what this is about... I have a complicated-ish pipeline which I feed Distributions to and which calls, e.g., logpdf and gradlogpdf on those, so I'm looking for a way to manipulate Distributions into other, transformed Distributions, such that logpdf and gradlogpdf are defined on them.
For example, you can manipulate MvNormals this way:
julia> n = 2
2
julia> μ = randn(n)
2-element Array{Float64,1}:
-2.1756248203084385
0.11893918507422431
julia> Σ = (A = randn(n,n); A'*A)
2×2 Array{Float64,2}:
0.520004 0.78019
0.78019 2.50689
julia> d = MvNormal(μ, Σ)
FullNormal(
dim: 2
μ: [-2.1756248203084385, 0.11893918507422431]
Σ: [0.520004363513731 0.7801904739218917; 0.7801904739218917 2.506893858621977]
)
julia> rand(n,n) * d + rand(n)
FullNormal(
dim: 2
μ: [-0.08744963479159473, -0.6322712809884787]
Σ: [2.8038858165606957 0.8426689566840974; 0.8426689566840974 0.27433469653672427]
)
(And as you might realize, it is also odd that this has been implemented for MvNormal but not for univariate Normal, but I personally don't need that myself at this stage.)
One thing that is quite close to what you want is LocationScale in Distributions.jl. It only works for ContinuousUnivariateDistribution, however. Example:
julia> shifted_lognormal = LocationScale(1, 2, LogNormal(0.1))
LocationScale{Float64,LogNormal{Float64}}(
μ: 1.0
σ: 2.0
ρ: LogNormal{Float64}(μ=0.1, σ=1.0)
)
julia> gradlogpdf(shifted_lognormal, 4)
-0.46848836936938815
In particular, recall that if X ~ F for a distribution F, then aX + b ~ LocationScale(b, a, F).
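A quick sanity check of that identity against the change-of-variables formula, reusing the LogNormal base from the example above:

using Distributions

F = LogNormal(0.1)
a, b, x = 2.0, 1.0, 4.0
Y = LocationScale(b, a, F)   # i.e. Y = a*X + b with X ~ F

# pdf_Y(x) = pdf_F((x - b)/a) / a, so the log-densities should agree:
logpdf(Y, x) ≈ logpdf(F, (x - b) / a) - log(a)  # true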
Ah, I see. Sorry if it came off a little harsh. I was just trying to help, since it seemed like you were having trouble figuring out what the gradient would be (or so I thought from the talk of auto diff). For location-scale transformations, you have LocationScale like @nignatiadis says.
As for why you can do it with MvNormal, it's because the parameters of the Normal distribution are exactly the mean and variance of the distribution. Normal distributions are a big exception for a lot of things: the sum of Normals is Normal(sum of means, sum of variances); it's its own conjugate distribution; etc. It's not as straightforward a transformation on every parametric distribution.
You can easily write the gradient analytically if you have a closed-form expression for your transformation and its gradient, though.
You could make a struct that wraps a distribution,
using Distributions

struct DistOfU # I don't know what the proper supertype is off the top of my head.
    base::Distribution
    u::Function     # maps a point of the transformed variable back to the base variable
    uinv::Function  # inverse of u, used for sampling
    dudx::Function  # derivative of u
end

Distributions.gradlogpdf(d::DistOfU, x) = gradlogpdf(d.base, d.u(x)) * d.dudx(x)
# other functions just forward the calls to base for the things you need,
# e.g. rand(d::DistOfU, args...) = d.uinv(rand(d.base, args...))
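For example, a sketch of how you might instantiate it for the affine map y = 2x + 1 (the names and numbers here are just illustrative):

# Hypothetical example: the distribution of Y = 2X + 1 with X ~ LogNormal(0, 1).
# For an affine map the Jacobian term of the log-density is constant, so this gradlogpdf is exact.
d = DistOfU(LogNormal(0, 1),
            y -> (y - 1) / 2,   # u: recover the base variable from y
            x -> 2x + 1,        # uinv: forward transform, for sampling
            y -> 1 / 2)         # du/dy
gradlogpdf(d, 4.0)  # == gradlogpdf(LogNormal(0, 1), 1.5) / 2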
I actually did something like this today for my own work that needs a lot more granularity of control than Distributions allows (and I don't need to …). The approach works really well. I used it for a lot of different things that needed to be non-standard or replaced.
It's a really powerful idea for software in general.
> Ah, I see. Sorry if it came off a little harsh.
No worries, my bruised ego will recover in time 😄
OK great! Thanks to both! Yes, LocationScale looks to be exactly what I need.
Back to the feature request: would it make sense, then, to add these?
# for simple transforms of distributions, e.g., d + x, d - x, and so on
Base.:+(d::ContinuousUnivariateDistribution, x::Number) = LocationScale(x, one(x), d)
Base.:+(x::Number, d::ContinuousUnivariateDistribution) = d + x
Base.:*(x::Number, d::ContinuousUnivariateDistribution) = LocationScale(zero(x), x, d)
Base.:*(d::ContinuousUnivariateDistribution, x::Number) = x * d
Base.:-(d::ContinuousUnivariateDistribution, x::Number) = LocationScale(-x, one(x), d)
Base.:-(x::Number, d::ContinuousUnivariateDistribution) = LocationScale(x, -one(x), d)
Base.:/(d::ContinuousUnivariateDistribution, x::Number) = LocationScale(zero(x), inv(x), d)
# for composed transforms to simplify on the fly
Base.:+(d::LocationScale, x::Number) = LocationScale(d.μ + x, d.σ, d.ρ)
Base.:+(x::Number, d::LocationScale) = d + x
Base.:*(x::Number, d::LocationScale) = LocationScale(x * d.μ, x * d.σ, d.ρ)
Base.:*(d::LocationScale, x::Number) = x * d
Base.:-(d::LocationScale, x::Number) = LocationScale(d.μ - x, d.σ, d.ρ)
Base.:-(x::Number, d::LocationScale) = LocationScale(x - d.μ, -d.σ, d.ρ)
Base.:/(d::LocationScale, x::Number) = LocationScale(d.μ / x, d.σ / x, d.ρ)
This would allow for things like
42 * (2 * LogNormal(0.0, 1.0) - 1)
to work out of the box!
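For instance, with the methods above defined (and assuming the composition rules are right), that expression would collapse to a single LocationScale:

d = 42 * (2 * LogNormal(0.0, 1.0) - 1)
# equivalent to LocationScale(-42.0, 84.0, LogNormal(0.0, 1.0)), so e.g.
logpdf(d, 10.0) ≈ logpdf(LogNormal(0.0, 1.0), (10.0 + 42.0) / 84.0) - log(84.0)  # true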
Of course, specialized versions of these operations could be defined for Normal, just like MvNormal already has, since the type remains a Normal.
Yea, I think that would be a perfectly reasonable thing to do for your project. I don't know if it's really in line with what Distributions.jl is trying to do though. It's a very generic distribution library, not a library for manipulation of random variables.
That's not to say that something like that wouldn't be useful to anyone. It's just a slight feature creep thing that might be better suited to another new package. I don't know how new to Julia you are, but small packages with a simple core idea are actually really common (and I'd argue preferred) in the Julia ecosystem. Of course we have huge monoliths like Plots.jl and Turing.jl just like Python has Matplotlib and PyMC.
Plus this really only works so simply for LocationScale and Normal distributions. You've already written most of the code you'd need for your specific use case!
P.S.: RE: Your other question from Slack. I'm happy to teach you how to derive the distribution of X/Y when X ~ P and Y ~ Q analytically if you don't know already. It's not usually a straightforward distribution.
I'm totally fine with making this into my own personal code, but I'm not convinced that this is not a feature for Distributions.jl ([edited because of the post below]). To be precise, I'd say the suggestion I made is not about "manipulating random variables" but about "manipulating distributions", and I offer two counter-arguments:
First, I fail to see any reason why the feature was added for MvNormal but not for Normal. And I think this is why the argument for genericity falls apart when there is such special treatment for MvNormal.
Second, since the LocationScale machinery is already here, and considering the size of Distributions.jl, I don't see the small-package argument carrying much weight either against then expanding the feature from Normal to any ContinuousUnivariateDistribution... at the cost of just about ~10 lines of code. I say only ~10 because there are ~7 lines for the Base operations on ContinuousUnivariateDistribution, and then the second set of lines of code for composing LocationScales should probably be converted into a single line specializing LocationScale on LocationScales, e.g., with something like
LocationScale(μ, σ, d::LocationScale) = LocationScale(μ + σ * d.μ, σ * d.σ, d.ρ)
while the Normal case can probably also be made to work with a single line like
LocationScale(μ, σ, d::Normal) = Normal(μ + σ * d.μ, abs(σ) * d.σ)
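(A quick consistency check of that line against LocationScale, with arbitrary numbers:)

μ, σ = 1.0, 3.0
d = Normal(0.5, 2.0)
logpdf(Normal(μ + σ * d.μ, abs(σ) * d.σ), 4.0) ≈ logpdf(LocationScale(μ, σ, d), 4.0)  # true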
Of course, I probably made a mistake somewhere and some of these operations are not needed to match the current MvNormal precedent (like the - and / ones), but I think this is mostly it right there.
P.S.: I actually don't need the distribution of X/Y, but thanks!
Sorry, I actually missed the most important argument here: Implementing this in a third package would be type piracy! So I have to insist, if this functionality is a good idea, it has to be implemented here.
I mean, I have no authority over the package at all. I'm just a guy that uses Distributions.jl a lot. I just thought it'd be interesting to have the discussion.
> First, I fail to see any reason why the feature was added for MvNormal but not for Normal. And I think this is why the argument for genericity falls apart when there is such special treatment for MvNormal.
Perfectly sound reasoning. That inconsistency is weird, but it's probably just because someone needed it and wrote it.
> Second, since the LocationScale machinery is already here, and considering the size of Distributions.jl, ...
Yea, I agree with you. Again, I'm just a random internet person that's not affiliated with Distributions.jl. My initial reservation was because of a misunderstanding about some Calculus. You should code it up and submit a pull request. You've already done most of the work while talking with me. :wink:
> Sorry, I actually missed the most important argument here: Implementing this in a third package would be type piracy! So I have to insist, if this functionality is a good idea, it has to be implemented here.
As far as Type Piracy goes, I'd say this falls pretty squarely under the "extending functionality" reasoning (which some people follow) that the last paragraph of that link talks about. Also, that's why in my example I recommended wrapping it in a struct of your own to define the behavior.
Also, why do you keep :-1:-ing me, :cry:?
Stepping back for a sec, if X is lognormal, then this won't be true: log(X + λ) ~ N(μ, σ). But in any case, vanilla LogNormal and LocationScale exactly implement the feature originally requested, right?
Correct. The easy closed-form answer is only really a property of Normal. I think the feature he was requesting is the ability to take a LocationScale distribution and add or multiply to get a different LocationScale.
x = LocationScale(0, 1, LogNormal(0, 1))
x + 3 == LocationScale(3, 1, LogNormal(0, 1))
This is actually not implemented.
> Stepping back for a sec, if X is lognormal, then this won't be true: log(X + λ) ~ N(μ, σ).
Isn't that the definition of lognormal, with λ=0?
> As far as Type Piracy goes, I'd say this falls pretty squarely under the "extending functionality" reasoning (which some people follow) that the last paragraph of that link talks about. Also, that's why in my example I recommended wrapping it in a struct of your own to define the behavior.
I don't think it falls squarely precisely because the functionality already exists here for MvNormal!
> Also, why do you keep :-1:-ing me, :cry:?
Sorry, I try to use them to convey my opinion on the arguments, not to 👎 you personally! On the contrary, BTW, I'm very thankful for you commenting and discussing!
> Isn't that the definition of lognormal, with λ=0?
Sure, but for any other value of λ, the log of a shifted log normal isn't normal.
using Distributions, Gadfly
λ = 1
d = LogNormal(0, 1)
X = rand(d, 2500)
plot(x = log.(X .+ λ), Geom.histogram)
Anyhow, if there's agreement that LogNormal + LocationScale does the trick, then we can close this.
I don't think there was any confusion about that. The important bit here is that LocationScale applied to a shifted log normal would return a shifted log normal. So that is actually an additional argument to implement a ShiftedLogNormal type on which LocationScale is specialized, IMHO.
Apologies for expanding this thread further, but looking at the code in more detail, all the LocationScale tests are using ρ::Normal, so this could probably be improved by checking that LocationScale does what we expect for other, non-Normal distributions.
Yeah, this is pretty discursive now. I'll close just because the "headline" feature is provided straightforwardly enough. Talk of generic transformations can continue on the existing issues about that or on Discourse, where I think it has also come up. Certainly, doing this in the near-term makes a lot of sense to me:
LocationScale(μ, σ, d::LocationScale) = LocationScale(μ + σ * d.μ, σ * d.σ, d.ρ)
Extending the LocationScale unit tests to other distributions that are closed under location-scale transformations, like Laplace and Logistic, would be great. PRs welcome.
I would like to be able to get the logpdf and gradlogpdf of a random variable X with a shifted log-normal distribution, i.e., such that log(X + λ) ~ N(μ, σ). Maybe adding a 3-argument LogNormal could work? I saw that there have been discussions about generic transformations (e.g., https://github.com/JuliaStats/Distributions.jl/issues/69), but it does not look like that came to fruition.
I also saw that some similar features have been added for, e.g., multivariate normal distributions (https://github.com/JuliaStats/Distributions.jl/pull/1051), but this was not implemented for other distributions, not even univariate normal distributions, AFAIK. (This seems like an oversight to me, but maybe I'm missing something.)
Finally, I guess that some external packages (maybe Bijectors.jl, DistributionsAD.jl, TransformVariables.jl or others), could help, but I have no idea how to use them in my specific use case.
I'm not a stats expert by any means, so I'm not sure this request makes sense 😅 but I have a concrete use case for it and I would love to be able to keep using just Distributions.jl for it. Also, please let me know if there is a workaround or a solution involving other packages!
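For reference, the LocationScale workaround discussed above covers this use case; a minimal sketch, with placeholder values for λ, μ, and σ:

using Distributions

λ, μ, σ = 1.0, 0.0, 1.0                       # placeholder parameters
X = LocationScale(-λ, 1.0, LogNormal(μ, σ))   # X + λ ~ LogNormal(μ, σ), i.e. log(X + λ) ~ N(μ, σ)

logpdf(X, 0.5)      # defined out of the box
gradlogpdf(X, 0.5)  # also defined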