JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.
Other
1.1k stars 411 forks source link

Amplify number system of Distributions to cope with arbitrary Number subtypes #423

Closed papamarkou closed 8 years ago

papamarkou commented 8 years ago

I am opening this issue as a follow up of a skype discussion that @mlubin, @jrevels and I had last Friday. One of the topics of our discussion has been how to integrate ForwardDiff with Distributions. This would allow to use ForwardDiff with a broad number of statistical problems.

To elaborate more, consider the simple example of computing the derivative of the log-pdf of a distribution. I deliberately keep the context simple at the moment so as to demonstrate the major issue. Once this is resolved, then more realistic statistical examples can be constructed.

So, consider

using Distributions

gradlogpdf(Normal(1., 0.15), 2.5)

It returns -66.66666666666667.

Now, let's say we would like to derive the gradient of the same log-pdf using automatic differentiation via ForwardDiff. The code would become

using Distributions
using ForwardDiff

f(x::Number) = logpdf(Normal(1., 0.15), x)
derivative(f, 2.5)

This does not work as intended, but instead returns the error

ERROR: MethodError: `logpdf` has no metho1 matching logpdf(::Distributions.Normal, ::ForwardDiff.GradientNumber{1,Float64,Tuple{Float64}})
Closest candidates are:
  logpdf(::Distributions.Normal, ::Float64)
  logpdf(::Distributions.Distribution{Distributions.Univariate,Distributions.Continuous}, ::Float64)
  logpdf(::Distributions.Distribution{Distributions.Univariate,Distributions.Continuous}, ::Real)
  ...
 in f at none:1
 in derivative at /Users/theodore/.julia/v0.4/ForwardDiff/src/api/derivative.jl:24

2 things went wrong here. ForwardDiff uses its own number types, which are all subtypes of Number. On the other hand, Distributions uses Float64, both as input arguments to methods such as logpdf() and as field types inside distribution definitions. To alleviate the problem, the following definition for Normal works:

using ForwardDiff

immutable Normal
    μ::Number
    σ::Number
end

logpdf(d::Normal, x::Number) = -0.5*(((x-d.μ)/d.σ)^2+log(2*π*(d.σ)^2))

f(x::Number) = logpdf(Normal(1., 0.15), x)
derivative(f, 2.5)

As anticipated, it returns -66.66666666666667. It becomes clear that 2 changes have been made; the fields μ and σ have been defined to be of type Number instead of Float64, and same goes for the second argument of logpdf().

In light of this example, would you agree to change a bit the structure of Distributions, specifically by using fields of type Number in distribution definitions and arguments of type Number in method signatures for methods like logpdf(), pdf() etc?

papamarkou commented 8 years ago

As I have been mulling over this matter, another thought struck me. Would an alternative solution be to define the number types in ForwardDiff as subtypes of Float64 (or of AbstractFloat) instead of Number? This would perhaps make more sense, given that derivative computations are meaningful only in relation to floating point numbers.

If we wanted to be perfect, normally we should somehow improve both on Distributions and ForwardDiff in my modest opinion; Continuous distributions should be parameterised to accept any floating point subtype of AbstractFloat and number types in ForwardDiff must be subtypes of AbstractFloat.

mlubin commented 8 years ago
immutable Normal
    μ::Number
    σ::Number
end

Fields shouldn't be abstract. This needs to be

immutable Normal{T<:Number}
    μ::T
    σ::T
end
lindahua commented 8 years ago

Unfortunately, in this definition, the field type is not a concrete type, which would have substantial impact on the performance:

immutable Normal
    μ::Number
    σ::Number
end

Another approach is to turn this into a parametric type:

immutable Normal{T<:Number}
    # ...
end

But then it would cause problems in downstream codes such as

type MyType
    # suddenly, distr. becomes non-concrete
    distr::Normal
end
lindahua commented 8 years ago

Another major hurdle is that many basic distributions (including Beta, Gamma, etc) are still relying on Rmath to work, and Rmath functions, as external C functions, only accept 64-bit floating point numbers.

mlubin commented 8 years ago

What about:

immutable GenericNormal{T<:Number} # or some other name
    μ::T
    σ::T
end
typealias Normal GenericNormal{Float64}

Another major hurdle is that many basic distributions (including Beta, Gamma, etc) are still relying on Rmath to work, and Rmath functions, as external C functions, only accept 64-bit floating point numbers.

If it doesn't work it doesn't work. But we can at least generalize the distribution types which might encourage further development of generic julia implementations.

papamarkou commented 8 years ago

I also prefer the parametric type definition of Normal, both in terms of design (and also because of the performance consideration I didn't take into account). In fact, that's the first way I coded the example but suppressed the parameter not to seem to overcomplicate things.

@lindahua I see your consideration - for one thing, the ambition is to get rid of the Rmath dependency in the future, and secondly the solution suggested by @mlubin above seems reasonable, do you think it may work?

One more thing, @mlubin, as far as the ForwardDiff side of things goes, in order to be strictly correct with types, I do believe that ForwardDiff-related number types should be subtypes of AbstractFloat.

mlubin commented 8 years ago

AbstractFloat is a subtype of Real, but ForwardDiff number types are not real numbers. I'm not sure how to reconcile that.

lindahua commented 8 years ago

I understand that it would be great if you have ForwardDiff and alike to work with Distributions seamlessly, which would be useful in MCMC and possibly some other applications.

At this point, however, the only type that you can generalize in this way is probably just Normal and Exponential distributions, and a couple of simple discrete distribution like Bernoulli. For everything else, it pretty much doesn't work.

The entire package is tied too close to Rmath and it doesn't seem like we have the capacity to get rid of them in near future.

mlubin commented 8 years ago

So let's start with Normal and Exponential then. Small steps...

lindahua commented 8 years ago

I don't mind to have a PR that replaces the implementation with a parametric one to see how things go.

papamarkou commented 8 years ago

I can try to work on this and discuss any issues as they arise in the PR. I think, to start with, I would just focus on the celebrated normal distribution as a driving example. Given workload, it may take a couple of weeks for the PR to land.

andreasnoack commented 8 years ago

@scidom I don't think it is necessary to parametrize the distribution types to achieve what you want. It is the methods that have to be less restrictive and I think we should be able to do that with pure Julia implementations for many of the basic distributions without sacrifying precision and speed (but please correct me if I'm wrong). E.g. logpdf(D::Normal, x::number) shouldn't cause trouble. I think the definition

mylogpdf(D::Normal, x::Number) = -(log(2*π)/2 + log(D.σ) + abs2((x - D.μ)/D.σ)/2)

should be okay. The constant should be stored as a constant, and maybe it is more precise to divide before the subtraction if the variance is large, but with this you can do

julia> derivative(t -> mylogpdf(X, t), 1.0)
-1.0

It might also be useful to parametrize the distributions to be able to store single precision or unum numbers, but I don't think it is a good idea to allow complex numbers. It will be difficult to handle correctly.

mlubin commented 8 years ago

@andreasnoack, one might want to compute derivatives with respect to the distribution parameters also

andreasnoack commented 8 years ago

Of course. That is probably the most important usecase. Then it's a bit more complicated.

papamarkou commented 8 years ago

For a moment I thought that we can get away with it @andreasnoack without having to parameterise distributions, but @mlubin is right, a common use case is to compute the gradient of the log-pdf with respect to the pdf parameters.

About the logpdf() implementation I provided above, I was aware it was too naive, just did it on top of my head to convey my point :)

lindahua commented 8 years ago

Actually, I think it might be less efforts to just explicit provide a gradient computation function for each distribution than attempting to turn the entire system parametric.

Each distribution already has a dozen or so methods, it won't be too much more work to just add one more.

mlubin commented 8 years ago

What would the code look like on the user side? On Oct 4, 2015 9:20 PM, "Dahua Lin" notifications@github.com wrote:

Actually, I think it might be less efforts to just explicit provide a gradient computation function for each distribution than attempting to turn the entire system parametric.

— Reply to this email directly or view it on GitHub https://github.com/JuliaStats/Distributions.jl/issues/423#issuecomment-145407906 .

johnmyleswhite commented 8 years ago

In the long run, I think it would be good to support parametric types. But it's a massive project. I liked the proposal to start by making all methods (including constructors) for Normal work with arbitrary Number types. Seeing how long that takes will give a good sense of how long everything else will take.

andreasnoack commented 8 years ago

...but does it make sense with a normal random variable with complex (or dual) mean and variance? A single complex number can't encode the variance of the complex normal distribution.

johnmyleswhite commented 8 years ago

That's beyond my knowledge. Complex numbers never come up in the work I do.

mlubin commented 8 years ago

Dual variance makes sense, it just carries along an infinitesimal component that will compute derivatives as the user wants. No idea about complex, don't see any reason to think about supporting this.

andreasnoack commented 8 years ago

@mlubin So is the suggestion to allow the construction a Normal{T<:Number}, but implicitly assuming that T<:Real?

mlubin commented 8 years ago

Well, dual numbers aren't reals. I don't know of a good way to allow reals and duals but not complex numbers through the type system. It might just fall on the user to "don't do that".

johnmyleswhite commented 8 years ago

Yeah, I think this is the same overly strict typing problems we've had in Base with linear algebra. Don't see any solution except a closed Union{Real, Dual} or waiting for us to have open unions.

mlubin commented 8 years ago

Distributions shouldn't have a dependency on DualNumbers or any other AD package. First of all we want this to work with ForwardDiff which defines its own number type separately from DualNumbers, and second I wouldn't want to give special treatment to one implementation over another, given that it's so easy to define your own dual type, and there are already multiple packages that do so.

jrevels commented 8 years ago

You can always disallow truly unsupported types with an inner constructor:

immutable Normal{T<:Number}
    μ::T
    σ::T
    Normal(μ, σ) = new(μ, σ)
    Normal(μ::Complex, σ::Complex) = error("This type does not support complex numbers.")
end

If it gets to be super common, you could probably automate defining these restriction definitions with a macro.

johnmyleswhite commented 8 years ago

I think it's ok to have a little duck-typing for the moment.

andreasnoack commented 8 years ago

@mlubin By "implicilty assuming", I mean that all methods will be written only with real numbers in mind.

@johnmyleswhite Which "...overly strict typing problems we've had in Base with linear algebra" are you thinking of?

Let's get a branch/pr with parametrized Normal and play with it. I agree, that it would be very cool to have automatic differentiation working for parameters of the distributions so let's try it out and see if it is the right way to get it.

papamarkou commented 8 years ago

Quick question, @johnmyleswhite, what does open union mean in Julia terms? Have only heard the term in topology :)

In the existing math universe distrbutions with complex fields are not defined, I think. I like the suggestion of @jrevels as a temporary fix.

So who is going to play with a "parametric" branch?

lindahua commented 8 years ago

Maybe just start working on a PR, and we may tackle issues that arise as we proceed.

papamarkou commented 8 years ago

Ok, will do this within the next few days.

papamarkou commented 8 years ago

Hi all, I made a first attempt today to make Normal work with ForwardDiff. To start with and in order to keep it simple avoiding big workload, as @lindahua and @johnmyleswhite pointed out, I made it work with logpdf(). This enables us, at least in the context of the ForwardDiff paper, to provide a statistical example. The change I did was to add only two lines in my fork of Distributions, see here.

@mlubin, @jrevels allow me to shortly explain a bit the context in which various kinds of AD may appear in relation to Distributions. We are looking at Monte Carlo sampling, which can be used in two different use cases.

One case is that of using Monte Carlo sampling as a random sampling scheme, i.e. as a way of sampling from a distribution. For example, say f(x)~N(mu, sigma^2), where mu and sigma are known. Then, using a Markov Chain Monte Carlo (MCMC) method you can generate samples {x_1,x_2,\dots,x_n} from the normal N(mu, sigma^2). The only conceptual difference from ordinary random sampling is that our samples will be correlated, but they would still be a sample from the desired normal, at least asymptotically. This is the use case we have addressed with the above change in Distributions, as it requires the derivative (with respect to x) of logf(x).

The second use case is to employ Monte Carlo sampling for Bayesian inference. So, let's say that x~N(mu, sigma^2), where sigma is known and mu is the unknown parameter we want to estimate. Then, we can assign a prior mu~N(mu_0, sigma_0^2) to the unknown parameter mu, assuming we know the hyper-parameters mu_0 and sigma_0. In this context, we can use MCMC sampling to draw samples from the posterior of f(mu | x, sigma). The mean of the simulated Monte Carlo chain gives an estimate for the unknown parameter mu. Since we sample from f(mu | x, sigma), we obviously want the derivative of this function with respect to mu. Now this is the case we haven't addressed yet, and it would require redefining the Normal type.

I think that at least for the needs of our paper, we can avoid providing an example of Bayesian inference and we can stick to the simpler first case of Monte Carlo sampling as a means of sampling from a distribution of interest. It would be more than enough as a statistical example, esp for an article in a software- or optimization-centric journal.

papamarkou commented 8 years ago

P.S. If you clone my fork of Distributions, to try the first scenario of simple Monte Carlo sampling from a normal, the following now works:

using Distributions
using ForwardDiff

f(x::Number) = logpdf(Normal(1., 0.15), x)
derivative(f, 2.5)
papamarkou commented 8 years ago

I just left this a couple of days open in case there were any comments - should I proceed and merge this small change? It will let us demonstrate how ForwardDiff can work with the normal distribution via an appropriate example (in the draft).

mlubin commented 8 years ago

@scidom, could you open a PR from your branch?

papamarkou commented 8 years ago

Yeap, will open the PR by tomorrow, just wanted to ask if you are happy with it before doing so.

johnmyleswhite commented 8 years ago

Hard to say without a PR. I really find the PR framework essential to doing code review.

papamarkou commented 8 years ago

Ok, I just opened #427, I will close the issue here and we can carry on the discussion in the PR. The example I provided above works after this little patch. Once these two lines are reviewed, I can possibly look at the multivariate normal too, which I it should be equally easy to make it work. This is all about logpdf() for the time being, I haven't touched Pandora's box, aka redefining the distribution types.