New MultivariateNormal design

lindahua commented 10 years ago

What PDMats.jl currently implements is actually the routines needed to perform computation over zero mean Gaussian distributions.

I am considering migrate the codes in PDMats.jl into this package, and turn those three positive definite matrix types into three different kinds of zero-mean multivariate normal distributions:

abstract AbstractMvNormal <: ContinuousMultivariateDistribution
abstract AbstractZeroMeanMvNormal <: AbstractMvNormal

type ZeroMeanIsotropicMvNormal <: AbstractZeroMeanMvNormal
type ZeroMeanDiagonalMvNormal <: AbstractZeroMeanMvNormal
type ZeroMeanMvNormal <: AbstractZeroMeanMvNormal

Then a full multivariate normal distribution as follows:

type GenericMvNormal{ZMNormal} <: AbstractMvNormal

typealias IsotropicMvNormal GenericMvNormal{ZeroMeanIsotropicMvNormal} 
typealias DiagonalMvNormal GenericMvNormal{ZeroMeanDiagonalMvNormal} 
typealias MvNormal GenericMvNormal{ZeroMeanMvNormal}

In this way, most implementation can be done over GenericMvNormal.

This design makes sense to me, except that the names sound a little bit long. However, this issue can be partly alleviated using a convenient constructing function

mvnormal(Number) --> ZeroMeanIsotropicMvNormal
mvnormal(Diagonal) --> ZeroMeanDiagonalMvNormal
mvnormal(Matrix) --> ZeroMeanMvNormal

mvnormal(Vector, Number) --> IsotropicMvNormal
mvnormal(Vector, Diagonal) --> DiagonalMvNormal
mvnormal(Vector, Matrix) --> MvNormal

Generally, end users won't need to key in those type names very often.

Thoughts?

lindahua commented 10 years ago

This refactoring not only achieve the goal of #231, but also eliminate the dependency on PDMats.jl.

simonbyrne commented 10 years ago

I'v been thinking along similar lines. One of the things I've noticed is that the mat field containing the Matrix of a PDMat is never actually used by any of the methods (other than cov) and so an alternative option would be to simply make ZeroMeanMvNormal a parametric type that could accept the factorization

immutable ZeroMeanMvNormal{T <: Union(Matrix,Factorization)} <: AbstractMvNormal
   cov::T
end

This would require fewer types, and make it much easier to use alternative Factorizations, (CholeskyPivoted, Eigen, LDLt, etc), or special Matrix structures (Diagonal, SymTridiagonal) as well as define custom structures for things like Gaussian graphical models and Gaussian processes. The IsotropicNormal functionality could be provided by a new IsoDiagonal type.

As there should be essentially no overhead in creating this structure form a given cov, we could also define the GenericMvNormal{T} methods by simply calling ZeroMeanMvNormal, e.g.

pdf(d::GenericMvNormal, x) = pdf(ZeroMeanMvNormal(d.cov), x - d.mean)

lindahua commented 10 years ago

The computation is vastly different when the covariance is of different forms (scalar, diagonal, and full matrix). The computation is used in:

https://github.com/JuliaStats/Distributions.jl/blob/master/src/multivariate/mvnormal.jl#L75 https://github.com/JuliaStats/Distributions.jl/blob/master/src/multivariate/mvnormal.jl#L84 https://github.com/JuliaStats/Distributions.jl/blob/master/src/multivariate/mvnormal.jl#L92 https://github.com/JuliaStats/Distributions.jl/blob/master/src/multivariate/mvnormal.jl#L124 https://github.com/JuliaStats/Distributions.jl/blob/master/src/multivariate/mvnormal.jl#L174

and many other lines. They call methods like invquad, unwhiten, logdet etc, of which the computational details are encapsulated in PDMats.jl. These codes invoke methods on d.Σ, instead of working on its mat or value field directly.

In my original design, I require all sub types of AbstractPDMats to implement methods outlined in https://github.com/JuliaStats/PDMats.jl

That package also already provides some matrix types of special structures (e.g. ScalMat and PDiagMat). Since they require a different set of methods than AbstractMatrix, I chose not to mix them with the array hierarchy in Julia base, and introduce another hierarchy under AbstractPDMat.

Back to this issue, we have two choices:

We simply merge PDMats into Distributions, and have

immutable ZeroMeanMvNormal{C<:AbstractPDMat}
   cov::C
end

We get rid of AbstractPDMat altogether and directly implement those methods on sub types of AbstractZeroMeanMvNormal (like I outline above).

Personally, I don't have strong opinion of one way or the other.

simonbyrne commented 10 years ago

@lindahua I meant indirectly as well: the code for invquad, unwhiten, logdet etc. all just use the Σ.chol field, and so could easily dispatch directly on Cholesky (quad uses the mat field, but this could easily be changed to a Triangular * Vector; sumsq, which would actually give a lower flop count).

Part of my reasoning is that there are certain cases where you end up working with the factorisation anyway, and you never actually need to compute the full matrix: e.g. in MCMC you typically parametrise the covariance in terms of its Cholesky (or some other representation), and the standard Wishart and inverse Wishart samplers give you a Cholesky (it's on my todo list to add this to our rand methods).

Anyway, since PDMat <: AbstractMatrix, defining

immutable ZeroMeanMvNormal{T <: Union(Matrix,Factorization)} <: AbstractMvNormal
   cov::T
end

would not preclude the use of the PDMat type, and this would seem sensible as a default when constructred with mvnormal(X::Matrix).

lindahua commented 10 years ago

@simonbyrne I see your point.

Here, I would like to explain the rationales why I introduced AbstractPDMat originally:

A matrix-like type needs to provide a certain set of operations in order to work as a covariance (or inverse-covariance) for a Gaussian model. Not every subtype of AbstractMatrix or Factorization is qualified for this purpose.
In the case where we use a full matrix as the covariance, we need to maintain both the matrix itself and the Cholesky factorization (for efficient evaluation & sampling). Usually, when a user construct a Gaussian distribution, he provides a covariance matrix, and he would expect that we return the covariance matrix immediately when he calls cov(d), instead of waiting for a procedure to recompute that matrix from the Cholesky. This motivates the design of PDMat that maintains both.

simonbyrne commented 10 years ago

True, but couldn't we just enforce this at the outer constructor level? i.e. We only define ZeroMeanMvNormal methods for matrices/factorizations where the construction makes sense (e.g. Diagonal, Tridiagonal, Cholesky, Eigen{Real,Real}, PDMat), and leave users to define their own if they want to extend it further.

As I said above, I would be happy for PDMat to be the default option, I just want to have the choice.

And to answer your original question, I would be in favour of the first option (as you've probably guessed by now).

lindahua commented 10 years ago

Sounds good to me.

simonbyrne commented 10 years ago

@lindahua I'm just having a quick look through the current code: is there a reason why GenericMvNormalKnownSigmaStats store invΣ instead of Σ?

lindahua commented 10 years ago

A similar design was implemented in #296.

JuliaStats / Distributions.jl

New MultivariateNormal design #239