Open ararslan opened 7 years ago
Do they need to be in separate packages though or would submodules suffice for now?
Is Pkg set up to fetch submodules when updating?
Sorry, I meant a julia submodule, just to help organizing things better before making a bunch of repos. I imagine once they're sufficiently isolated we'll be able to make Distributions.jl a wrapper package that re-exports things from the other APIs (e.g., AbstractDistributions.jl, UnivariateDistributions.jl, MultivariateDistributions.jl)
At least for the Gaussian distribution, I would like to not separate the one and the n
-dimensional case into different objects and hard-code various implementation details (isotropic or not).
I'd rather give μ::T
and σ
or σσ'
and provide implementations which regardless whether μ
is a number or a vector or a fixed vector - as long as norm(σ\(x-μ))^2
is meaningful - dispatch to do the right thing. Compare with https://github.com/mschauer/Bridge.jl/blob/master/src/gaussian.jl
@mschauer The mathematician in me is somewhat sympathetic to the idea, but not sure how feasible and practical it actually is. For examplecdf
is only really applicable to univariate distributions (yes, it can be defined for multivariate distributions, but it is typically too difficult to compute outside of simple cases).
Can you expand on your use case? I'm not sure what your example is trying to show.
I guess that we could just overload cdf
for the cases that can be supported?
My code to simulate linear stochastic processes is indifferent about the underlying type say as long as randn(T)
gives a Gaussian increment.
In general, if I can write randn(T)
and σ*randn(T) + μ
, I would like Normal(zero(T))
and Normal(μ, σ*σ')
being able to reflect the known properties the corresponding distribution. This is not a case where lets say generality contradicts usefulness.
Example:
using Bridge, StaticArrays
Ts = [Float64, Complex{Float64}, SVector{2,Float64}]
Ps = [Wiener{T}() for T in Ts]
Xs = [sample([0.0, 0.5, 1.0], P)[end] for P in Ps]
gives
1.0 => -0.36573
1.0 => -0.17426-0.234315im
1.0 => [1.07265, -0.13575]
Suddenly in https://github.com/mschauer/Bridge.jl/blob/master/src/linpro.jl#L32 I cannot rely on Distributions to give the law of the increment.
Thanks, that was useful, though I'm not sure it is really feasible in the current structure, e.g. we would lose the property that Normal <: UnivariateDistribution
.
What would be the downside of simply dispatching on the appropriate distribution type, e.g. Normal
, MvNormal
, ComplexNormal
, MatrixNormal
(the latter 2 don't exist yet, but could in future).
Actually ComplexNormal
and MatrixNormal
are good counter-examples: both are actually 3-parameter distributions: ComplexNormal
has a mean, covariance and a pseudo-covariance, and MatrixNormal
has a mean and left and right covariance matrices.
One would really focus on the case σ*randn(T) + μ
, which for complex values makes the mixing pseudo-covariance zero.
I find it practical to have both in the same package, is this still an open discussion?
Since nothing has happened, it should be fine to continue the discussion. While it is practical, it comes at a significant price.
Static arrays can be supported to a large extend without depending on the StaticArray package I suppose.
Yes using the right abstractions could let us avoid depending on anything heavy. Lots of PRs lately have been abound correct parametrization of types to increase interop. Maybe we could do something similar for PDMats
?
One pain point on this is that the PDMats
package defines the interface for PD matrices, but also the heavy Arpack-dependent implementation
The PDMats dependence could be replaced with an AbstractPDMats
, at the cost of users of MvNormals providing actual factorizations for the covariance matrices themselves, which are then required to define unwhiten
etc, as in https://github.com/mschauer/GaussianDistributions.jl (I am sorry for linking this repo so often in our discussions, honestly it is mostly a sketch.)
Agreed, couldn't AbstracPDMats
be defined in a separate package from the implementation? Or even better, moved to LinearAlgebra
(not sure how possible this would be). Removing a heavy and not necessary dependency would greatly lighten Distributions
Actually, I don't think we need a special type for this. PDness is a property of the values anyway in the same way that σ>0
is value-dependent for Normal
. I think we could just have a multivariate normal type that is parameterized on Cholesky-like factor S
which could be either a matrix or factorization as long as S'S=Σ
.
Yes, one has to be a bit careful, because one would either like to give S
or a factorisation object for Σ
, which of course informs about S
, but is not S
.
Maybe #823 can help us here.
to dispatch on covar_mat=...
vs factorization = ...
?
No, as a way to specify that you pass in a covariance matrix instead of the "square root" of the covariance.
I had planned to tackle this after #823: my general plan was to get rid of PDMats and just work with factorizations (or Diagonal
s), and make use of FillArrays.jl, but i'm pretty short on time at the moment...
Revisiting this issue, reflecting on the experience with LogExpFunctions.jl. A year after the suggestion (https://github.com/JuliaStats/StatsFuns.jl/issues/46) to move it out of StatsFuns.jl 4 years ago, the code was transferred into a separate package.
Despite its small size, the package got quite a few PRs. For the authors, a smaller package is easier to contribute to. For the maintainers (especially @devmotion, who did a lot of work), a smaller codebase allows PRs to get reviewed and merged relatively quickly. I think this is the key factor; dependencies are kind of secondary, but of course it is always nice to have fewer of them.
Practically, I would recommend addressing #1139 first, then splitting into uni- and multivariate distributions.
Discussed elsewhere, opening an issue to track it.
Since they have a slightly different interface and could benefit from different optimizations, @andreasnoack has suggested we separate multivariate distributions (and probably mixture models as well) into a separate package and keep Distributions.jl for univariate distributions.