filtron commented 2 months ago

Currently it is defined as the following type parameter restriction:

const AffineNormalKernel{T} =
    NormalKernel{<:AbstractAffineMap{T},<:CovarianceParameter{T}} where {T}

However, since there is limited possibilities for a custom covariance type to opt into being a CovarianceParameter this means more methods need to be added for NormalKernel{<:AbstractAffineMap,<:MyCovarianceType} in order to get functionality such as marginalize and bayes_rule

However, this would not be necessary if we do the following:

abstract type AbstractAffineNormalKernel <: AbstractNormalKernel end

struct AffineNormalKernel{A,B} <: AbstractAffineNormalKernel 
affine_map::A
covariance::B
end

Now all operations on combinations of Dirac, DiracKernel, Normal, AffineNormalKernel "just work" as long as their supplied with vectors that behave as AbstractVector, affine maps that behave like AbstractAffineMap, and covariances that behave CovarianceParameter (in the sense that they implement the necessary methods).

filtron commented 2 months ago

Upon some further consideration I have come up with some alternative options. Now it is important to keep in mind that the problem is that we do not own a type for positive semi-definite matrices with the following properties:

Anyone can subtype it.
Already existing types are a sub-type of it (e.g. Cholesky)

A lot of methods are defined for AffineNormalKernel such as marginalize, invert, bayes_rule which is defined by the following type restriction

const AffineNormalKernel{T} =
    NormalKernel{<:AbstractAffineMap{T},<:CovarianceParameter{T}} where {T}

The consequence is that these methods dont exist for

NormalKernel{<:AbstractAffineMap,<:MyCovarianceType}

even when MyCovarianceType implements everything that is expected of a covariance parameter.

Option 1: `ConstantConditionalCovariance`

Add the following type (name could be something different)

struct ConstantConditionalCovariance{A}
covariance_parameter::A 
end 

(C::ConstantConditionalCovariance)(x) = C.covariance_parameter

so that AffineNormalKernel may be defined by

const AffineNormalKernel{A,B} = NormalKernel{A,B} where {A<:AbstractAffineMap,B<:ConstantConditionalCovariance}

Now anyone can construct an AffineNormalKernel by

NormalKernel(affine_map, ConstantConditionalCovariance(my_special_cov))

Option 2: `HomoskedasticNormalKernel`

Add the following type

struct HomoskedasticNormalKernel{A,B} <: AbstractNormalKernel
mean_parameter::A 
cov_parameter::B
end

so that AffineNormalKernel may be defined by

const AffineNormalkernel{A,B} = HomoskedasticNormalKernel{A,B} where {A<:AbstractAffineMap,B}

Now anyone can construct an AffineNormalKernel by

HomoskedasticNormalKernel(affine_map, my_special_cov)

Option 3: `AffineNormalKernel`

Add the following type

struct AffineNormalKernel{A,B} <: AbstractAffineNormalKernel 
affine_map::A
covariance::B
end

so that anyone can construct an AffineNormalKernel by

AffineNormalKernel(affine_map, my_special_cov)

Some consideration

Maybe what is currently AffineNormalKernel should be AffineHomoskedasticNormalKernel? It is certainly more informative but also very long.
HomoskedasticNormalKernel is a useful type in its own right for e.g. implementing cubature methods in downstream applications. This is naturally achieved by option 2 and can be accomplished by option 1 via the type alias HomoskedasticNormalKernel{A,B} = NormalKernel{A,B} where {A,B<:ConstantConditionalCovariance}. However, this is not automatically achieved by option 3.
Currently cov for kernels of constant conditional covariance is implemented by an anonymous function like so: cov(K::AffineNormalKernel) = x -> K.Σ. The wrapping approach of option 1 avoids this. This can of course be achieved by implementing this wrapper for option 2 as well. Is there any benefit to this?
Should there be a function normalkernel which can be used to send the inputs to the correct constructor? For example, if option 1 is taken then normalkernel(F::AbstractAffineMap, C::Cholesky) = NormalKernel(F, ConstantConditionalCovariance(C)), or for example if option 2 is taken then normalkernel(F::AbstractAffineMap, C::Cholesky) = HomoskedasticNormalkernel(F, C). In this case I guess similar constructors for other kernels/distributions should be defined as well, e.g. normal, dirac, dirackernel.
Option 3 has the "benefit" that you dont need to subtype AbstractAffineMap for the conditional mean parameter. However, as anyone can subtype AbstractAffineMap I do not see how this would solve a practical problem.

In view of 2. and 4. I think option 3 seems least favourable. I am currently favouring option 2 but cant really say anything against option 1 as of now. Furthermore, 4. seems to make sense? (at least it saves a couple of shift-presses...)

nathanaelbosch commented 2 months ago

A very simple alternative would be to not restrict the covariance part in AffineNormalKernel at all, and do

const AffineNormalKernel{T} = NormalKernel{<:AbstractAffineMap{T}} where {T}

But probably there is some issue with this that I'm not aware of?

Also I am wondering, in the current state the docs say that NormalKernel assumes the covariance part to be callable. Is this also the case for the current AffineNormalKernels?

Otherwise, I don't have a clear favorite. Maybe it makes sense to come up with some examples first to see how they would actually feel when using them?

filtron commented 2 months ago

A very simple alternative would be to not restrict the covariance part in AffineNormalKernel at all, and do
const AffineNormalKernel{T} = NormalKernel{<:AbstractAffineMap{T}} where {T}
But probably there is some issue with this that I'm not aware of?

The problem is that we need the possibility to express "Normal kernel with affine conditional mean and constant conditional covariance" as a type. This can of course not be done by just restricting the conditional mean type and we can not restrict the conditional covariance type because there is no commonly accepted abstract type that all covariances types can subtype (it also seems a bit impossible to achieve).

Also I am wondering, in the current state the docs say that NormalKernel assumes the covariance part to be callable. Is this also the case for the current AffineNormalKernels?

In general a normal kernel is of the form $\mathcal{N}(\mu(x), Q(x))$ so the idea is that mean and cov should return things that can evaluate $\mu$ and $Q$ at some point $x$. The internal representation of the covariance is then obtained by covp which does not need to be callable.

Otherwise, I don't have a clear favorite. Maybe it makes sense to come up with some examples first to see how they would actually feel when using them?

I agree.

filtron commented 2 months ago

Option 4:

Redefine CovarianceParameter to:

const CovarianceParameter{T} = Union{AbstractMatrix{T}, Factorization{T}, T, Base.RefVale{T}} where {T<:Number}

Base.RefVale is just there in case we want in-place operations on scalar distributions / kernels.
I guess most sensible covariance representations will now be subtypes of CovarianceParameter. The only exception I can think of is we want an scalar analogue of Cholesky (i.e. ScalarFactorization) in which case it might not make sense to subtype Factorization and it might be a lot of useless work to make sure it is actually a subtype of Number (we would only need a tiny fraction of the methods we're expected to implement).
Just need to make sure NormalKernel is constructible for custom types (i.e. no pre-processing that leads to stackoverflow and so on).
Another drawback is that people can construct normal distributions / kernels with nonsensical covariances but maybe that is on them?

This seems like the absolutely simplest solution and I can not think of any immediate drawbacks? (other than the tiny annoyance already mentioned).

Option 5:

Define:

abstract type PSDness end 
struct IsPSDParametrization end 
struct IsNotPSDParametrization end 

ispsdparametrization(A) = IsNotPSDParametrization()
ispsdparametrization(::Cholesky) =  IsPSDParametrization()
ispsdparametrization(::SelfAdjoint) =  IsPSDParametrization()
# etc...

Add type parameters to e.g. NormalKernel like so:

struct NormalKernel{TP<:PSDness, TM, TC} <: AbstractNormalKernel
mean::TM
covp::TC
end

Add aliases:

const HomoskedasticNormalKernel{TM,TC} = NormalKernel{<:IsPSDParametrization,TM,TC} where {TM,TC}
const AffineHomoskedasticNormalKernel{TM,TC} = NormalKernel{<:IsPSDParametrization,TM,TC} where {TM<:AbstractAffineMap,TC}

Any type can opt into being a psd parametrization by adding the method ispsdparametrization(::MyPSDType) = IsPSDParametrization()
marginalize, bayes_rule, invert, etc will "just work" for AffineHomoskedasticNormalKernel{TM,TC} as long as the type TC has all the methods expected of a psd parametrization.
ispsdparametrization(::SelfAdjoint) = IsPSDParametrization() is as always awkward as self adjoint matrices do not need to be positive semi-definite but then again, positive semi-definiteness for a general matrix can not be inferred from the type. So that problem always exists.
Names of types and functions is of course up for debate.

filtron / MarkovKernels.jl

Should ```AffineNormalKernel``` be its own type rather than a type parameter restriction? #106

Option 1: `ConstantConditionalCovariance`

Option 2: `HomoskedasticNormalKernel`

Option 3: `AffineNormalKernel`

Some consideration

Option 4:

Option 5:

filtron / MarkovKernels.jl

Should ```AffineNormalKernel``` be its own type rather than a type parameter restriction? #106

Option 1: ConstantConditionalCovariance

Option 2: HomoskedasticNormalKernel

Option 3: AffineNormalKernel

Some consideration

Option 4:

Option 5:

Option 1: `ConstantConditionalCovariance`

Option 2: `HomoskedasticNormalKernel`

Option 3: `AffineNormalKernel`