Interoperability with MeasureTheory

JuliaMath / DensityInterface.jl

Interface for mathematical/statistical densities in Julia

Other

12 stars 3 forks source link

Interoperability with MeasureTheory #5

Open cscherrer opened 2 years ago

cscherrer commented 2 years ago

For use with MeasureTheory, it would be helpful if we could write logfuncdensity(log_f, base), with expected behavior

logdensityof(logfuncdensity(log_f, base), x) == log_f(x) - logdensityof(base, x)

This would allow us to write

DensityInterface.logfuncdensity(log_f, base::Measure) = MeasureTheory.∫exp(log_f, base)

I know this package doesn't want base measures to be a requirement, so maybe we could find a way to make base optional.

∫exp is also defined for Distributions; it's not clear to me yet how to extend the interface to cover that case.

oschulz commented 2 years ago

Hm, do we want logfuncdensity to act an "alias" (for two arguments) if MeasureTheory (resp. MeasureBase) already provides ∫exp?

If the second argument will be a Measure, then MeasureBase seems to be the natural place to define the function (after all, it already does). Why not have different names for the two functions? The the case with the measure is more specific (which we want, if course, if a Measure is available) so maybe it's good if it has a less generic name.

It would be great if the return value of MeasureTheory.∫exp would support DensityInterface.hasdensity and DensityInterface.logdensityof, though. I think that would already give us interoperability.

cscherrer commented 2 years ago

The issue is that logfuncdensity(log_f) doesn't make sense at all for MeasureTheory, because a function alone isn't enough to specify a measure. I guess we could just not use that function, but I think a desirable quality of an interface is for it to make sense across packages.

oschulz commented 2 years ago

Yes, but shouldn't that function be called logfuncmeasure(log_f, measure) instead of logfuncdensity, since the return value is a measure, not a density (whereas logfuncdensity(log_f) will really return "just" a density)?

oschulz commented 2 years ago

logdensityof is flexible on purpose, regarding it's input (can be a density or something that has a base measure), to help with generic programming. But for construction of densities and measure, shouldn't we be more specific?

cscherrer commented 2 years ago

I thought it went

logdensityof(d) returns a (a density) function
logfuncdensity(f) takes a function, treats is as a density, and returns a measure (even if you don't call it that)

If not, then what does logfuncdensity do?

oschulz commented 2 years ago

In general logfuncdensity just returns a density that is defined by a log-function. Like a likelihood density, defined by a log-likelihood function.

Lets say we have a distribution d. logdensityof(d) will return a function that computes the log of the probability density. But given an arbitrary "black-box" function log_f that computes a the log of a probability density at given points, logdensityof can only create a density object (the probability density), not a distribution - not enough information.

Except if log_f was the result of calling logdensityof(d), which returns Base.Fix1(logdensityof, d), so the information is available.

I guess we should say logfuncdensity can return a measure if there's an implied base measure for log_f - but in most cases, it will just return a density.

cscherrer commented 2 years ago

I thought these were inverses, how can that be if they both return densities?

oschulz commented 2 years ago

How about this - after we add this to Distributions (JuliaStats/Distributions.jl#1416):

@inline DensityInterface.hasdensity(::Distribution) = true

DensityInterface.logdensityof(d::Distribution, x) = logpdf(d, x)

DensityInterface.densityof(d::Distribution, x) = pdf(d, x)

we add something like this to MeasureBase (caveat - I didn't test this, and some of it may already be there):

const MeasureLike = Union{MeasureBase.Measure, Distributions.Distribution}

# ParameterFreeMeasure means a Measure subtype that's singleton type:
abstract type ParameterFreeMeasure <: MeasureBase.Measure end

struct LebesgueMeasure <: MeasureBase.ParameterFreeMeasure end
struct CountingMeasure <: MeasureBase.ParameterFreeMeasure end

MeasureBase.logdensity(
    μ::Distributions.ContinuousDistribution, ::LebesgueMeasure, x
) = logpdf(μ, x)

MeasureBase.logdensity(
    μ::Distributions.DiscreteDistribution, ::CountingMeasure, x
) = logpdf(μ, x)

struct LogFuncRDIntegral{LogF,M<:MeasureLike} <: MeasureBase.Measure
    log_f::LogF
    μ::M
end

# Do whatever MeasureBase/MeasureTheory currently does here:
MeasureBase.logdensity(μ::LogFuncRDIntegral, ν::Measure, x) =
    MeasureBase.logdensity(MeasureBase.∫exp(μ.log_f, μ.μ), ν, x)

function logfuncmeasure end

MeasureBase.logfuncmeasure(log_f, μ::MeasureLike) =
    MeasureBase.LogFuncRDIntegral(log_f, μ)

MeasureBase.logfuncmeasure(
    log_f::Base.Fix1{
        Union{typeof(logdensityof),typeof(logpdf)},
        <:Distributions.ContinuousDistribution
    },
    ::LebesgueMeasure
) = log_f.x

MeasureBase.logfuncmeasure(
    log_f::Base.Fix1{
        Union{typeof(logdensityof),typeof(logpdf)},
        <:Distributions.DiscreteDistribution
    },
    ::CountingMeasure
) = log_f.x

function basemeasure end

MeasureBase.basemeasure(m::MeasureBase.LogFuncRDIntegral(m)) = m.μ

MeasureBase.basemeasure(m::Distributions.ContinuousDistribution) =
    MeasureBase.LebesgueMeasure()

MeasureBase.basemeasure(m::Distributions.DiscreteDistribution) =
    MeasureBase.CountingMeasure()

struct RadonNikodymDerivative{Mu,Nu}
    μ::Mu
    ν::Nu
end

DensityInterface.hasdensity(::RadonNikodymDerivative) = true

# If the μ is a singleton, we can do logfuncmeasure based on type:
MeasureBase.logfuncmeasure(
    log_f::Base.Fix1{typeof(logdensity),RadonNikodymDerivative{Mu,Nu}},
    μ::Nu
) where {Mu,Nu<:ParameterFreeMeasure} = log_f.x

function rd_derivative end

MeasureBase.rd_derivative(ν::MeasureLike, μ::MeasureLike) =
    MeasureBase.RadonNikodymDerivative(ν, μ)

# If the ν is a singleton, we can do rd_derivative based on type:
MeasureBase.rd_derivative(
    μ::LogFuncRDIntegral{LogF,M}, ν::M
) where {LogF,M<:ParameterFreeMeasure} = DensityInterface.logfuncdensity(ν.log_f)

DensityInterface.logdensityof(m::LogFuncRDIntegral, x) =
    m.log_f(x) + DensityInterface.logdensityof(m.μ, x)

DensityInterface.logdensityof(m::MeasureBase.RadonNikodymDerivative, x) =
    MeasureBase.logdensity(m.μ, m.ν, x) 
    # Maybe rename `MeasureBase.logdensity` to `MeasureBase.eval_log_rd_derivative` or so
    # to make the function names less similar?

DensityInterface.logdensityof(m::MeasureBase.Measure) =
    DensityInterface.logdensityof(
        MeasureBase.rd_derivative(m, MeasureBase.basemeasure(m))
    )

DensityInterface.logdensityof(m::MeasureBase.Measure, x) =
    DensityInterface.logdensityof(m)(x)

DensityInterface.logdensityof(
    m::RadonNikodymDerivative{<:Distributions.ContinuousDistribution,LebesgueMeasure},
    x...
) = logdensityof(m.μ, x...)

DensityInterface.logdensityof(
    m::RadonNikodymDerivative{<:Distributions.DiscreteDistribution,CountingMeasure},
    x...
) = logdensityof(m.μ, x...)

(I'm prefixing functions and types with package name a lot here for clarity).

Then this should work:

using DensityInterface, MeasureBase

μ = LebesgueMeasure() # Or another ParameterFreeMeasure compatible with log_f
m = logfuncmeasure(log_f, μ)

logdensityof(rd_derivative(m, μ), x) == log_f(x)
logdensityof(rd_derivative(m, μ)) === log_f

logdensityof(m, x) == log_f(x)
logdensityof(m) === log_f

It would the measure-theoretically explicit analog of (this would also work)

d = logfuncdensity(log_f)

logdensityof(d, x) == log_f(x)
logdensityof(d) === log_f

And this should also work:

d = SomeContinousDistribution()

log_f = logdensityof(rd_derivative(d, LebesgueMeasure())
logfuncmeasure(log_f, LebesgueMeasure()) === d

log_f = Base.Fix1(logpdf, d)
logfuncmeasure(log_f, LebesgueMeasure()) === d

It would be the measure-theoretically explicit analog of (this would also work)

log_f = logdensityof(d)
logfuncdensity(log_f) === d

log_f = logpdf(d)
logfuncdensity(log_f) === d

One could also think about making the measure theory API at least partially trait-based and replace MeasureLike by Any. Then I could define something like this in BAT:

MeasureBase.basemeasure(m::BAT.NonNormalizedPosterior) = m.prior

function MeasureBase.rd_derivative(μ::BAT.NonNormalizedPosterior{Li,Pr}, ν::Pr) where {Li,Pr}
    ν == ν.prior || throw(ArgumentError("ν must equal prior in rd_derivative of posterior"))
    ν.likelihood
end

(Note: The type is actually called BAT.PosteriorDensity, I use BAT.NonNormalizedPosterior here for clarity - though maybe I should rename it to that at some point).

Maybe similar things would be possible in Turing/PPL?

Sorry, I know this is a lengthy post. On the plus side, I think the code above may be quite complete already.

@cscherrer, what do you think of this?

@devmotion and @phipsgabler, how do you see this from a Distributions and Turing/PPL point of view?

cscherrer commented 2 years ago

Thanks for working through this @oschulz . I'll have a look and discuss with @mschauer, and get back to you soon.

oschulz commented 2 years ago

Thanks for working through this @oschulz . I'll have a look and discuss with @mschauer, and get back to you soon.

Thanks!

Regarding

[logdensityof/logfuncdensity] I thought these were inverses, how can that be if they both return densities?

logdensityof returns a function that computes the log-value of something that is or has a density. So it does not return a "density object", it returns a log-function (and logdensityof(logdensityof(d)) would not make sense).

logfuncdensity, on the other hand, takes a log-function (assumed to be the log-function of a density) and returns a corresponding "density object" (which is usually not a function, and even if it is, that function should certainly not compute the log-density value).

Actually, to make my proposal above work nicely, we should say that logfuncdensity returns a density or something that has a density (e.g. a measure). The latter will only happen in special(ized) cases, though (e.g. when it's input log-density function originated from a distribution or measure and still carries that information in a dispatchable way).

A major aim of DensityInterface is to let people work with "density objects" that do not have to be named log_... in the code to make clear if it's (already) a log or not:

It's so awkward to write log_likelihood = some_func; prior = SomeDistribution(). I had that in examples during the early days of BAT.jl and several users asked "why one with log_ and not the other?". On the other hand, likelihood = some_func; prior = SomeDistribution() is misleading, dangerous and wrong, if some_func returns the log of the likelihood.

But likelihood = logfuncdensity(some_func); prior = SomeDistribution() is clear to the reader. And it's safe, the code using likelihood and prior (and the posterior, and so on) can use logdensityof (resp. densityof) and be sure to get the log (resp. not the log).

oschulz commented 2 years ago

Maybe we actually should add something like

radon_nikodym_integral(d, μ) = logfuncmeasure(logdensityof(d), μ)

to my proposal above, to take advantage of the fact that things that implement the DensityInterface API (d in this case) don't need to have log in the name (on the LHS here).

To make things even nicer, we could replace the log_f-based struct LogFuncRDIntegral in my proposal with

struct RadonNikodymIntegral{F,M<:MeasureLike} <: MeasureBase.Measure
    d::D
    μ::M
end

requiring d to support logdensityof (can e.g. be the result of logfuncdensity(log_f)).

RadonNikodymIntegral would form an elegant symmetry with RadonNikodymDerivative, name-wise.

oschulz commented 2 years ago

I updated the proposal above a bit, renaming LogFuncMeasure to LogFuncRDIntegral and adding

DensityInterface.logdensityof(m::LogFuncRDIntegral, x) =
    m.log_f(x) + DensityInterface.logdensityof(m.μ, x)

Would indeed by nicer with a struct RadonNikodymIntegral, I think, then it could be

DensityInterface.logdensityof(m::RadonNikodymIntegral, x) =
    DensityInterface.logdensityof(m.d, x) + DensityInterface.logdensityof(m.μ, x)

cscherrer commented 2 years ago

Hi @oschulz , just going through this...

In a few places, you define a logdensity in terms of the three-argument method. This was how we started out, but it doesn't work well because

It forces the base measure to always be explicit. Especially for new users, it's much simpler if the base measure can be a separate function associated with a measure
This requires three-argument methods for all pairs of measures, which is intractable.

Instead we define

logdensity(::SomeMeasure, x)
basemeasure(::SomeMeasure)

The three-argument methods are then derived from this. If you take every possible measure and repeatedly apply basemeasure, you get a tree. The three-arugment logdensity is in terms of walking this tree. This approach allows us to limit ourselves to methods between "primitive" measures for definition of three-argument logdensity methods.

DensityInterface.logdensityof(
    m::RadonNikodymDerivative{<:Distributions.ContinuousDistribution,LebesgueMeasure},
    x...
) = logdensityof(m.μ, x...)

We can't just ignore the base measure here, that will lead to incorrect results. Maybe we can talk about the x... in a call, I don't understand what's intended here.

A ContinuousDistribution doesn't necessarily have Lebesgue as its base measure. For example, a Dirichlet in three dimensions is a two-dimensional manifold (with boundary), and is singular with respect to Lebesgue measure.

logdensityof returns a function that computes the log-value of something that is or has a density. So it does not return a "density object", it returns a log-function (and logdensityof(logdensityof(d)) would not make sense).

logfuncdensity, on the other hand, takes a log-function (assumed to be the log-function of a density) and returns a corresponding "density object" (which is usually not a function, and even if it is, that function should certainly not compute the log-density value).

Actually, to make my proposal above work nicely, we should say that logfuncdensity returns a density or something that has a density (e.g. a measure). The latter will only happen in special(ized) cases, though (e.g. when it's input log-density function originated from a distribution or measure and still carries that information in a dispatchable way).

We can't define a logfuncdensity like this without knowing a base measure to use for it. That's not just a MeasureTheory thing. In general, you either specify a base measure, or you sacrifice expressiveness (say, assuming everything is with respect to Lebesgue measure).

The function-like object passed to logfuncdensity might "know" the appropriate base measure. But it seems dangerous to not have that available as part of the interface.

A general observation... We've talked about the goal of expressing "function-like" objects that can be evaluated in either log space or linear space. But without a base measure, it's not really a density.

In MeasureTheory we have a Likelihood type that's a lot like this. It's not a measure, but it can "act on" one measure to yield a new one using pointwise multiplication, ⊙. So for a measure μ and Likelihood ℓ,

μ ⊙ ℓ

gives a new measure. If μ is considered as a prior, then μ ⊙ ℓ is a posterior. But everything still works if μ is Lebesgue or CountingMeasure. The effect is to take enrich ℓ, so you end up with a new measure.

oschulz commented 2 years ago

Thanks @cscherrer , that cleared things up a bit more for me. I think we're on to something that could fit together very naturally here.

Instead we define logdensity(::SomeMeasure, x) basemeasure(::SomeMeasure)

Oh, I thought the three argument methods were the primary implementation for new densities. Nice, I think that will simplify integrating MeasureTheory with DensityInterface.

We do need a good name for the three-argument function though, IMHO. If something like rd_derivative and the like seems to technical, how about reldensity (for "relative density" resp. "related density") or so? Would that be a correct term?

Question: Does MeasureTheory allow for measures that don't define basemeasure? Or is that not allowed, do they all have to? The latter would make things easier here, I think.

The three-argument methods are then derived from this. [...]

So the three-argument logdensity function does not get specialized by new measure types, right?

A ContinuousDistribution doesn't necessarily have Lebesgue as its base measure. For example, a Dirichlet

we should say that logfuncdensity returns a density or something that has a density (e.g. a measure) We can't define a logfuncdensity like this without knowing a base measure to use for it. That's not just a MeasureTheory thing.

You mean for the case where logfuncdensity returns a density, or for the case where it returns something that has a density (like for logfuncdensity(logdensityof(d::Distribution)))?

I assume it's just a problem in the latter case (returning something that has a density)? Since logfuncdensity returning something that is a density just means turning an existing log-density function into a log/non-log-independent "density object" without adding information or changing semantics?

The function-like object passed to logfuncdensity might "know" the appropriate base measure. But it seems dangerous to not have that available as part of the interface.

Yes, I was thinking about the same thing. For the case logfuncdensity(logdensityof(d::Distribution)) == d we could say that

logdensity_of_d = logdensityof(d::Distribution) "turns" d into a density via it's base measure (RD-derivative).
logfuncdensity(logdensity_of_d) "turns" logdensity_of_d into a measure via an inherent base measure, it remembers where it "came from" (RD-integral).

Would it be Ok to allow basemeasure to operate on both measures and (log-)density-functions? If so, I think that would solve this in a clean fashion. In this case, we'd have basemeasure(d) == basemeasure(logdensity_of_d).

Naturally, plain black-box log-density functions that don't support basemeasure would not get turned into measures by logfuncdensity, just into "density objects" (no change in what it is, just not fixed in term of log/non-log anymore).

This approach would also mean that for any object (e.g. a measure) m that supports logdensityof and basemeasure we could easily provide:

logfuncdensity(logdensityof(m)) == m
basemeasure(logdensityof(m)) == basemeasure(m)

The first of these DensityInterface already provides, by default, via

logfuncdensity(log_f::Base.Fix1{typeof(logdensityof)}) = log_f.x

The second one we can provide just as easily via

basemeasure(log_f::Base.Fix1{typeof(logdensityof)}) = basemeasure(log_f.x)

If it's Ok to say that density functions (can) have a base measure, that would consitute a strong argument for adding basemeasure to DensityInterface instead of a secondary package. It we did, would we also want/need a function hasbasemeasure, to complement hasdensity?

A general observation... We've talked about the goal of expressing "function-like" objects that can be evaluated in either log space or linear space. But without a base measure, it's not really a density.

You mean because it's not clear what space it lives on?

Does that mean I can't do μ ⊙ ℓ if ℓ has no base measure? Or would the base measure of ℓ just be inferred from the base measure of μ?

The latter is actually what BAT.jl does right now: The user can pass the likelihood as a plain log-function (wrapped in logfuncdensity), and the space it lives on (e.g. degrees of freedom) are inferred from the prior, if not specified for the likelihood.

In MeasureTheory we have a Likelihood type that's a lot like this. It's not a measure, but it can "act on" one measure to yield a new one using pointwise multiplication, ⊙. So for a measure μ and Likelihood ℓ,

Is Likelihood required to provide base measure information?

If we go this road: I think the name of logdensityof fit it's semantics. But I do realize that logfuncdensity is a bit of an unfortunate name for something that may return something that is a measure. But maybe we can come up with a better name for "something-that-has-a-density-derived-from-this-log-density-function". It's not widely used in the ecosystem yet (just in BAT.jl so far, I think) and I'd be open to renaming it.

Just to make sure I'm understanding things correctly - which of these statements could you subscribe to, from a MeasureTheory point of view (without cringing too much)?

1) Every measure ν has exactly one (primary) base measure μ. 2) Every measure ν has exactly one "natural" density (function) f = dν/dμ, via ν's (primary) base measure μ. 3) Every (useful) density (function) f stems from a Radon–Nikodym derivative f = dν/dμ of two measures. 4) Every (useful) density (function) f can be said to have exactly one (primary) base measure. The base measure of f = dν/dμ is μ. 5) A density (function) can only be evaluated in the "context" of a measure. For a density function f = dν/dμ this can be μ or any of μ's recursive base measures.

Oh, a question in that context: What is the base measure of the Lebesgue measure? Or are there "root" measures that don't have a base measure?

cscherrer commented 2 years ago

Hi @oschulz , I've added some more discussion in https://github.com/cscherrer/MeasureBase.jl/issues/28

The perspective of most of the last week has been "Here's this interface, now what about MeasureTheory?". Now that things have slowed down some, I can think through what this actually looks like from the opposite perspective ("Here's MeasureTheory, now how about this interface?"). So far what I'm seeing in thinking through this is computational convenience, but some potentially serious pedagogical concerns. These are described in that issue.

We do need a good name for the three-argument function though, IMHO. If something like rd_derivative and the like seems to technical, how about reldensity (for "relative density" resp. "related density") or so? Would that be a correct term?

This is MeasureBase.Density, which can be constructed directly or using 𝒹 or log𝒹.

Question: Does MeasureTheory allow for measures that don't define basemeasure? Or is that not allowed, do they all have to? The latter would make things easier here, I think.

No, there's always a base measure. In some cases (Lebesgue, CountingMeasure) the base measure might be that measure itself.

So the three-argument logdensity function does not get specialized by new measure types, right?

It can, but typically no. Most often, this is only defined between measures that are their own base measure.

You mean for the case where logfuncdensity returns a density, or for the case where it returns something that has a density (like for logfuncdensity(logdensityof(d::Distribution)))?

Having this function return two different things that are so conceptually different will cause us a lot of problems.

Would it be Ok to allow basemeasure to operate on both measures and (log-)density-functions? If so, I think that would solve this in a clean fashion. In this case, we'd have basemeasure(d) == basemeasure(logdensity_of_d).

Maybe? It would have to always return a measure, and there would have to be clear semantics. If (log-)density functions hold two measures, this could be pretty clear.

You mean because it's not clear what space it lives on?

The space, and the measure on that space.

Does that mean I can't do μ ⊙ ℓ if ℓ has no base measure? Or would the base measure of ℓ just be inferred from the base measure of μ?

No, as you pointed out this is the R-D integral. So we could also write it ∫exp(ℓ, μ). This is a measure with log-density ℓ and base measure μ.

Is Likelihood required to provide base measure information?

Abstractly, a likelihood is just a function of parameters. The Likelihood struct is made from a Density and an observation. The Density is made up of measures. But those measures are on a different space. So a Likelihood has no concept of a measure on the parameter space, that must be provided separately.

This is getting long, so I'll continue separately...

cscherrer commented 2 years ago

If we go this road: I think the name of logdensityof fit it's semantics.

Whether or how well it fits depends on the context. For MeasureTheory, we hope for users of MeasureTheory to gradually "peel the onion" just as they do for Julia itself. There's potentially a lot of danger in beginners/students starting with the "everything is a density" perspective, and later discovering that things are really completely different.

But I do realize that logfuncdensity is a bit of an unfortunate name for something that may return something that is a measure. But maybe we can come up with a better name for "something-that-has-a-density-derived-from-this-log-density-function". It's not widely used in the ecosystem yet (just in BAT.jl so far, I think) and I'd be open to renaming it.

This is out DensityMeasure, named this because it's a measure that's represented in terms of a density (and of course a base measure). For shorthand we use ∫exp.

Just to make sure I'm understanding things correctly - which of these statements could you subscribe to, from a MeasureTheory point of view (without cringing too much)?
1. Every measure ν has exactly one (primary) base measure μ.

Abstractly, no. But in terms of our implementation, yes.

2. Every measure ν has exactly one "natural" density (function) f = dν/dμ, via ν's (primary) base measure μ.

Same as above :)

3. Every (useful) density (function) f stems from a Radon–Nikodym derivative f = dν/dμ of two measures.

"Stems from" is too weak, "density" and "Radon-Nikodym derivative" are exactly the same thing.

4. Every (useful) density (function) f can be said to have exactly one (primary) base measure. The base measure of f = dν/dμ is μ.

Abstractly, a density function is just a function. In terms of out implementation, the Density struct does have a base field, representing the base measure that was used to build it. But mathematically, not much of the base measure really persists. It could be useful to ask where a density came from, but that's an implementation question.

5. A density (function) can only be evaluated in the "context" of a measure. For a density function f = dν/dμ this can be μ or any of μ's recursive base measures.

No, once you have a density it's just a function.

Oh, a question in that context: What is the base measure of the Lebesgue measure? Or are there "root" measures that don't have a base measure?

basemeasure(μ::Lebesgue) = μ

We'd call Lebesgue "primitive". It's funny you use the term "root measure" though, because we have a fix utility function and

rootmeasure(μ) = fix(basemeasure, μ)

oschulz commented 2 years ago

~~Thanks again, very helpful! I'll reply on cscherrer/MeasureBase.jl#28, so we don't have to switch back and forth between these two.~~ Actually, maybe better here, since it concerns the design of DensityInterface.

oschulz commented 2 years ago

there's always a base measure. In some cases (Lebesgue, CountingMeasure) the base measure might be that measure itself.

Nice, that makes things easier, I feel.

So the three-argument logdensity function does not get specialized by new measure types, right?

It can, but typically no.

Again, that should help us here a lot.

You mean for the case where logfuncdensity returns a density, or for the case where it returns something that has a density

Having this function return two different things that are so conceptually different will cause us a lot of problems.

Ok, so maybe logfuncdensity being the inverse of logdensityof is a bad idea in terms of MeasureTheory then. We could change that, though, so that logfuncdensity(logdensityof(d::Distribution)) does not return d anymore, but a density with ismeasure(logfuncdensity(logdensityof(d::Distribution))) == false.

We could add a new function that takes over the current behavior of logfuncdensity, with a name that makes it clear that it will return a density or something that has a density. MeasureTheory would stay far away from that function.

Would it be Ok to allow basemeasure to operate on both measures and (log-)density-functions?

Maybe? It would have to always return a measure, and there would have to be clear semantics.

Or maybe a different name, like origmeasure or so? I think it should actually operate on density objects (requiring ismeasure(d) == false), not log/non-log density functions.

Every measure ν has exactly one (primary) base measure μ.

Abstractly, no. But in terms of our implementation, yes.

Every measure ν has exactly one "natural" density (function) f = dν/dμ, via ν's (primary) base measure μ.

Same as above :)

Ok, so at least within the context of MeasureTheory, it's justified treat a measure as a density (i.e. transparently replace it with it's density) in certain situations for generic programming - though not in general. That's compatible with the current behavior of [log]densityof, I'd say.

Every (useful) density (function) f stems from a Radon–Nikodym derivative

"Stems from" is too weak, "density" and "Radon-Nikodym derivative" are exactly the same thing.

Ok, that's in the context of MeasureTheory, while DensityInterface also allows densities to be defined via a black-box [log]-function. But I don't think that's a problem, MeasureTheory doesn't have to support that just because DensityInterface allows it.

Every (useful) density (function) f can be said to have exactly one (primary) base measure. The base measure of f = dν/dμ is μ.

It could be useful to ask where a density came from, but that's an implementation question.

A density (function) can only be evaluated in the "context" of a measure.

No, once you have a density it's just a function.

Ok, so the current behavior of DensityInterface.logfuncdensity (may return a density or a measure) is not compatible with MeasureTheory. That's not a problem, it's current behavior is more of a "because we can" than really necessary - it can be changed, from my point of view.

basemeasure(μ::Lebesgue) = μ

Ahh, nice. I had a suspicion that might be the case. So we do have to defend against infinite recursion, but I assume MeasureTheory has the right tools for that in place, internally.

oschulz commented 2 years ago

How would this work out for MeasureBase/MeasureTheory:

DensityInterface

The motto of DensityInterface will not be "everything is a density" but "everything (compatible with it) has a density". This should be perfectly compatible with MeasureTheory: MeasureTheory already has a "default" density for every measure, via the measure's base measure.

DensityInterface adds (non-breaking change)

@inferred ismeasure(object) = false
@inferred hasdensity(object) = ismeasure(object)
@inferred isdensity(object) = hasdensity(object) && !ismeasure(object)

The owner of typeof(object) must specialize either hasdensity or ismeasure to return true. isdensity should not be specialized.)

DensityInterface also adds

function basemeasure(object) end

which is intended for measures (ismeasure(object) == true).

funcdensity is added. We define a clear contract

isdensity(logfuncdensity(f)) == true
isdensity(funcdensity(f)) == true

This mildly breaking, the current default behavior is logfuncdensity(logdensityof(object)) === object, but there are no active use cases that rely on this. And logfuncdensity will still be the inverse of logdensityof for densities - but not for measures (so also not for distributions anymore).

Distributions

Distributions replaces the current hasdensity(::Distribution) = true with

@inferred ismeasure(::Distribution) = true

Distributions removes @test logfuncdensity(logdensityof(d)) === d from it's tests. I think this can be considered a non-breaking change as far as Distributions is concerned: the current behavior stems from DensityInterface, which will reflect the slightly breaking change in it's version number, and people can't access this behavior without using DensityInterface (Distributions doesn't reexport these functions).

Distributions defines

basemeasure(::Distribution{Univariate,Continuous}) = Lebesgue()
basemeasure(::Distribution{Univariate,Discrete}) = CountingMeasure()

basemeasure(::Product) = ...
...

basemeasure(::MvNormal) = Lebesgue()
basemeasure(::Multinomial) = CountingMeasure()
basemeasure(::Dirichlet) = ...
...

Lebesgue and CountingMeasure could either be defined in DensityInterface or in a "MeasureUltralightBase" package that Distributions would be willing to depend on. Measure that are specific to a (family of) distribution(s) should probably be defined in Distributions itself.

MeasureBase

MeasureBase defines

struct Density{M,B}
    μ::M
    base::B
end

function DensityMeasure(d, base)
    ismeasure(μ) && ismeasure(base) || throw(ArgumentError("..."))
    DensityMeasure{typeof(μ),typeof(base)}(μ, base)
end

@inline DensityInterface.hasdensity(::Density) = true
@inline DensityInterface.ismeasure(::Density) = false  # Default anyway

# Can MeasureBase implement this on it's own, without MeasureTheory?:
logdensityof(d::Density, x) = ... 
densityof(d::Density, x) = ...

and

struct DensityMeasure{D,B}
    d::D
    base::B
end

function DensityMeasure(d, base)
    isdensity(d) && ismeasure(base) || throw(ArgumentError("..."))
    DensityMeasure{typeof(d),typeof(base)}(d, base)
end

@inline DensityInterface.hasdensity(::DensityMeasure) = true
@inline DensityInterface.ismeasure(::DensityMeasure) = true
@inline DensityInterface.basemeasure(m::DensityMeasure) = m.base

DensityInterface.logdensityof(m::DensityMeasure, x) = logdensityof(m.d, x)
DensityInterface.logdensityof(m::DensityMeasure) = logdensityof(m.d)
DensityInterface.densityof(m::DensityMeasure, x) = densityof(m.d, x)
DensityInterface.densityof(m::DensityMeasure) = densityof(m.d)

Ideally, there will be no AbstractMeasure supertype, so that Distributions can also be first-class measures.

cscherrer commented 2 years ago

Ok, so maybe logfuncdensity being the inverse of logdensityof is a bad idea in terms of MeasureTheory then. We could change that, though, so that logfuncdensity(logdensityof(d::Distribution)) does not return d anymore, but a density with ismeasure(logfuncdensity(logdensityof(d::Distribution))) == false.

This in itself is really encouraging. I've been kind of worried that having things not match up quite right might lead to misbehavior downstream.

Ok, that's in the context of MeasureTheory, while DensityInterface also allows densities to be defined via a black-box [log]-function. But I don't think that's a problem, MeasureTheory doesn't have to support that just because DensityInterface allows it.

I want to check that @mschauer agrees before committing to this, but I think for MeasureTheory "density" should be literally a density. We can always loosen this later if it makes sense for us, but overloading the term too much now will make it very hard to roll back later, and there are the pedagogical concerns we've discussed.

Ok, so the current behavior of DensityInterface.logfuncdensity (may return a density or a measure) is not compatible with MeasureTheory. That's not a problem, it's current behavior is more of a "because we can" than really necessary - it can be changed, from my point of view.

This would be great, I'd love to pin down the expected semantics more carefully.

cscherrer commented 2 years ago

Distributions replaces the current hasdensity(::Distribution) = true with
@inferred ismeasure(::Distribution) = true

:+1:

DensityInterface also adds
function basemeasure(object) end
which is intended for measures (ismeasure(object) == true).

I like this, the biggest concern is that basemeasure is correctly implemented. I doubt Distributions will want to have a system for pushforwards. But there aren't that many to consider, so maybe we could have some named special cases.

I need to get ready for some morning calls, more later...

oschulz commented 2 years ago

Ok, how about this: I create exploratory PRs on DensityInterface and Distributions with my proposal above, as a "scratchpad". Then we do at least have code in branches that are available to Pkg and that can be played with directly. Maybe you and @mschauer can do the same for MeasureBase? That way, we could try things out in practice.

cscherrer commented 2 years ago

Sounds good. Ours is here https://github.com/cscherrer/MeasureBase.jl/pull/27

oschulz commented 2 years ago

I doubt Distributions will want to have a system for pushforwards.

If I understand the term correctly, that's what I want to provide with VariateTransformations (should be ready soon, the code comes from BAT.jl, just needs to be refactored a bit more). So it wouldn't be in Distributions, but an add-on package, but maybe that would still work in terms of integrating with MeasureBase?

oschulz commented 2 years ago

Sounds good. Ours is here cscherrer/MeasureBase.jl#27

Nice, I'll get the DensityInterface and Distributions one set up, then. Will try to get this done tonight (have some calls as well).

oschulz commented 2 years ago

but I think for MeasureTheory "density" should be literally a density. We can always loosen this later if it makes sense for us, but overloading the term too much now will make it very hard to roll back later, and there are the pedagogical concerns we've discussed.

Quite a few of the use cases that motivated DensityInterface deal with (more or less) "black-box" densities, i.e. they are written in arbitrary code or stem from a non-measure-theory DSL. In a measure-theoretical sense, one could probably gives labels to the measures they are the RD-derivative of - but those measures are not reified in code, and even their names are not available. There's just many use cases where users "just want to code a likelihood" without adding information about measures.

Such "black-box" densities should of course be considered densities in the sense of DensityInterface, with isdensity(d) == true && ismeasure(d) == false (with the proposal above).

I don't think this would be a problem for MeasureBase: MeasureBase would have a concrete type Density for it's densities (a more user-friendly name for RadonNikodymDerivative). So MeasureBase can always tests if it's a density it can work with - or maybe, some things MeasureTheory can do can also work with "black-box" densities, just not all of them? In any case, MeasureBase could define a function is_rd_derivative (or similar name), maybe even functions get_nu and get_mu or so.

oschulz commented 2 years ago

@cscherrer woud you have a minimal example of what methods currently have to be implemented for a MeasureTheory measure? Maybe a multivaritate-normal probability measure?

cscherrer commented 2 years ago

I doubt Distributions will want to have a system for pushforwards.

If I understand the term correctly, that's what I want to provide with VariateTransformations (should be ready soon, the code comes from BAT.jl, just needs to be refactored a bit more). So it wouldn't be in Distributions, but an add-on package, but maybe that would still work in terms of integrating with MeasureBase?

Yes, but the concern here is that Distributions will need to be able to express each base measure, and in some cases this is singular with respect to Lebesgue measure.

Quite a few of the use cases that motivated DensityInterface deal with (more or less) "black-box" densities, i.e. they are written in arbitrary code or stem from a non-measure-theory DSL. In a measure-theoretical sense, one could probably gives labels to the measures they are the RD-derivative of - but those measures are not reified in code, and even their names are not available. There's just many use cases where users "just want to code a likelihood" without adding information about measures.

I think there are two things that can happen in these cases:

The density is implicitly with respect to Lebesgue measure
The density is implicitly with respect to a "Lebesgue-like" measure (probably Hausdorff measure) on its support

If everything in your world is case (1), you're safe. If everything is (2) and the base measure is fixed, you're probably also safe. But things can get really screwy when you try to mix (1) and (2), or different kinds of (2)s. From this perspective, being a little more rigorous about base measures gives you better composability.

@cscherrer woud you have a minimal example of what methods currently have to be implemented for a MeasureTheory measure? Maybe a multivaritate-normal probability measure?

This is a great idea, and something we should document anyway. I'll open a separate discussion for this.

oschulz commented 2 years ago

Yes, but the concern here is that Distributions will need to be able to express each base measure

I'm not a Distributions maintainer, but I would definitely support a proposal that Distributions should define their base measure.

oschulz commented 2 years ago

The density is implicitly with respect to Lebesgue measure [...] The density is implicitly with respect to Lebesgue measure

Yes, that true for a not small number of practical use cases - everybody happily "lives on" a Lebesgue measure, implicitly.

This is also the case for BAT.jl at the moment - though I would very much like to support discrete parameters in the future. But since there will always be a prior, the necessary information is always there. Basically, by combining a likelihood (density) with a prior, the user will implicitly state what space the likelihood lives on. This approach has worked out very well in practice, so far (i.e. users are happy with it :-) ).

MeasureBase/MeasureTheory will of course want to be more strict about things - but I think that can happen quite naturally.

mschauer commented 2 years ago

Distribution certainly has a corresponding notion encoded in the type parameters. We have to figure out what we need there: Translate Distribution's notion base measure one-to-one or let base measure return a more abstract summary notion of Volume, CountingMeasure which is shared more widely in the system (and we need to devise something the stranger ones, like Dirichlet). All that has influence in which package the objects returned by basemeasure have to live.

oschulz commented 2 years ago

Translate Distribution's notion base measure

Does Distributions currently have a notion of a base measure?

oschulz commented 2 years ago

All that has influence in which package the objects returned by basemeasure have to live.

How lightweight could the definition of primitive measures like Lebesgue and CountingMeasure be? Which methods would have to be implemented for them?

cscherrer commented 2 years ago

Does Distributions currently have a notion of a base measure?

Only implicitly, through VariateForm and ValueSupport. In some cases that's not enough, e.g. Dirichlet.

How lightweight could the definition of primitive measures like Lebesgue and CountingMeasure be? Which methods would have to be implemented for them?

If we assume Lebesgue is always on ℝⁿ, we could simplify our current implementation, and have

# Parameterized to allow e.g. Lebesgue(static(3))
struct Lebesgue{N}
    dimension: N
end

# Some subtleties here, e.g. safety of checking dimensionality, vs speed and flexibility (symbolic arrays, etc)
DensityInterface.logdensityof(::Lebesgue, x) = 0.0

basemeasure(d::Lebesgue) = d

There might be a little more needed, but I think not much.

Then at least in MeasureBase, we need to have some code for relative densities between CountingMeasure and Lebesgue.

Also, @mschauer suggested it might be more palatable to have a wider audience name like VolumeMeasure instead of Lebesgue. Exactly how this should be set up is important, but I think we can defer this detail for now.

oschulz commented 2 years ago

Ok, here goes: #9

@mschauer , could you add a PR on Distributions.jl to complement #9 and cscherrer/MeasureBase.jl#27 ? I'm not a Distributions maintainer and it would be easier to try things out via Pkg if all the PR's live on branches in the main repos of DensityInterface, Distributions and MeasureBase.

Content-wise, the Distributions PR can just remove @test logfuncdensity(logdensityof(d)) === d from the tests as a first step.

oschulz commented 2 years ago

DensityInterface.logdensityof(::Lebesgue, x) = 0.0

We probably shouldn't force Float64 though, it might propagate though a whole calculation that people may want to run on a (cheap) GPU without fast 64-bit floats. :-)

Lebesgue{N}

I don't think we should to that - there are many possible applications that might need to use Lebesgue for many (maybe millions) of different N, the compiler would explode. :-) It think it would have to be Lebesgue(N).

Let's continues this in #9 though, so we can represent the state of the discussion in evolving code?

oschulz commented 2 years ago

Quick update, @cscherrer, @mschauer and me had a brainstorming session regarding what the base measure of ∫(f, μ) should be for DensityInterface and MeasureTheory. I'm pretty sure we'll want it to be the base measure of μ for DenstiyInterface, or we'll surprise users, but in MeasureTheory it's currently μ itself. Not clear yet if that should be the long-term choice for MeasureTheory or not (μ may or may not be the better choice). We can deal with both cases, we'll just have to clearly document each behavior (the "get-me-the-log-density" functions would have different names if they use different implied base measures). We'll report back.