JuliaMath / DensityInterface.jl

Interface for mathematical/statistical densities in Julia

Future option: Express mass/normalization of densities #8

Open phipsgabler opened 2 years ago

phipsgabler commented 2 years ago

I've been previously thinking about whether and how a density interface should include the possibility to require or check for normalization, so I'm just posing this for discussion here.

Options that come to my mind:

# extra argument and dispatch
logdensityof(d, x; normalized=Val{false}())

Or maybe include a whole machinery for dealing with normalization constants?

# like eltype/length traits
hasnormalizingconstant(d) = false
normalizingconstant(d::SomeDensity) = ...
isnormalized(d) = normalizingconstant(d) == 1

# probably should be its own struct
normalize(d) = let logZ = log(normalizingconstant(d))
    logfuncdensity(x -> logdensityof(d, x) - logZ)
end
oschulz commented 2 years ago

Hm, I wonder if something like this could go into Distributions?

phipsgabler commented 2 years ago

The real use case of this IMHO is the query "is something a probability density". But that should then be available to PPLs too, since there we have all kinds of densities that we know are not normalized. So if this is deemed a good idea, I'd rather have it here.

oschulz commented 2 years ago

The real use case of this IMHO is the query "is something a probability density".

What do you think about my proposal at the (current) end of #5 regarding adding ismeasure and isdensity, as more specific queries than hasdensity? Would that serve this use case as well?

phipsgabler commented 2 years ago

I wouldn't think that makes a difference; being a measure doesn't tell you anything about normalization (or being a probability measure).

A third approach, where you only have to define normalizingconstant:

hasnormalizingconstant(d) = !(normalizingconstant(d) isa Nothing)
normalizingconstant(d) = nothing
normalizingconstant(d::SomeDensity) = ...
isnormalized(d) = normalizingconstant(d) == 1
oschulz commented 2 years ago

We could call it logdensitynorm, and default it to logdensitynorm(d) == missing. With that default of missing, I'm not sure we'd even need a haslogdensitynorm.

oschulz commented 2 years ago

Having a logdensitynorm could be useful in applications that need to operate in non-log space (e.g. subvolume integration) but need to renormalize (shift log-space) before doing exp(logdensityof(...)) because they would be out of the dynamic range of their float-type otherwise.
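
A minimal sketch of that shift (assuming the hypothetical logdensitynorm proposed above):

# subtract the log-norm before exponentiating so the result stays within the float type's range
shifted_densityof(d, x) = exp(logdensityof(d, x) - logdensitynorm(d))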

phipsgabler commented 2 years ago

I think I like that idea. It seemed to me that it would be nice to also be able to express "infinite norms" (as there is a semantic difference between "finite but unknown constant", as in posteriors, and "known to be non-normalizable"). With your approach, it would work out, right?

logdensitynorm(d) = missing
logdensitynorm(d::Distribution) = 0   # norm of a Distribution is 1
densitynorm(d) = exp(logdensitynorm(d))

will propagate the missing, as do subsequent computations. And then we can also have

densitynorm(d::Lebesgue) = Inf
# default: logdensitynorm(d::Lebesgue) = missing

with somewhat consistent behaviour.

So it seems nice on the first glance, but I'm not confident enough to judge downstream effects.

oschulz commented 2 years ago

Yes, I think so. But maybe the name logdensitynorm would be weird when used on measures/distributions - their norm is independent of which density is derived for them using different base measures. Maybe we should just use LinearAlgebra.norm, and add lognorm on our own?

Then, for Bayesian inference, we'd have lognorm(∫exp(log_likelihood, prior)) == log_evidence.

oschulz commented 2 years ago

Hm, maybe using LinearAlgebra.norm is a bad idea, though; it's not specific enough. One might expect that norm(f_density) == sqrt(∫f_density(x)^2 dx), or something else. There are so many norms ...

phipsgabler commented 2 years ago

Yeah, I really wouldn't use LinearAlgebra.norm; for distributions, it's also already taken (https://github.com/JuliaStats/Distributions.jl/pull/1321). And for that reason, if something like this is implemented, we should probably refrain from "norm" as the name.

oschulz commented 2 years ago

What could we call it so that it would also make sense for densities (as the integral of the density function over its support)?

oschulz commented 2 years ago

What is the norm of MeasureTheory.Dirichlet([2,3]), and can I compute it with `MeasureTheory`?

cscherrer commented 2 years ago

"Norm" and "normalize" are very overloaded, which is why we went with mass and logmass cc @sethaxen

oschulz commented 2 years ago

I like those. I think defining those as abstract functions in DensityInterface could be very handy. We often do use the term "norm of a density", we say the "PDF is normalized", and so on. So it's not just useful for MeasureBase, it's also useful for densities (density functions). We do of course mean the integral with respect to an implied base measure - but that's the same as using an implied base measure in `[log]densityof`, so it would fit in quite neatly.

What do you think about this, @devmotion ?

cscherrer commented 2 years ago

Here's some more context: https://github.com/cscherrer/MeasureTheory.jl/issues/128

For a trait-based approach, I can see a few situations we'd want to distinguish:

  1. Measures that are normalized
  2. Measures with a known mass, which may or may not be one
  3. Measures known to be finite, with unknown mass
  4. Measures known to have infinite mass
  5. Measures where the mass is unknown (maybe finite, maybe infinite)
oschulz commented 2 years ago

Do we have a good package for static numbers that can represent infinity? Then logmass(measure) could deliver all of this - return either static zero (we could take that from Zeros.jl), a real number, FiniteMass() (not sure we need that one), a static Inf or missing.

All of those results could be dispatched upon, giving you that trait-based approach.

cscherrer commented 2 years ago

I think I'd go with Static.jl and Infinities.jl, so static(∞)

oschulz commented 2 years ago

Nice!

We wouldn't need these as dependencies for DensityInterface, even - we would just define

function logmass(object)
    if densitykind(object) <: IsOrHasDensity
        missing
    else
        throw(ArgumentError("object is not and does not have a density"))
    end
end

Implementations of densities and measures can then return a real number or a static something as fits the case.
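
For illustration, a hedged sketch of what such implementations and the dispatch on their results might look like (logmass, KnownMassDensity and UnknownMassDensity are hypothetical names; static/StaticInt come from Static.jl as suggested above):

using Static

struct KnownMassDensity end
logmass(::KnownMassDensity) = static(0)    # mass statically known to be 1

struct UnknownMassDensity end
# no method needed: assuming densitykind(::UnknownMassDensity) <: IsOrHasDensity,
# the fallback above returns missing

# downstream code can then dispatch on the returned value:
isnormalized(d) = _isnormalized(logmass(d))
_isnormalized(::StaticInt{0}) = true
_isnormalized(::Missing) = missing
_isnormalized(x::Real) = iszero(x)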

cscherrer commented 2 years ago

When the user gives a function and says "treat this as a density", they need to be able to specify which of these cases we're in:

1. Measures that are normalized
2. Measures with a known mass, which may or may not be one
3. Measures known to be finite, with unknown mass
4. Measures known to have infinite mass
5. Measures where the mass is unknown (maybe finite, maybe infinite)

Otherwise we'll end up wasting lots of cycles on quadrature.

oschulz commented 2 years ago

Maybe this goes a bit far for things "that are a density or a measure", though? This may be better situated in MeasureBase.

cscherrer commented 2 years ago

So things defined outside of MeasureBase will just be treated as completely unknown, and always use quadrature to get an approximate mass? That seems very limiting.

oschulz commented 2 years ago

So things defined outside of MeasureBase will just be treated as completely unknown

Well, there isn't that much right now: Distributions are normalized, densities created via logfuncdensity don't have mass information in the first place, and code handling measures should use MeasureBase (at least).

Other density types can have a mass, too, of course - but the best place to take care of that is the package that provides that density type, I would say.

Fully generic and automatic quadrature is hard to provide anyhow - there are many different algorithms and not all suit every situation. If the mass has to be estimated numerically, this won't work without involving the users in higher-dimensional cases, I would say.

cscherrer commented 2 years ago

Ok, so we need to be able to specify these things somewhere. It sounds like that will need to go into MeasureBase, and users of logfuncdensity will not have any way of specifying the mass. Is that right?

oschulz commented 2 years ago

I thought we can maybe add something like

mass(object) = missing
logmass(object) = missing

to DensityInterface (though the function name mass is probably too generic; we should find a different name that can be exported), so users can specify a mass for a density if they know it (it would be the "integral with respect to an implicit base measure", of course :-) ).

The more detailed stuff I would propose to leave to MeasureBase.

cscherrer commented 2 years ago

How about massof and logmassof?

users can specify a mass for a density if they know it

I don't see how this would work. The type produced by logfuncdensity is owned by DensityInterface, right? So how could users add methods without type piracy?

oschulz commented 2 years ago

Yes, massof and logmassof could work.

users can specify a mass for a density if they know it

I don't see how this would work. The type produced by logfuncdensity

No, indeed - logfuncdensity is a quick convenient way for users to turn a log-density function into a density object, nothing more. If users want to do something more complex, they should define a type and implement (at least) isdensity and logdensityof for it - and in this case, logmassof too, for example.
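
A hedged sketch of such a user-defined type (logmassof is just the name floated above, not part of the released API; the Gaussian-like density is purely for illustration):

struct MyUnnormalizedGaussian
    σ::Float64
end

isdensity(::MyUnnormalizedGaussian) = true
logdensityof(d::MyUnnormalizedGaussian, x) = -x^2 / (2 * d.σ^2)
logmassof(d::MyUnnormalizedGaussian) = log(d.σ * sqrt(2π))   # ∫ exp(-x²/(2σ²)) dx = σ√(2π)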

phipsgabler commented 2 years ago

Of course @cscherrer has already put thought into this before us :grinning:

I like (log)massof, it's consistent with the density functions, meaningful, and distinct from norm.

Of the five possibilities Chad mentioned, all except (3) are simple to handle: values of either missing or static(something). Essentially instances of a disjoint union between reals, infinity, and Missing.

(3), however, is an outlier -- a set-like restriction of the above ("any finite real number") and harder to represent by types. I can see why it's very useful, though. I suppose an additional type like Finite, corresponding to Infinity, could solve that, but is that really a good option?

I think I'd go with Static.jl and Infinities.jl, so static(∞)

You mean StaticValues.jl?

And what is logmassof supposed to be when massof is infinite? Error-throwing? Missing?

oschulz commented 2 years ago

And what is logmassof supposed to be when massof is infinite?

I would say infinite as well, since log(Inf) == Inf and exp(Inf) == Inf, so that would be well-defined.
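
For reference, both directions are well-defined in Julia:

log(Inf) == Inf   # true
exp(Inf) == Inf   # true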

phipsgabler commented 2 years ago

Oh, right... for whatever reason I have assumed that log(Inf) is undefined. So that's a non-issue, sorry.

oschulz commented 2 years ago

No worries. :-)

cscherrer commented 2 years ago

(3), however, is an outlier -- a set-like restriction of the above ("any finite real number") and harder to represent by types. I can see why it's very useful, though. I suppose an additional type like Finite, corresponding to Infinity, could solve that, but is that really a good option?

Good point, I hadn't thought of (3) as being so different.

You mean StaticValues.jl?

No, I mean https://github.com/SciML/Static.jl

The description of StaticValues says

experimental implementations for Static.jl

but I don't really understand the relationship between the two, especially given it doesn't list Static.jl as a dependency.

phipsgabler commented 2 years ago

Actually this is somewhat a variation of the hasdensity/isdensity design discussion in the other issue. We have a "cardinality" axis and a "value" axis, but only the following combinations make sense:

- Cardinality = unknown
  - Value = unknown (5)
- Cardinality = finite
  - Value = unknown (3)
  - Value = x (2)
- Cardinality = infinite
  - Value = Inf (4)

and (1) is just a special case of (2) where isone(massof(o)) holds in addition.

No, I mean SciML/Static.jl

Ah, weird. Github search didn't even show me that, but the readme is the same. Whatever.

cscherrer commented 2 years ago

Actually this is somewhat a variation of the hasdensity/isdensity design discussion in the other issue. We have a "cardinality" axis and a "value" axis, but only the following combinations make sense

:+1:

I had never heard of StaticValues. Not sure how they're related, so I opened https://github.com/JeffreySarnoff/StaticValues.jl/issues/1

phipsgabler commented 2 years ago

Looks like the repo only exists for a JuliaCon paper submission -- at least it's a fork of their template.

Anyway, continuing the actual discussion... we seem to agree that (log)massof is the better approach than a keyword for the density functions, and that the functionality actually belongs into this package. Is that correct?

If so, thinking more about the case split, a new sentinel singleton for (4) doesn't appear too bad to me:

- Value known
  - x or static(x) for finite x
  - static(∞)
- Value unknown
  - Finite()
  - missing/Unknown()

Calculations on Finite() should fail, since you cannot meaningfully propagate it; and if someone wants to continue symbolically, the value can be converted easily by dispatch to symbolic(:Z) or whatever.

I'm not so sure whether missing is the right value for unknown masses, since it will propagate to further calculations. For me, missing means "missing data" rather than "theoretically existing, but we can't calculate it analytically". Are there precedents for such usage? Or the alternative, an Unknown() that works like Finite()?
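
A minimal sketch of those sentinels (hypothetical names, just to make the case split concrete):

struct Finite end     # mass known to be finite, but value unknown
struct Unknown end    # mass completely unknown (alternative to missing)

# arithmetic on Finite() is deliberately left undefined, so accidental
# propagation errors out instead of silently producing nonsense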

Also, wrt. static: not all types of objects will be able to statically determine their normalization factors, right? Even something as simple as Dirac(x) for arbitrary x, unless you require static(x) for the parameter, too.

oschulz commented 2 years ago

Anyway, continuing the actual discussion... we seem to agree that (log)massof is the better approach than a keyword for the density functions, and that the functionality actually belongs into this package. Is that correct?

Hm, the question is: do we support [log]massof for things that are a density (as opposed to things that have a density, like a measure)? A density, from the viewpoint of DensityInterface, is a function that's represented by a (typically non-callable) object that's neutral regarding log/non-log. But it's still essentially a function.

I think @cscherrer would say that a (density) function doesn't have a mass, without specifying a base measure. But we can of course define that the mass of a density is the mass with respect to an implicit base measure (the integral against it) - just like we say in DensityInterface that the density of a measure/distribution is the density with respect to an implicit base measure (the RN-derivative). So it would just be the reverse, kinda.

1) If we do, then that means that we create kind of an implicit "equivalence" (probably not the right term) between densities and measures (based on implicit base measures), as far as DensityInterface (not MeasureBase) is concerned.

I have no problem with that - I kinda like it, in fact. I think that's how many non-measure-theory-aware people think anyway (not differentiating between a distribution and its PDF, for example). But this would also mean that we probably *do* want `logfuncdensity(logdensityof(distribution)) == distribution` - though `logfuncdensity` might not be the right function name.

2) If we don't, though - if we say densities can't have an implicit base measure, and therefore have no mass - then the concept of [log]massof belongs in MeasureBase, I think.

I do like option 1 - people do integrate continuous (and sum up discrete) functions all the time, without even thinking about the base measure (Lebesgue resp. counting measure being the typical unconscious choice). "The integral of a function" is definitely something people say - and in the context of a use case, it's typically clear which integral they mean. But I'd like to hear @cscherrer's opinion.

cscherrer commented 2 years ago

Say you have a non-negative function f that integrates (with respect to Lebesgue measure) to a number m. If a user wants to consider f to be a probability density, it could be

  1. The density is f, but the base measure is not actually Lebesgue, but Lebesgue weighted by 1/m
  2. The density is not f, but f weighted by 1/m, and the base measure is Lebesgue

(2) seems simpler for beginners to understand, but still has the potential confusion that the density is not the thing they just provided as a density.

So, I really don't know.

oschulz commented 2 years ago

Maybe let's ask a different question, then: @phipsgabler , which use cases would we have in which users would define a density and a mass for it, but would not want it to become a measure?

And on the other side: @cscherrer, does MeasureBase offer (or could offer) a function like measure_from(d, base, logmass::Union{Missing,FiniteMass,Real}) - not exactly like that, maybe, but conceptually. d would be anything with DensityInterface.isdensity(d) == true that supports DensityInterface.[log]densityof. logmass would be either missing, FiniteMass or a real number, including static(0) and static(Inf).

cscherrer commented 2 years ago

The known case here is easy. For others... I think some computations like this could benefit from a "thunk" abstraction where some values are unknown and can be "set" as they're discovered. Very OOP-like, which I'm usually opposed to but could fit this use case.

cscherrer commented 2 years ago

Oh, but maybe that's just https://github.com/JuliaCollections/Memoize.jl
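
Roughly, a cached numeric mass could then look like this (a sketch only; some_quadrature is a stand-in, not a real API):

using Memoize

# cache the (possibly expensive) numeric mass per density object
@memoize numeric_logmass(d) = log(some_quadrature(x -> densityof(d, x)))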

phipsgabler commented 2 years ago

2. The density is not `f`, but `f` weighted by `1/m`, and the base measure is Lebesgue

My interpretation for "mass normalization" in the layperson perspective is pretty much this: give me a new density f / m with respect to the same implicit base measure (usually Lebesgue).

[This IMHO is much more physically intuitive: starting from a mass density (kg/m^3 integrated against m^3), you would go via normalization to a "volume ratio" density (1/m^3 against m^3). Not the same mass density but integrated over "mass-ratioed space" (kg/m^3 against m^3/kg).]

As for examples: posteriors expressed as probabilistic programs will (excluding improper priors and factors) fall into case (3): objects having a density, and an unknown finite mass. And for a sufficiently complex PPL, you won't be able to figure out the base measure either, because of stochastic control flow or just due to implementation reasons (both apply to Turing.jl).

But anyway, haven't we already committed to the notion that densities always have a base measure in theory, but that it may be left implicit all the time?

cscherrer commented 2 years ago

But anyway, haven't we already commited to the notion that densities always have a base measure in theory, but that it may be left implicit all the time?

I think the issue is that it's an overdetermined system. Once you start talking about mass, that's integration. Most users are implicitly thinking Lebesgue measure, which is fine. But it doesn't make sense to have

That constant has to go somewhere.

phipsgabler commented 2 years ago

Oh, I'm using "probability measure" as "the mass is known to be 1", and would only use it as a contingent notion. Are you referring to usages such as "unnormalized probability measure"? Of course, if you understand it that way, the four properties are overdetermined.

cscherrer commented 2 years ago

Right, things get a lot easier if f is known to integrate to one. I thought this discussion was mostly about other cases?

phipsgabler commented 2 years ago

Yes, but then I don't get your point about the "overdetermined system".

We can have q(x) = exp(-x^2), "implicitly with respect to Lebesgue", and pretend we don't know the integration constant. Then the real mass Z is to be determined.

Or we have p(x) = (1/Z) * exp(-x^2), "implicitly with respect to Lebesgue", and massof(p) = 1.
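
As an aside, for this concrete example the constant happens to be known analytically, which makes the relationship between q, Z and p explicit:

q(x) = exp(-x^2)
Z = sqrt(π)         # ∫ exp(-x²) dx = √π  (Gaussian integral)
p(x) = q(x) / Z     # massof(p) == 1 with respect to Lebesgue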

Are you talking about the interpretation of "q is a probability measure over the reals and we don't know Z"? Then it might indeed look like a conflict between

But I don't see such a conflict arising. If a density f is a measure, then the mass Z is explicitly defined through the base measure mu. If not, then the base measure of f is assumed to be an implicit but fixed mu. There is then also a mass Z with respect to mu. In both cases, we might construct a new density, f / Z, which with respect to mu has mass 1. We just can't talk about both Z and 1 separately.

If that's not what you're talking about, please give me a counterexample :D

phipsgabler commented 2 years ago

I suspect (but am not sure) that our misunderstanding arises because I, for now, think about the density objects first, with the base measure just being additional information telling us about "support" (really just "PMF or PDF?"), while you always start from measures first and foremost, with the RN-derivative resulting as a consequence thereof. But that is because I try to put myself into the head of the possible measure-theoretically unaware end user.

oschulz commented 2 years ago

Oh, but maybe that's just https://github.com/JuliaCollections/Memoize.jl

Yes, cached results could be useful to handle "properties" of things that can be expensive to calculate while still keeping things immutable (semantically).

oschulz commented 2 years ago

As for examples: posteriors

Ah, but posteriors are measures, not densities. So they have a mass that doesn't depend on the choice of a base measure. So they should implement the MeasureBase interface, like Distributions hopefully will, soon. Correct, @cscherrer ?

cscherrer commented 2 years ago

Thanks @phipsgabler . I agree with most of this, but a couple of pieces are very confusing.

If a density f is a measure

Those are very different things

then the mass Z is explicitly defined through the base measure mu.

No, it's easy to define a measure without knowing what it integrates to.

If not, then the base measure of f is assumed to be an implicit but fixed mu. There is then also a mass Z with respect to mu. In both cases, we might construct a new density, f / Z, which with respect to mu has mass 1. We just can't talk about both Z and 1 separately.

This sounds right to me.

Let's step back for a second. If a user gives a density f, it's easy to imagine creating a new density, say normalized(f). Maybe that's the idea? If its mass happens to be known (user-provided or computed using quadrature), then normalized(f) can actually be computed. Otherwise it would need to be a "known up to a constant" kind of thing.
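
A hedged sketch of that idea for the known-mass case (normalized is hypothetical; logfuncdensity and logdensityof are the existing DensityInterface functions):

# given a density f with known mass m, return a new density with mass 1
normalized(f, m) = logfuncdensity(x -> logdensityof(f, x) - log(m))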

oschulz commented 2 years ago

If a user gives a density f, it's easy to imagine creating a new density, say normalized(f)

But if I can do that, I've kind of turned my density into a measure already, right? After all, to normalize it I have to "measure" it.

oschulz commented 2 years ago

@phipsgabler I guess the question is: are there use cases where we would want to do something like integrate a likelihood (without the prior)?