Open phipsgabler opened 2 years ago
Hm, I wonder if something like this could go into Distributions?
The real use case of this IMHO is the query "is something a probability density". But that should then be available to PPLs too, since there we have all kinds of densities that we know are not normalized. So if this is deemed a good idea, I'd rather have it here.
The real use case of this IMHO is the query "is something a probability density".
What do you think about my proposal at the (current) end of #5 regarding adding `ismeasure` and `isdensity`, as more specific queries than `hasdensity`? Would that serve this use case as well?
I wouldn't think that makes a difference; being a measure doesn't tell you anything about normalization (or being a probability measure).
A third approach, where you only have to define `normalizingconstant`:
```julia
hasnormalizingconstant(d) = !(normalizingconstant(d) isa Nothing)
normalizingconstant(d) = nothing
normalizingconstant(d::SomeDensity) = ...
isnormalized(d) = normalizingconstant(d) == 1
```
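For concreteness, a hedged sketch of how this three-function approach could be exercised; `Gauss1D` and `UnnormGauss1D` are made-up toy types, not part of any package:

```julia
# Fallback: no known normalizing constant.
hasnormalizingconstant(d) = !(normalizingconstant(d) isa Nothing)
normalizingconstant(d) = nothing
isnormalized(d) = normalizingconstant(d) == 1

struct Gauss1D end                  # density x -> exp(-x^2/2) / sqrt(2π)
normalizingconstant(::Gauss1D) = 1  # already normalized

struct UnnormGauss1D end            # density x -> exp(-x^2/2)
normalizingconstant(::UnnormGauss1D) = sqrt(2π)
```

Anything that doesn't opt in falls back to `nothing`, so both queries return `false` for it.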
We could call it `logdensitynorm`, and default it to `logdensitynorm(d) = missing`. Not sure we'd need a `haslogdensitynorm`, with a default of `missing`.
Having a `logdensitynorm` could be useful in applications that need to operate in non-log space (e.g. subvolume integration) but need to renormalize (shift log-space) before doing `exp(logdensityof(...))`, because they would be out of the dynamic range of their float type otherwise.
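A hypothetical illustration of that use case: a log-density with a large offset overflows `Float64` when exponentiated directly, but is fine after shifting by the (assumed known) log-norm. All names here are illustrative:

```julia
logdens(x) = 800.0 - x^2    # unnormalized log-density; exp(800) overflows Float64
lognorm = 800.0             # suppose this shift is known or estimated

direct  = exp(logdens(1.0))            # Inf: out of Float64 dynamic range
shifted = exp(logdens(1.0) - lognorm)  # exp(-1.0): well within range
```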
I think I like that idea. It seemed to me that it would be nice to also be able to express "infinite norms" (as there is a semantic difference between "finite but unknown constant", as in posteriors, and "known to be non-normalizable"). With your approach, it would work out, right?
```julia
logdensitynorm(d) = missing
logdensitynorm(d::Distribution) = 0  # norm is 1, so the log-norm is 0
densitynorm(d) = exp(logdensitynorm(d))
```
will propagate the `missing`, as do subsequent computations. And then we can also have
```julia
densitynorm(d::Lebesgue) = Inf
# default: logdensitynorm(d::Lebesgue) = missing
```
with somewhat consistent behaviour.
So it seems nice at first glance, but I'm not confident enough to judge downstream effects.
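A tiny demonstration of the `missing` propagation described above; the types are hypothetical stand-ins (no Distributions.jl dependency), and the function names just follow the sketch:

```julia
logdensitynorm(d) = missing                # default: norm unknown
densitynorm(d) = exp(logdensitynorm(d))    # exp(missing) === missing

struct UnknownPosterior end                # norm unknown -> missing propagates
struct ToyDistribution end                 # normalized: norm 1, log-norm 0
logdensitynorm(::ToyDistribution) = 0
```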
Yes, I think so. But maybe the name `logdensitynorm` would be weird when used on measures/distributions - their norm is independent of which density is derived for them using different base measures. Maybe we should just use `LinearAlgebra.norm`, and add `lognorm` on our own?
Then, for Bayesian inference, we'd have `lognorm(∫exp(log_likelihood, prior)) == log_evidence`.
Hm, maybe using `LinearAlgebra.norm` is a bad idea, though; it's not specific enough. One might expect that `norm(f_density) == sqrt(∫ f_density(x)^2 dx)`, or something else. There are so many norms ...
Yeah, I really wouldn't use `LinearAlgebra.norm`; for distributions, it's also already taken (https://github.com/JuliaStats/Distributions.jl/pull/1321). And for that reason, if something like this is implemented, we should probably refrain from "norm" as the name.
What could we call it so that it would also make sense for densities (as the integral of the density function over its support)?
What is the norm of `MeasureTheory.Dirichlet([2,3])`, and can I compute it with MeasureTheory?
"Norm" and "normalize" are very overloaded, which is why we went with `mass` and `logmass`.
cc @sethaxen
I like those. I think defining those as abstract functions in DensityInterface could be very handy. We often do use the term "norm of a density", we say the "PDF is normalized", and so on. So it's not just useful for MeasureBase, it's also useful for densities (density functions). We do of course mean the integral with respect to an implied base measure - but that's the same as using an implied base measure in `[log]densityof`, so it would fit in quite neatly.
What do you think about this, @devmotion ?
Here's some more context: https://github.com/cscherrer/MeasureTheory.jl/issues/128
For a trait-based approach, I can see a few situations we'd want to distinguish:
1. Measures that are normalized
2. Measures with a known mass, which may or may not be one
3. Measures known to be finite, with unknown mass
4. Measures known to have infinite mass
5. Measures where the mass is unknown (maybe finite, maybe infinite)
Do we have a good package for static numbers that can represent infinity? Then `logmass(measure)` could deliver all of this - return either a static zero (we could take that from Zeros.jl), a real number, `FiniteMass()` (not sure we need that one), a static `Inf`, or `missing`.
All of those results could be dispatched upon, giving you that trait-based approach.
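A sketch of that trait-like dispatch on `logmass` return values, with plain singleton types standing in for Static.jl / Infinities.jl values; all names here are hypothetical:

```julia
struct StaticZero end   # log-mass 0   => normalized          (case 1)
struct StaticInf end    # log-mass Inf => infinite total mass (case 4)

logmass(d) = missing    # default: nothing known (case 5)

struct MyNormalized end
logmass(::MyNormalized) = StaticZero()

struct MyLebesgueLike end
logmass(::MyLebesgueLike) = StaticInf()

# Consumers can dispatch (or branch) on the result type:
isnormalized(d) = logmass(d) isa StaticZero
needs_quadrature(d) = ismissing(logmass(d))
```

Since the result type is known at compile time per method, such branches can constant-fold away, which is presumably the point of using static values here.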
I think I'd go with Static.jl and Infinities.jl, so `static(∞)`.
Nice!
We wouldn't need these as dependencies for DensityInterface, even - we would just define
```julia
function logmass(object)
    if densitykind(object) <: IsOrHasDensity
        missing
    else
        throw(ArgumentError("object is not and does not have a density"))
    end
end
```
Implementations of densities and measures can then return a real number or a static something as fits the case.
When the user gives a function and says "treat this as a density", they need to be able to specify which of these cases we're in:
1. Measures that are normalized
2. Measures with a known mass, which may or may not be one
3. Measures known to be finite, with unknown mass
4. Measures known to have infinite mass
5. Measures where the mass is unknown (maybe finite, maybe infinite)
Otherwise we'll end up wasting lots of cycles on quadrature.
Maybe this goes a bit far for things "that are a density or a measure", though? This may be better situated in MeasureBase.
So things defined outside of MeasureBase will just be treated as completely unknown, and always use quadrature to get an approximate mass? That seems very limiting.
So things defined outside of MeasureBase will just be treated as completely unknown
Well, there isn't that much, right now: `Distribution`s are normalized, densities created by `logfuncdensity` don't have mass information in the first place, and code handling measures should use MeasureBase (at least).
Other density types can have a mass, too, of course - but the best place to take care of that is the package that provides that density type, I would say.
Fully generic and automatic quadrature is hard to provide anyhow - there are many different algorithms and not all suit every situation. If the mass has to be estimated numerically, this won't work without involving the users in higher-dimensional cases, I would say.
Ok, so we need to be able to specify these things somewhere. It sounds like that will need to go into MeasureBase, and users of `logfuncdensity` will not have any way of specifying the mass. Is that right?
I thought we can maybe add something like
```julia
mass(object) = missing
logmass(object) = missing
```
to DensityInterface (though the function name `mass` is probably too generic, should find a different name that can be exported), so users can specify a mass for a density if they know it (it would be the "integral with respect to an implicit base measure", of course :-) ).
The more detailed stuff I would propose to leave to MeasureBase.
How about `massof` and `logmassof`?
users can specify a mass for a density if they know it
I don't see how this would work. The type produced by `logfuncdensity` is owned by DensityInterface, right? So how could users add methods without type piracy?
Yes, `massof` and `logmassof` could work.
users can specify a mass for a density if they know it
I don't see how this would work. The type produced by `logfuncdensity`
No, indeed - `logfuncdensity` is a quick, convenient way for users to turn a log-density function into a density object, nothing more. If users want to do something more complex, they should define a type and implement (at least) `isdensity` and `logdensityof` for it - and in this case, `logmassof` too, for example.
Of course @cscherrer has already put thought into this before us :grinning:
I like `(log)massof`; it's consistent with the density functions, meaningful, and distinct from `norm`.
Of the five possibilities Chad mentioned, all except (3) are simple to handle: values of either `missing` or `static(something)`. Essentially instances of a disjoint union between reals, infinity, and `Missing`.
(3), however, is an outlier -- a set-like restriction of the above ("any finite real number") and harder to represent by types. I can see why it's very useful, though. I suppose an additional type like `Finite`, corresponding to `Infinity`, could solve that, but is that really a good option?
I think I'd go with Static.jl and Infinities.jl, so `static(∞)`
You mean StaticValues.jl?
And what is `logmassof` supposed to be when `massof` is infinite? Error-throwing? Missing?
And what is logmassof supposed to be when massof is infinite?
I would say infinite as well, since `log(Inf) == Inf` and `exp(Inf) == Inf`, so that would be well-defined.
Oh, right... for whatever reason I had assumed that `log(Inf)` is undefined. So that's a non-issue, sorry.
No worries. :-)
(3), however, is an outlier -- a set-like restriction of the above ("any finite real number") and harder to represent by types. I can see why it's very useful, though. I suppose an additional type like Finite, corresponding to Infinity, could solve that, but is that really a good option?
Good point, I hadn't thought of (3) as being so different.
You mean StaticValues.jl?
No, I mean https://github.com/SciML/Static.jl
The description of StaticValues says "experimental implementations for Static.jl", but I don't really understand the relationship between the two, especially given it doesn't list Static.jl as a dependency.
Actually this is somewhat a variation of the `hasdensity`/`isdensity` design discussion in the other issue. We have a "cardinality" axis and a "value" axis, but only the following combinations make sense:
- Cardinality = unknown
  - Value = unknown (5)
- Cardinality = finite
  - Value = unknown (3)
  - Value = x (2)
- Cardinality = infinite
  - Value = Inf (4)

and (1) is just a special case of (2) where `isone(massof(o))` holds in addition.
No, I mean SciML/Static.jl
Ah, weird. Github search didn't even show me that, but the readme is the same. Whatever.
Actually this is somewhat a variation of the hasdensity/isdensity design discussion in the other issue. We have a "cardinality" axis and a "value" axis, but only the following combinations make sense
:+1:
I had never heard of StaticValues. Not sure how they're related, so I opened https://github.com/JeffreySarnoff/StaticValues.jl/issues/1
Looks like the repo only exists for a JuliaCon paper submission -- at least it's a fork of their template.
Anyway, continuing the actual discussion... we seem to agree that `(log)massof` is the better approach than a keyword for the density functions, and that the functionality actually belongs into this package. Is that correct?
If so, thinking more about the case split, a new sentinel singleton for (4) doesn't appear too bad to me:
- Value known
  - `x` or `static(x)` for finite `x`
  - `static(∞)`
- Value unknown
  - `Finite()`
  - `missing`/`Unknown()`
Calculations on `Finite()` should fail, since you cannot meaningfully propagate it; and if someone wants to continue symbolically, the value can be converted easily by dispatch to `symbolic(:Z)` or whatever.
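A sketch of that sentinel idea: a singleton with no arithmetic methods defined, so accidental propagation fails loudly instead of silently. `Finite` and `logmassof` here are hypothetical names from this discussion, not an actual API:

```julia
struct Finite end         # "mass known to be finite, value unknown" (case 3)

logmassof(d) = Finite()   # e.g. a posterior with a proper prior

z = logmassof("some posterior")

# Any attempt to compute with it throws a MethodError,
# because no `+` (or other arithmetic) method exists for `Finite`:
fails = try
    z + 1.0
    false
catch e
    e isa MethodError
end
```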
I'm not so sure whether `missing` is the right value for unknown masses, since it will propagate to further calculations. For me, `missing` means "missing data" rather than "theoretically existing, but we can't calculate it analytically". Are there precedents for such usage? Or the alternative, an `Unknown()` that works like the `Finite()`?
Also, wrt. static: not all types of objects will be able to statically determine their normalization factors, right? Even something as simple as `Dirac(x)` for arbitrary `x`, unless you also require `static(x)` for the parameter, too.
Anyway, continuing the actual discussion... we seem to agree that (log)massof is the better approach than a keyword for the density functions, and that the functionality actually belongs into this package. Is that correct?
Hm, the question is: do we support `[log]massof` for things that *are* a density (not *have* a density, like a measure)? A density, from the viewpoint of DensityInterface, is a function that's represented by a (typically non-callable) object that's neutral regarding log/non-log. But it's still essentially a function.
I think @cscherrer would say that a (density) function doesn't have a mass, without specifying a base measure. But we can of course define that the mass of a density is the mass with respect (RD-integral) to an implicit base measure - just like we say in DensityInterface that the density of a measure/distribution is the density with respect (RN-derivative) to an implicit base measure. So it would just be the reverse, kinda.
1) If we do, then that means that we create kind of an implicit "equivalence" (probably not the right term) between densities and measures (based on implicit base measures), as far as DensityInterface (not MeasureBase) is concerned.
I have no problem with that - I kinda like it, in fact. I think that's how many non-measure-theory-aware people think anyway (not differentiating between a distribution and its PDF, for example). But this would also mean that we probably *do* want `logfuncdensity(logdensityof(distribution)) == distribution` - though `logfuncdensity` might not be the right function name.
2) If we don't, though - if we say densities can't have an implicit base measure, and therefore no mass - then the concept of `[log]massof` belongs into MeasureBase, I think.
I do like option 1 - people do integrate continuous (and sum up discrete) functions all the time, without even thinking about the base measure (Lebesgue resp. counting measure being the typical unconscious choice). "The integral of a function" is definitely something people say - and in the context of a use case, it's typically clear which integral they mean. But I'd like to hear @cscherrer's opinion.
Say you have a non-negative function `f` that integrates (with respect to Lebesgue measure) to a number `m`. If a user wants to consider `f` to be a probability density, it could be
1. The density is `f`, but the base measure is not actually Lebesgue, but Lebesgue weighted by `1/m`
2. The density is not `f`, but `f` weighted by `1/m`, and the base measure is Lebesgue

(2) seems simpler for beginners to understand, but still has the potential confusion that the density is not the thing they just provided as a density.
So, I really don't know.
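Both readings assign the same probabilities; here is a discrete toy example, with the counting measure standing in for Lebesgue so that integrals become plain sums (everything here is illustrative):

```julia
f(x) = x^2              # non-negative, unnormalized "density"
support = 1:4
m = sum(f, support)     # total "mass" of f: 1 + 4 + 9 + 16 = 30

# Option 1: density is f, base measure weighted by 1/m.
prob1(x) = f(x) * (1/m)
# Option 2: density is f/m, base measure unweighted.
prob2(x) = (f(x)/m) * 1
```

The factor `1/m` just moves between the density and the base measure; the resulting probabilities are identical either way.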
Maybe let's ask a different question, then: @phipsgabler , which use cases would we have in which users would define a density and a mass for it, but would not want it to become a measure?
And on the other side: @cscherrer, does MeasureBase offer (or could offer) a function like `measure_from(d, base, logmass::Union{Missing,FiniteMass,Real})` - not exactly like that, maybe, but conceptually. `d` would be anything with `DensityInterface.isdensity(d) == true` that supports `DensityInterface.[log]densityof`. `logmass` would be either `missing`, `FiniteMass`, or a real number, including `static(0)` and `static(Inf)`.
The known case here is easy. For others... I think some computations like this could benefit from a "thunk" abstraction where some values are unknown and can be "set" as they're discovered. Very OOP-like, which I'm usually opposed to but could fit this use case.
Oh, but maybe that's just https://github.com/JuliaCollections/Memoize.jl
2. The density is not `f`, but `f` weighted by `1/m`, and the base measure is Lebesgue
My interpretation of "mass normalization" in the layperson perspective is pretty much this: give me a new density `f / m` with respect to the same implicit base measure (usually Lebesgue).
[This IMHO is much more physically intuitive: starting from a mass density (kg/m^3 integrated against m^3), you would go via normalization to a "volume ratio" density (1/m^3 against m^3). Not the same mass density but integrated over "mass-ratioed space" (kg/m^3 against m^3/kg).]
As for examples: posteriors expressed as probabilistic programs will (excluding improper priors and factors) fall into case (3): objects having a density, and unknown finite mass. And for a sufficiently complex PPL, you won't be able to figure out the base measure, either because of stochastic control flow, or just due to implementation reasons (both apply to Turing.jl).
But anyway, haven't we already committed to the notion that densities always have a base measure in theory, but that it may be left implicit all the time?
But anyway, haven't we already committed to the notion that densities always have a base measure in theory, but that it may be left implicit all the time?
I think the issue is that it's an overdetermined system. Once you start talking about mass, that's integration. Most users are implicitly thinking Lebesgue measure, which is fine. But it doesn't make sense to have, at the same time, a probability measure, Lebesgue as the base measure, and `f` (which doesn't integrate to one) as the density. That constant has to go somewhere.
Oh, I'm using "probability measure" as "the mass is known to be 1", and would only use it as a contingent notion. Are you referring to usages such as "unnormalized probability measure"? Of course, if you understand it that way, the four properties are overdetermined.
Right, things get a lot easier if `f` is known to integrate to one. I thought this discussion was mostly about other cases?
Yes, but then I don't get your point about the "overdetermined system".
We can have `q(x) = exp(-x^2)`, "implicitly with respect to Lebesgue", and pretend not to know the integration constant. Then the real mass `Z` is to be determined.
Or we have `p(x) = (1/Z) * exp(-x^2)`, "implicitly with respect to Lebesgue", and `massof(p) == 1`.
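A numeric sanity check of this example: `q(x) = exp(-x^2)` has mass `Z = sqrt(π)` with respect to Lebesgue measure. A crude Riemann sum stands in for proper quadrature here, purely for illustration:

```julia
q(x) = exp(-x^2)
h = 1e-3
xs = -10:h:10             # q is negligible outside this range
Z = sum(q, xs) * h        # ≈ sqrt(π) ≈ 1.7724...

p(x) = q(x) / Z           # normalized version: massof(p) == 1
```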
Are you talking about the interpretation of "`q` is a probability measure over the reals and we don't know `Z`"? Then it might indeed look like a conflict between
- `q` against `Lebesgue()`, with mass `Z`, or
- `q` against `(1/Z) * Lebesgue()`, with mass 1.

But I don't see such a conflict arising. If a density `f` is a measure, then the mass `Z` is explicitly defined through the base measure `mu`. If not, then the base measure of `f` is assumed to be an implicit but fixed `mu`. There is then also a mass `Z` with respect to `mu`. In both cases, we might construct a new density, `f / Z`, which with respect to `mu` has mass 1. We just can't talk about both `Z` and 1 separately.
If that's not what you're talking about, please give me a counterexample :D
I suspect (but am not sure) that our misunderstanding arises because I for now think about the density objects first, with the base measure just being additional information telling us about "support" (really just "PMF or PDF?"), while you always start from measures first and foremost, with the RN-derivative resulting as a consequence thereof. But that is because I try to put myself into the head of the possible measure-theoretically unaware end user.
Oh, but maybe that's just https://github.com/JuliaCollections/Memoize.jl
Yes, cached results could be useful to handle "properties" of things that can be expensive to calculate while still keeping things immutable (semantically).
As for examples: posteriors
Ah, but posteriors are measures, not densities. So they have a mass that doesn't depend on the choice of a base measure. So they should implement the MeasureBase interface, like Distributions hopefully will, soon. Correct, @cscherrer ?
Thanks @phipsgabler . I agree with most of this, but a couple of pieces are very confusing.
If a density `f` is a measure
Those are very different things
then the mass `Z` is explicitly defined through the base measure `mu`.
No, it's easy to define a measure without knowing what it integrates to.
If not, then the base measure of `f` is assumed to be an implicit but fixed `mu`. There is then also a mass `Z` with respect to `mu`. In both cases, we might construct a new density, `f / Z`, which with respect to `mu` has mass 1. We just can't talk about both `Z` and 1 separately.
This sounds right to me.
Let's step back for a second. If a user gives a density `f`, it's easy to imagine creating a new density, say `normalized(f)`. Maybe that's the idea? If this value happens to be known (user-provided or computed using quadrature) then `normalized(f)` can actually be computed. Otherwise it would need to be a "known up to a constant" kind of thing.
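One possible reading of such a `normalized(f)`, sketched with a crude 1D quadrature standing in for a real integration routine (the function, its bounds, and the step size are all hypothetical):

```julia
function normalized(f; lo = -10.0, hi = 10.0, h = 1e-3)
    m = sum(f, lo:h:hi) * h   # approximate mass of f on [lo, hi]
    x -> f(x) / m             # new density with mass ≈ 1
end

q(x) = exp(-x^2)              # mass sqrt(π), not normalized
p = normalized(q)
```

In the "known up to a constant" case, `m` would instead stay a symbolic or sentinel value, as discussed above.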
If a user gives a density `f`, it's easy to imagine creating a new density, say `normalized(f)`
But if I can do that, I've kind of turned my density into a measure already, right? After all, to normalize it I have to "measure" it.
@phipsgabler I guess the question is, are there use cases where we would want to do something like integrate a likelihood (without the prior).
I've been previously thinking about whether and how a density interface should include the possibility to require or check for normalization, so I'm just posing this for discussion here.
Options that come to my mind:
Or maybe include a whole machinery for dealing with normalization constants?