Closed by cscherrer 3 years ago
One option would be to merge the functions into a single `logdensity` function that could be defined in an otherwise empty LogDensity package.

To avoid users having to qualify names, though, a simpler solution would be to only export one of the two functions in Soss as `logdensity`, and at most an alias for the other one. The choice would depend on which function is more likely to be used.
> Maybe we can remove all exports from AbstractPPL since it's an infrastructure package?
Maybe we should. I believe it's not completely relevant for the problem here though since I assume that users would not load AbstractPPL.
To me it seems the main question is how e.g. Soss could give users access to both `AbstractPPL.logdensity` and `MeasureBase.logdensity` at the same time. Internally, in Soss one could use aliases or qualify all occurrences, but it is not possible to export both functions. Additionally, if Soss were to export `AbstractPPL.logdensity` and users load Soss + MeasureBase/MeasureTheory, then they would have to qualify all uses of `logdensity`.
Maybe at the call we should talk about the semantics of these functions. `MeasureBase.logdensity` is a log-density in the mathematical sense: it evaluates the Radon-Nikodym derivative at a point, with the base measure defaulting to the `basemeasure` of whatever measure you're working with. It's called `logdensity` to avoid a name collision with `logpdf`, and because the objects we evaluate it on aren't necessarily distributions, but arbitrary measures. I've also defined `basemeasure` for `Distribution`s, so `logdensity` defaults to `logpdf` on those.
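A minimal sketch of how this fallback could look (the names `Lebesgue`, `CountingMeasure`, and the method bodies are assumptions for illustration, not the actual MeasureBase source):

```julia
using Distributions

# placeholder base-measure types (illustrative, not MeasureBase's definitions)
struct Lebesgue end
struct CountingMeasure end

# the base measure of a Distribution is determined by its type
basemeasure(::ContinuousDistribution) = Lebesgue()
basemeasure(::DiscreteDistribution) = CountingMeasure()

# with the base measure implied, logdensity on a Distribution is just logpdf
logdensity(d::Distribution, x) = logpdf(d, x)

logdensity(Normal(0, 1), 0.0)  # ≈ -0.5 * log(2π)
```

The key point is that no per-distribution work is needed: dispatch on the `ContinuousDistribution`/`DiscreteDistribution` type hierarchy covers everything at once.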
My motivations for choosing the name `logdensity` were `loglikelihood`, `logprior`, etc., which are all expressible in terms of the `logdensity` of differently conditioned models. That agrees pretty much with @cscherrer's intentions; besides the fact that we don't use measure terminology anywhere, the measure-theoretic usage is what I actually meant. So ideally they are the same, and our usage should be compatible with `logdensity` of MeasureTheory.jl.
(There's the problematic aspect that all this is done without ever specifying wrt. what base measure the density is meant to be taken -- but that problem goes beyond this discussion.)
I'd be happy for us all (APPL, MT, and LogDensityProblems) to use a shared function, but didn't want to inflict any changes on anyone. LogDensityProblems seemed like the place where this should live, but I didn't want to depend on the additional AD and transformation stuff there. The problem we have to solve is one of package infrastructure design, where I am too inexperienced.
Thanks @phipsgabler. I mostly agree with this, but some nit-picks might hopefully help make the discussion more precise...
> 1. to avoid the IMHO artificial distinction between "PDFs" and "PMFs",

I agree this is artificial. Distributions uses "PDF" for everything; this is formally correct, because a discrete distribution has a density with respect to counting measure.
> 2. to use an appropriate term for densities which are not probability densities, like (unnormalized) posteriors and likelihoods, and

Need to be a little careful here. A likelihood is only a density if the parameter space has an implicit base measure. Generally, it's just a function. In MeasureTheory, we work with likelihoods as things that act on measures, so there's an operation (pointwise product, `⊙`) taking a measure and a likelihood, and returning a measure.
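A conceptual sketch of that pointwise product (the type names and internals here are assumptions for illustration, not MeasureTheory's actual implementation):

```julia
# a likelihood wraps a log-likelihood function θ -> ℓ(θ)
struct Likelihood{F}
    logf::F
end

# the result of μ ⊙ ℓ: a measure whose log-density adds the log-likelihood
struct WeightedMeasure{M,F}
    base::M
    logweight::F
end

⊙(μ, ℓ::Likelihood) = WeightedMeasure(μ, ℓ.logf)

logdensity(μ::WeightedMeasure, θ) = logdensity(μ.base, θ) + μ.logweight(θ)

# toy base measure with a flat log-density, just for demonstration
struct Flat end
logdensity(::Flat, θ) = 0.0

post = Flat() ⊙ Likelihood(θ -> -θ^2)
logdensity(post, 2.0)  # -4.0
```

The point is that the likelihood itself never needs a base measure; it only becomes a density once attached to a measure via `⊙`.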
> (There's the problematic aspect that all this is done without ever specifying wrt. what base measure the density is meant to be taken -- but that problem goes beyond this discussion.)
Right. Fortunately, at least in the cases I know of, the base measure for `Distribution`s can be determined from the type, up to some details like `size` for dynamically-shaped arrays. For more complex situations, things aren't really expressible without MeasureTheory anyway :)
> I'd be happy for us all (APPL, MT, and LogDensityProblems) to use a shared function, but didn't want to inflict any changes on anyone.
Thanks, I agree
> LogDensityProblems seemed like the place where this should live, but I didn't want to depend on the additional AD and transformation stuff there.
I don't think it should live there. IMO the emphasis there is on *Problems* - it's mostly a very opinionated API for expressing problems, more so than for generic computations on log-density functions. Very useful for its purpose, but too narrow in scope to serve as the basis for an entire ecosystem.
> The problem we have to solve is one of package infrastructure design, where I am too inexperienced.
And too modest ;)
In case it's helpful, I had a problem with how to work with `Distributions.logpdf`. I wanted

- `logpdf` in `MeasureBase`
- `MeasureBase` to not depend on `Distributions`
I think this worked out pretty well... In MeasureTheory, I have

```julia
import MeasureBase
import Distributions
import Distributions: logpdf, pdf
using MeasureBase: AbstractMeasure
export pdf, logpdf

Distributions.logpdf(d::AbstractMeasure, x) = MeasureBase.logpdf(d, x)
Distributions.pdf(d::AbstractMeasure, x) = exp(Distributions.logpdf(d, x))
```
> The problem we have to solve is one of package infrastructure design, where I am too inexperienced.
IMO the cleanest and most lightweight solution, if these different `logdensity` functions should be unified, would be to define a `logdensity` stub in a separate package without any dependencies, probably without any concrete definition (maybe something like `density(args...) = exp(logdensity(args...))` if it seems useful). This is a common approach and would e.g. follow the example of other packages such as CommonSolve, InverseFunctions, RealDot, StatsAPI, etc.
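The entire stub package could look roughly like this (a sketch; `UnitInterval` below is a made-up downstream type for illustration):

```julia
# hypothetical contents of a dependency-free LogDensity stub package
module LogDensity

function logdensity end   # stub: no methods; downstream packages add them

# optional convenience definition in terms of the stub
density(args...) = exp(logdensity(args...))

export logdensity, density

end

# a downstream package would then add methods for its own types, e.g.
struct UnitInterval end
LogDensity.logdensity(::UnitInterval, x) = 0.0

LogDensity.density(UnitInterval(), 0.5)  # 1.0
```

Because the stub owns the function name, every package extending it for its own types avoids type piracy by construction.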
> One option would be to merge the functions into a single `logdensity` function that could be defined in an otherwise empty LogDensity package.
> > 1. to avoid the IMHO artificial distinction between "PDFs" and "PMFs",
>
> I agree this is artificial. Distributions uses "PDF" for everything; this is formally correct, because a discrete distribution has a density with respect to counting measure.
Oh, absolutely, it is formally correct; but I was thinking of the (apparently not so small) number of people who seem to insist on PMFs being a wholly different concept from "continuous" "densities", and not "just another Radon-Nikodym derivative wrt. a different base measure". In other words, "PDF" can be a loaded term, and using "density" consistently circumvents that without becoming unclear for anyone or inventing new words.
A stub could make sense as long as we can agree on a common API. I guess my biggest concern here is making the three-argument form available to Distributions.
I don't think one should enforce three arguments, as both Distributions (as `logpdf`) and AbstractPPL (as `logdensity`) only use two arguments. Maybe the two-argument version should in fact be the default, with `logdensity(measure, x)` as the general API. This would match both the use in Distributions and AbstractPPL - in the former case the `measure` is specified as a `Distribution`, in the latter by a probabilistic model. And evaluation is done at a point `x` in the support in Distributions, and at a trace of the probabilistic model in AbstractPPL.
"Three argument form" is with the base measure being explicit, right?
So we'd have those variants:

1. `logdensity(μ, x)`: `μ` describes the measure, the base measure is implied. Examples:
   - `μ` is a Distributions distribution, `x` within its support;
   - `μ` is an `AbstractMeasure` (or duck-typed measure), `x` within its support;
   - `μ` is a PPL model, `x` a trace-like data structure.
2. `logdensity(μ, ν, x)`: `μ` and `ν` are `AbstractMeasure`s (or duck-typed measures), `x` within their support.

I think it would mostly suffice to specify the API as in (1), with a loose relation of "measury part" and "value part" for whatever you are dealing with, and the idea that the base measure in these cases is implied. The latter can be made explicit by form (2) in case an implementation supports that.
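The two variants could be sketched on a hypothetical duck-typed measure like this (`ScaledLebesgue` and `Lebesgue` are illustrative names, not real API):

```julia
struct Lebesgue end

# a toy measure whose density wrt Lebesgue is exp(logscale) everywhere
struct ScaledLebesgue
    logscale::Float64
end

basemeasure(::ScaledLebesgue) = Lebesgue()

# form (1): the base measure is implied via basemeasure(μ)
logdensity(μ::ScaledLebesgue, x) = μ.logscale

# form (2): the base measure is explicit
logdensity(μ::ScaledLebesgue, ::Lebesgue, x) = μ.logscale
logdensity(μ::ScaledLebesgue, ν::ScaledLebesgue, x) = μ.logscale - ν.logscale

logdensity(ScaledLebesgue(2.0), ScaledLebesgue(0.5), 1.0)  # 1.5
```

Form (2) lets the same measure carry densities with respect to several base measures, which form (1) alone cannot express.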
That's mostly correct @phipsgabler. There's no requirement for `x` to be in the support. If it's outside, `logdensity` will just return `-Inf`. This is important, because otherwise you end up throwing errors and trying to catch them, which is messy and has lots of overhead. And for example, it wouldn't make sense for `logdensity(Dirac(x), y)` to error outside of `x == y`.
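To illustrate the point (a toy `Dirac`, not MeasureTheory's actual type), returning `-Inf` keeps `logdensity` a total function:

```julia
# Dirac measure concentrated at a single point x
struct Dirac{T}
    x::T
end

# log-density wrt counting measure at the point: 0 on the atom, -Inf elsewhere
logdensity(d::Dirac, y) = d.x == y ? 0.0 : -Inf

logdensity(Dirac(3), 3)  # 0.0
logdensity(Dirac(3), 4)  # -Inf, no error thrown
```

Callers can then sum or compare log-densities without wrapping every evaluation in a `try`/`catch`.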
Measures are defined using the two-argument form. In this case the base measure is "implied" in the sense that it's taken to be given by `basemeasure`. It turns out to be much more efficient to have the three-argument form defined in terms of the two-argument form.
Disallowing the three-argument form for Distributions will make many computations awkward, and make Distributions mostly unusable with MeasureTheory. I don't think that's really a viable option.
Good point about the support. Then part of the contract should be something like

> "`logdensity` should be well defined over the extension of `μ`'s support to the whole range of the type of the compatible `x` value, returning an equivalent of `-Inf` outside the support (for meaningful values, ignoring 'undefined' cases like `NaN`)."
I don't get the requirement for Distributions, though. Do you imagine them to adapt the `logdensity` interface? Or is that only referring to the glue code? Where will that glue code live, and how will it be defined? Just `basemeasure` dispatching on `ValueSupport` and `VariateForm`?
> Then part of the contract should be something like...
Yes that's right :)
The Distributions thing is about being able to use Distributions as measures. You can take `logdensity` to use `Distributions.logpdf`, and `basemeasure` is pretty easy to define based on the types. We want to be able to use Distributions with the combinators from MeasureTheory. That requires the three-argument form, which we define generically in terms of `logdensity` and `basemeasure`.
Maybe the simplest solution is if `MeasureBase` "owns" the three-argument form, providing a default implementation. This could be overridden in other packages, but would be expected to follow the same semantics.
MeasureBase could define it only for the types it owns, otherwise it would be type piracy.
I think you mean "should"? Avoiding type piracy is generally a good idea, though there are certainly exceptions.
Currently it's not type piracy, because MeasureBase owns `logdensity`. I'm trying to find ways to share the name without breaking things. Do you have other suggestions?
I went ahead and just set up this: https://github.com/phipsgabler/LogDensity.jl.
The piracy issue is a problem indeed -- where should glue code go...? Is there a way to solve that with Requires?
No, I really think we should try to avoid type piracy as much as possible. In my experience it just causes too many problems in the long run.
I also think Requires should be avoided. It's a hack that causes compilation issues and increases loading times. IMO an interface package with Requires is not lightweight anymore.
IMO the correct solution here is to define the following in LogDensity:

```julia
function basemeasure(x) end
function logdensity(measure, x, basemeasure) end
logdensity(measure, x) = logdensity(measure, x, basemeasure(measure))
```
> IMO the correct solution here is to define the following in LogDensity:
>
> ```julia
> function basemeasure(x) end
> function logdensity(measure, x, basemeasure) end
> logdensity(measure, x) = logdensity(measure, x, basemeasure(measure))
> ```
Probably we should define

```julia
function logdensity(measure, x, basemeasure)
    logdensity(measure, x) + logdensity(LogDensity.basemeasure(measure), x, basemeasure)
end
```
Then one could just define the two-argument version and `basemeasure`.
This is backwards. Most measures are defined in terms of the two-argument form. The three-argument form is defined using that and `basemeasure`. We'll almost certainly refine the implementation of this, but currently it's

```julia
@inline function logdensity(μ::AbstractMeasure, ν::AbstractMeasure, x)
    α = basemeasure(μ)
    β = basemeasure(ν)

    # If α === μ and β === ν, the recursive call would be exactly the same as
    # the original one. We need to break the recursion.
    if α === μ && β === ν
        @warn """
        No method found for logdensity(μ, ν, x) where
          typeof(μ) == $(typeof(μ))
          typeof(ν) == $(typeof(ν))

        Returning NaN. If this is incorrect, please add a method
          logdensity(μ::$(typeof(μ)), ν::$(typeof(ν)), x)
        """
        return NaN
    end

    # Infinite or NaN results occur when outside the support of α or β,
    # and also when one measure is singular wrt the other. Computing the base
    # measures first is often much cheaper, and allows the numerically-intensive
    # computation to "fall through" in these cases.
    # TODO: Add tests to check that NaN cases work properly
    ℓ = logdensity(α, β, x)
    isfinite(ℓ) || return ℓ

    ℓ += logdensity(μ, x)
    ℓ -= logdensity(ν, x)
    return ℓ
end
```
There are more details on this in the paper, https://github.com/cscherrer/MeasureTheory.jl/blob/paper/paper/paper.pdf
> This is backwards. Most measures are defined in terms of the two-argument form. The three-argument form is defined using that and basemeasure.
Isn't this what I suggested in my second comment?
To add first-class support for measures with an implicit but undefined `basemeasure`, one could define (putting everything together):
```julia
basemeasure(x) = nothing
logdensity(measure, x) = logdensity(measure, x, basemeasure(measure))
function logdensity(measure, x, basemeasure)
    return logdensity(measure, x) + logdensity(LogDensity.basemeasure(measure), x, basemeasure)
end
logdensity(_, _, ::Nothing) = error("logdensity is not defined for unspecified base measures")
logdensity(::Nothing, _, _) = error("logdensity is not defined for unspecified measures")
# resolve ambiguity error
logdensity(::Nothing, _, ::Nothing) = error("logdensity is not defined for unspecified measures and base measures")

density(measure, x) = exp(logdensity(measure, x))
density(measure, x, basemeasure) = exp(logdensity(measure, x, basemeasure))
```
Then AbstractPPL could just define `logdensity(model, trace)` and everything would work (and error) as expected. Distributions could define `logdensity(dist, x)` in the same way, but probably more useful would be to define `basemeasure(::ContinuousDistribution) = Lebesgue()` and `basemeasure(::DiscreteDistribution) = CountingMeasure()` (which would have to be defined in LogDensity) such that different base measures could be used if desired. And MeasureTheory could just implement whatever versions and specializations it wants to define.
BTW an alternative approach to (log) densities focused on the two-argument version is https://github.com/oschulz/DensityInterface.jl
Moving the parameters around to match how we use them, this is

```julia
basemeasure(x) = nothing
logdensity(measure, x) = logdensity(measure, basemeasure(measure), x)
function logdensity(measure, base, x)
    return logdensity(measure, x) + logdensity(LogDensity.basemeasure(measure), base, x)
end
logdensity(_, ::Nothing, _) = error("logdensity is not defined for unspecified base measures")
logdensity(::Nothing, _, _) = error("logdensity is not defined for unspecified measures")
# resolve ambiguity error
logdensity(::Nothing, ::Nothing, _) = error("logdensity is not defined for unspecified measures and base measures")

density(measure, x) = exp(logdensity(measure, x))
density(measure, base, x) = exp(logdensity(measure, base, x))
```
I don't think this `logdensity(measure, base, x)` method will work as expected; you'll basically need what I have in MeasureBase.

It sounds like you're suggesting all of this, and also `Lebesgue`, should be defined in `LogDensity`. I guess you'd put `CountingMeasure` in there too, to cover discrete distributions. And it would have to depend on Distributions in order to have `::ContinuousDistribution` methods. So the result would basically be `MeasureBase`, just with a strong preference toward Distributions instead of measures. I don't see how this helps.
No, I don't suggest a dependency on Distributions. It should be an interface. AbstractPPL, Distributions etc. could implement it in their packages if they think it is useful. And the design should allow one to implement just the two-argument version (possible with the suggestion). If one uses one of the simple standard measures `Lebesgue` and `CountingMeasure` as the base measure, other packages such as MeasureTheory could perform computations with respect to other base measures as well - they just have to define the density of these standard measures with respect to their custom measures (this would allow moving the `logdensity` integration for Distributions in MeasureTheory/Base to Distributions). And MeasureTheory could define other measures, `logdensity` implementations, etc.
> No, I don't suggest a dependency on Distributions. It should be an interface.
Ok, this makes more sense. Thanks for clarifying.
> Distributions etc. could implement it in their package if they think it is useful.
This could be tricky, because it would take functionality away from MT until Distributions adopts the changes. Part of my motivation in moving away from Distributions was the difficulty in getting changes through, and I could see this being a big time sink.
So I'd probably need to keep things in MT until a Distributions release with the update.
> If one uses one of the simple standard measures `Lebesgue` and `CountingMeasure` as base measure, other packages such as MeasureTheory could perform computations with respect to other base measures as well - they just have to define the density of these standard measures with respect to their custom measures
That often doesn't work. For example, say you have a finite measure on a curve in ℝ². This is singular wrt Lebesgue measure on ℝ², and counting measure is singular wrt it. Things like this can be expressed using Bijectors, and to get the measure right you usually need to work with a pushforward of Lebesgue measure.
> Part of my motivation in moving away from Distributions was the difficulty in getting changes through, and I could see this being a big time sink.
I try to merge and release without too much delay, last month 12 PRs were merged and I released 5 new versions: https://github.com/JuliaStats/Distributions.jl/pulse/monthly
> That often doesn't work. For example, say you have a finite measure on a curve in ℝ². This is singular wrt Lebesgue measure on ℝ², and counting measure is singular wrt it. Things like this can be expressed using Bijectors, and to get the measure right you usually need to work with a pushforward of Lebesgue measure.
Sure, I know that it does not cover all cases :slightly_smiling_face: My main point was just that the base measure heuristics that currently are defined in MeasureTheory/Base could be defined in Distributions directly, and hence all computations that you currently can perform in MeasureTheory would still be possible without you having to reason about base measures of distributions in Distributions.
> I try to merge and release without too much delay, last month 12 PRs were merged and I released 5 new versions: https://github.com/JuliaStats/Distributions.jl/pulse/monthly
Very nice!
> Sure, I know that it does not cover all cases :slightly_smiling_face: My main point was just that the base measure heuristics that currently are defined in MeasureTheory/Base could be defined in Distributions directly, and hence all computations that you currently can perform in MeasureTheory would still be possible without you having to reason about base measures of distributions in Distributions.
Ok, this is sounding promising. I think there's still some more planning to do; maybe we should move the discussion to the https://github.com/phipsgabler/LogDensity.jl repo @phipsgabler set up (thanks Philipp!)
I also want to be sure to get input from @mschauer, since he's been very active in MT and might catch some concerns I would miss :slightly_smiling_face:
Closing because moved to LogDensity.jl issue.
I haven't been tracking this very closely, but there's a `LogDensityProblems.logdensity` that's not usually user-facing from a PPL, and a `MeasureBase.logdensity` that's extremely user-facing (exported from Soss).

Originally posted by @cscherrer in https://github.com/TuringLang/AbstractPPL.jl/issues/37#issuecomment-950356270

If this is to integrate with Soss, we'll need a plan so users don't have to qualify names all over the place.