TuringLang / AbstractPPL.jl

Common types and interfaces for probabilistic programming
http://turinglang.org/AbstractPPL.jl/
MIT License

`logdensity` name collision #38

Closed cscherrer closed 3 years ago

cscherrer commented 3 years ago

I haven't been tracking this very closely, but there's a LogDensityProblems.logdensity that's not usually user-facing from a PPL, and MeasureBase.logdensity that's extremely user-facing (exported from Soss).

Originally posted by @cscherrer in https://github.com/TuringLang/AbstractPPL.jl/issues/37#issuecomment-950356270

If this is to integrate with Soss, we'll need a plan so users don't have to qualify names all over the place.

devmotion commented 3 years ago

One option would be to merge the functions into a single logdensity function that could be defined in an otherwise empty LogDensity package.

To avoid users having to qualify names, though, a simpler solution would be to export only one of the two functions from Soss as logdensity, and at most an alias for the other one. The choice would depend on which function is more likely to be used.

yebai commented 3 years ago

Maybe we can remove all exports from AbstractPPL since it’s an infrastructure package?

devmotion commented 3 years ago

Maybe we should. I believe it's not really relevant for the problem here, though, since I assume users would not load AbstractPPL directly.

To me it seems the main question is how e.g. Soss could give users access to both AbstractPPL.logdensity and MeasureBase.logdensity at the same time. Internally, Soss could use aliases or qualify all occurrences, but it is not possible to export both functions. Additionally, if Soss exported AbstractPPL.logdensity and users loaded Soss + MeasureBase/MeasureTheory, they would have to qualify all uses of logdensity.

cscherrer commented 3 years ago

Maybe at the call we should talk about the semantics of these functions. MeasureBase.logdensity is a log-density in the mathematical sense. It evaluates the Radon-Nikodym derivative at a point, with the base measure defaulting to the basemeasure of whatever measure you're working with. It's called logdensity to avoid name collision with logpdf, and because the objects we evaluate it on aren't necessarily distributions, but any measure. I've also defined basemeasure for Distributions, so logdensity defaults to logpdf on those.
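Roughly, the default can be sketched as follows (a self-contained toy, with hypothetical stand-in types, not MeasureBase's actual code):

```julia
# Toy sketch of "logdensity = log Radon-Nikodym derivative wrt basemeasure".
# All types and methods here are hypothetical stand-ins, not MeasureBase code.
struct Lebesgue end

struct Expo            # exponential measure with rate λ, a.c. wrt Lebesgue
    λ::Float64
end

basemeasure(::Expo) = Lebesgue()

# Two-argument form: log-density wrt the measure's own base measure,
# which for Expo coincides with the familiar logpdf.
logdensity(μ::Expo, x) = x < 0 ? -Inf : log(μ.λ) - μ.λ * x

logdensity(Expo(2.0), 1.0)   # == log(2.0) - 2.0
```

When the base measure is the "natural" one for the type, the two-argument form and logpdf agree, which is the default described above.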

phipsgabler commented 3 years ago

My motivations for choosing the name logdensity were

  1. to avoid the IMHO artificial distinction between "PDFs" and "PMFs",
  2. to use an appropriate term for densities which are not probability densities, like (unnormalized) posteriors and likelihoods, and
  3. to have a generic hypernym covering loglikelihood, logprior, etc., which are all expressible in terms of logdensity of differently conditioned models.
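Point 3 can be sketched like this (hypothetical names, not AbstractPPL's API): each quantity is just the log-density of a differently conditioned model.

```julia
# Sketch of logdensity as a hypernym: logprior, loglikelihood, and the
# unnormalized logjoint are each the log-density of a differently
# conditioned model. All names here are illustrative.
logprior(θ) = -0.5 * θ^2                # θ ~ Normal(0, 1), up to a constant
loglikelihood(θ, y) = -0.5 * (y - θ)^2  # y | θ ~ Normal(θ, 1), up to a constant

# conditioning on observed y leaves just another log-density in θ
logjoint(θ, y) = logprior(θ) + loglikelihood(θ, y)

logjoint(1.0, 1.0)   # == -0.5
```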

That agrees pretty much with @cscherrer's intentions; besides the fact that we don't use measure terminology anywhere, the measure theoretic usage is what I actually meant. So ideally, they are the same, and our usage should be compatible with logdensity of MeasureTheory.jl.

(There's the problematic aspect that all this is done without ever specifying wrt. what base measure the density is meant to be taken -- but that problem goes beyond this discussion.)

I'd be happy for us all (APPL, MT, and LogDensityProblems) to use a shared function, but didn't want to inflict any changes on anyone. LogDensityProblems seemed like the place where this should live, but I didn't want to depend on the additional AD and transformation stuff there. The problem we have to solve is one of package infrastructure design, where I am too inexperienced.

cscherrer commented 3 years ago

Thanks @phipsgabler . I mostly agree with this, but some nit-picks might hopefully help in making the discussion more precise...

1. to avoid the IMHO artificial distinction between "PDFs" and "PMFs",

I agree this is artificial. Distributions uses PDF for everything; this is formally correct, because a discrete distribution has a density over counting measure.

2. to use an appropriate term for densities which are not probability densities, like (unnormalized) posteriors and likelihoods, and

Need to be a little careful here. A likelihood is only a density if the parameter space has an implicit base measure. Generally, it's just a function. In MeasureTheory, we work with likelihoods as things that act on measures, so there's an operation (a pointwise product) taking a measure and a likelihood, and returning a measure.
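That operation can be sketched as follows (toy code with hypothetical types; MeasureTheory's actual operator and types differ). On the log scale it just adds the log-likelihood to the measure's log-density:

```julia
# Sketch of a likelihood acting on a measure via a pointwise product,
# returning a new (generally unnormalized) measure.
struct Likelihood
    ℓ::Function              # x -> log-likelihood at x
end

struct WeightedMeasure
    logweight::Function      # x -> extra log-weight
    base                     # the measure being reweighted
end

struct StdNormal end         # toy measure with known log-density (up to a constant)
logdensity(::StdNormal, x) = -0.5 * x^2
logdensity(ν::WeightedMeasure, x) = ν.logweight(x) + logdensity(ν.base, x)

# pointwise product: on the log scale, the densities simply add
pointwiseproduct(μ, lik::Likelihood) = WeightedMeasure(lik.ℓ, μ)

post = pointwiseproduct(StdNormal(), Likelihood(θ -> -0.5 * (1.0 - θ)^2))
logdensity(post, 1.0)   # == -0.5
```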


(There's the problematic aspect that all this is done without ever specifying wrt. what base measure the density is meant to be taken -- but that problem goes beyond this discussion.)

Right. Fortunately, at least in cases I know of, the base measure for Distributions can be determined by the type, up to some details like size for dynamically-shaped arrays. For more complex situations, things aren't really expressible without MeasureTheory anyway :)

I'd be happy for us all (APPL, MT, and LogDensityProblems) to use a shared function, but didn't want to inflict any changes on anyone.

Thanks, I agree

LogDensityProblems seemed like the place where this should live, but I didn't want to depend on the additional AD and transformation stuff there.

I don't think it should live there. IMO the emphasis there is on Problems - it's mostly a very opinionated API for expressing problems, more so than for more generic computations on log-density functions. Very useful for its purpose, but too narrow in scope to use as a basis for an entire ecosystem.

The problem we have to solve is one of package infrastructure design, where I am too inexperienced.

And too modest ;)

In case it's helpful, I had a similar problem with how to work with Distributions.logpdf, and I think this worked out pretty well. In MeasureTheory, I have

import MeasureBase
import Distributions: logpdf, pdf
export pdf, logpdf
Distributions.logpdf(d::AbstractMeasure, x) = MeasureBase.logpdf(d, x)
Distributions.pdf(d::AbstractMeasure, x) = exp(Distributions.logpdf(d, x))

devmotion commented 3 years ago

The problem we have to solve is one of package infrastructure design, where I am too inexperienced.

IMO the cleanest and most lightweight solution, if these different logdensity functions should be unified, would be to define a logdensity stub in a separate package without any dependencies, probably without any concrete definition (maybe something like density(args...) = exp(logdensity(args...)) if it seems useful). This is a common approach and would e.g. follow the example of other packages such as CommonSolve, InverseFunctions, RealDot, StatsAPI, etc.

One option would be to merge the functions into a single logdensity function that could be defined in an otherwise empty LogDensity package.
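In the CommonSolve/StatsAPI style, the whole package could plausibly be just the following (a sketch; "LogDensityStub" is a hypothetical name, not an actual package):

```julia
# Sketch of a dependency-free stub package: the stub owns only the name,
# and downstream packages add methods to it.
module LogDensityStub

function logdensity end    # no methods; downstream packages extend this

# the only concrete definition, as suggested above
density(args...) = exp(logdensity(args...))

end # module

# A downstream package would then add methods, e.g.:
struct Expo; λ::Float64; end
LogDensityStub.logdensity(d::Expo, x) = x < 0 ? -Inf : log(d.λ) - d.λ * x

LogDensityStub.density(Expo(1.0), 0.0)   # == 1.0
```

Because every package extends the one stub function, users of Soss + MeasureTheory + AbstractPPL would see a single unqualified logdensity.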

phipsgabler commented 3 years ago
1. to avoid the IMHO artificial distinction between "PDFs" and "PMFs",

I agree this is artificial. Distributions uses PDF for everything; this is formally correct, because a discrete distribution has a density over counting measure.

Oh, absolutely, it is formally correct; but I was thinking of the (apparently not so small) number of people who seem to insist on PMFs being a wholly different concept from "continuous" "densities", and not "just another Radon-Nikodym derivative wrt. a different base measure". In other words, "PDF" can be a loaded term, and using "density" consistently circumvents that without becoming unclear to anyone or inventing new words.

cscherrer commented 3 years ago

A stub could make sense as long as we can agree on a common API. I guess my biggest concern here is making the three-argument form available to Distributions.

devmotion commented 3 years ago

I don't think one should enforce three arguments, as both Distributions (as logpdf) and AbstractPPL (as logdensity) only use two arguments. Maybe the two-argument version should in fact be the default, with logdensity(measure, x) as the general API. This would match both the use in Distributions and AbstractPPL - in the former case the measure is specified as a Distribution, in the latter by a probabilistic model. And evaluation is done at a point x in the support in Distributions and at a trace of the probabilistic model in AbstractPPL.

phipsgabler commented 3 years ago

"Three argument form" is with the base measure being explicit, right?

So we'd have those variants:

  1. logdensity(μ, x): μ describes the measure, the base measure is implied. Examples:
    • μ is a Distributions distribution, x within its support;
    • μ is an AbstractMeasure (or duck-typed measure), x within its support;
    • μ is a PPL model, x a trace-like data structure
  2. logdensity(μ, ν, x): μ and ν are AbstractMeasures (or duck-typed measure), x within their support.

I think it would mostly suffice to specify the API as in (1), with a loose relation of "measury part" and "value part" for whatever you are dealing with, and the idea that the base measure in these cases is implied. The latter can be made explicit by the form (2) in case an implementation supports that.
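The two variants can be sketched together (toy stand-in types, not an actual implementation): the two-argument form fixes the implied base measure, and the three-argument form makes it explicit.

```julia
# Sketch of variants (1) and (2) with hypothetical types.
struct Lebesgue end
struct ScaledLebesgue      # the measure c · Lebesgue
    c::Float64
end

basemeasure(::ScaledLebesgue) = Lebesgue()

# (1) two arguments: the base measure is implied (here basemeasure(μ) = Lebesgue())
logdensity(μ::ScaledLebesgue, x) = log(μ.c)

# (2) three arguments: base measure explicit, for pairs an implementation supports
logdensity(μ::ScaledLebesgue, ν::ScaledLebesgue, x) = log(μ.c) - log(ν.c)

logdensity(ScaledLebesgue(2.0), ScaledLebesgue(4.0), 0.0)   # == log(0.5)
```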

cscherrer commented 3 years ago

That's mostly correct @phipsgabler. There's no requirement for x to be in the support. If it's outside, logdensity will just return -Inf. This is important, because otherwise you end up throwing errors and trying to catch them, which is messy and has lots of overhead. And for example, it wouldn't make sense for logdensity(Dirac(x), y) to error outside of x == y.

Measures are defined using the two-argument form. In this case the base measure is "implied" in the sense that it's taken to be given by basemeasure. It turns out to be much more efficient to have the three-argument form defined in terms of the two-argument form.

Disallowing the three-argument form for Distributions will make many computations awkward, and make Distributions mostly unusable with MeasureTheory. I don't think that's really a viable option.

phipsgabler commented 3 years ago

Good point about the support. Then part of the contract should be something like

"logdensity should be well defined over the extension of μ's support to the whole range of the type of the compatible x value, returning an equivalent of -Inf outside the support (for meaningful values, ignoring 'undefined' cases like NaN)."
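Sketched in toy code (a hypothetical type, just to make the contract concrete):

```julia
# Sketch of the contract: off the support, return -Inf instead of throwing.
struct Unit end    # toy uniform measure on [0, 1]

# well defined for every Float64, -Inf off the support
logdensity(::Unit, x::Float64) = 0.0 <= x <= 1.0 ? 0.0 : -Inf

logdensity(Unit(), 0.5)   # 0.0
logdensity(Unit(), 2.0)   # -Inf, no exception raised
```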

I don't get the requirement for Distributions, though. Do you imagine them adopting the logdensity interface? Or is that only referring to the glue code? Where will that glue code live, and how will it be defined? Just basemeasure dispatching on ValueSupport and VariateForm?

cscherrer commented 3 years ago

Then part of the contract should be something like...

Yes that's right :)

The Distributions thing is about being able to use Distributions as measures. You can take logdensity to use Distributions.logpdf, and basemeasure is pretty easy to define based on the types. We want to be able to use Distributions with the combinators from MeasureTheory. That requires the three-argument form, which we define generically in terms of logdensity and basemeasure.
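A self-contained sketch of that glue, using a toy type tree standing in for Distributions' continuous/discrete hierarchy (all names here are hypothetical):

```julia
# Sketch: the base measure is recoverable from the type alone.
abstract type ToyDist end
abstract type ToyContinuous <: ToyDist end
abstract type ToyDiscrete <: ToyDist end

struct Lebesgue end
struct CountingMeasure end

# one method per branch of the type tree covers every concrete distribution
basemeasure(::ToyContinuous) = Lebesgue()
basemeasure(::ToyDiscrete) = CountingMeasure()

struct ToyNormal <: ToyContinuous end
struct ToyPoisson <: ToyDiscrete end

basemeasure(ToyNormal())     # Lebesgue()
basemeasure(ToyPoisson())    # CountingMeasure()
```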

cscherrer commented 3 years ago

Maybe the simplest solution is if MeasureBase "owns" the three-argument form, providing a default implementation. This could be overridden in other packages, but would be expected to follow the same semantics.

devmotion commented 3 years ago

MeasureBase could define it only for the types it owns, otherwise it would be type piracy.

cscherrer commented 3 years ago

I think you mean "should"? Avoiding type piracy is generally a good idea, though there are certainly exceptions.

Currently it's not type piracy, because MeasureBase owns logdensity. I'm trying to find ways to share the name without breaking things. Do you have other suggestions?

phipsgabler commented 3 years ago

I went ahead and just set up this: https://github.com/phipsgabler/LogDensity.jl.

The piracy issue is a problem indeed -- where should glue code go...? Is there a way to solve that with Requires?

devmotion commented 3 years ago

No, I really think we should try to avoid type piracy as much as possible. In my experience it just causes too many problems in the long run.

I also think Requires should be avoided. It's a hack that causes compilation issues and increases loading times. IMO an interface package with Requires is not lightweight anymore.

devmotion commented 3 years ago

IMO the correct solution here is to define the following in LogDensity:

function basemeasure(x) end

function logdensity(measure, x, basemeasure) end
logdensity(measure, x) = logdensity(measure, x, basemeasure(measure))
devmotion commented 3 years ago

IMO the correct solution here is to define the following in LogDensity:

function basemeasure(x) end

function logdensity(measure, x, basemeasure) end
logdensity(measure, x) = logdensity(measure, x, basemeasure(measure))

Probably we should define

function logdensity(measure, x, basemeasure)
    logdensity(measure, x) + logdensity(LogDensity.basemeasure(measure), x, basemeasure)
end

Then one could just define the two-argument version and basemeasure.

cscherrer commented 3 years ago

This is backwards. Most measures are defined in terms of the two-argument form. The three-argument form is defined using that and basemeasure. We'll almost certainly refine the implementation of this, but currently it's

@inline function logdensity(μ::AbstractMeasure, ν::AbstractMeasure, x)
    α = basemeasure(μ)
    β = basemeasure(ν)

    # If α===μ and β===ν, the recursive call would be exactly the same as the
    # original one. We need to break the recursion.
    if α===μ && β===ν
        @warn """
        No method found for logdensity(μ, ν, x) where
        typeof(μ) == $(typeof(μ))
        typeof(ν) == $(typeof(ν))

        Returning NaN. If this is incorrect, please add a method        
        logdensity(μ::$(typeof(μ)), ν::$(typeof(ν)), x)
        """
        return NaN
    end

    # Infinite or NaN results occur when outside the support of α or β, 
    # and also when one measure is singular wrt the other. Computing the base
    # measures first is often much cheaper, and allows the numerically-intensive
    # computation to "fall through" in these cases.
    # TODO: Add tests to check that NaN cases work properly
    ℓ = logdensity(α, β, x)
    isfinite(ℓ) || return ℓ

    ℓ += logdensity(μ, x)
    ℓ -= logdensity(ν, x)

    return ℓ
end

There are more details on this in the paper, https://github.com/cscherrer/MeasureTheory.jl/blob/paper/paper/paper.pdf

devmotion commented 3 years ago

This is backwards. Most measures are defined in terms of the two-argument form. The three-argument form is defined using that and basemeasure.

Isn't this what I suggested in my second comment?

To add first-class support for measures with implicit but undefined basemeasure one could define (putting everything together):

basemeasure(x) = nothing

logdensity(measure, x) = logdensity(measure, x, basemeasure(measure))
function logdensity(measure, x, basemeasure)
    return logdensity(measure, x) + logdensity(LogDensity.basemeasure(measure), x, basemeasure)
end
logdensity(_, _, ::Nothing) = error("logdensity is not defined for unspecified base measures")
logdensity(::Nothing, _, _) = error("logdensity is not defined for unspecified measures")
# resolve ambiguity error
logdensity(::Nothing, _, ::Nothing) = error("logdensity is not defined for unspecified measures and base measures")

density(measure, x) = exp(logdensity(measure, x))
density(measure, x, basemeasure) = exp(logdensity(measure, x, basemeasure))

Then AbstractPPL could just define logdensity(model, trace) and everything would work (and error) as expected. Distributions could define logdensity(dist, x) in the same way, but it would probably be more useful to define basemeasure(::ContinuousDistribution) = Lebesgue() and basemeasure(::DiscreteDistribution) = CountingMeasure() (both of which would have to be defined in LogDensity) so that different base measures could be used if desired. And MeasureTheory could just implement whatever versions and specializations it wants to define.

devmotion commented 3 years ago

BTW an alternative approach to (log) densities focused on the two-argument version is https://github.com/oschulz/DensityInterface.jl

cscherrer commented 3 years ago

Moving the parameters around to match how we use them, this is

basemeasure(x) = nothing

logdensity(measure, x) = logdensity(measure, basemeasure(measure), x)

function logdensity(measure, base, x)
    return logdensity(measure, x) + logdensity(LogDensity.basemeasure(measure), base, x)
end

logdensity(_, ::Nothing, _) = error("logdensity is not defined for unspecified base measures")
logdensity(::Nothing, _, _) = error("logdensity is not defined for unspecified measures")

# resolve ambiguity error
logdensity(::Nothing, ::Nothing, _) = error("logdensity is not defined for unspecified measures and base measures")

density(measure, x) = exp(logdensity(measure, x))
density(measure, base, x) = exp(logdensity(measure, base, x))

I don't think this logdensity(measure, base, x) method will work as expected; you'll basically need what I have in MeasureBase.

It sounds like you're suggesting all of this, and also Lebesgue, should be defined in LogDensity. I guess you'd put CountingMeasure in there too, to cover discrete distributions. And it would have to depend on Distributions in order to have ::ContinuousDistribution methods. So the result would basically be MeasureBase, just with a strong preference toward Distributions instead of Measures. I don't see how this helps.

devmotion commented 3 years ago

No, I don't suggest a dependency on Distributions. It should be an interface. AbstractPPL, Distributions, etc. could implement it in their packages if they think it is useful. And the design should allow just implementing the two-argument version (possible with the suggestion). If one uses one of the simple standard measures Lebesgue and CountingMeasure as base measure, other packages such as MeasureTheory could perform computations with respect to other base measures as well - they just have to define the density of these standard measures with respect to their custom measures (this would allow moving the logdensity integration for Distributions in MeasureTheory/Base to Distributions). And MeasureTheory could define other measures, logdensity implementations, etc.

cscherrer commented 3 years ago

No, I don't suggest a dependency on Distributions. It should be an interface.

Ok, this makes more sense. Thanks for clarifying.

Distributions etc could implement it in their package if they think it is useful.

This could be tricky, because it would take functionality away from MT until Distributions adopts the changes. Part of my motivation in moving away from Distributions was the difficulty in getting changes through, and I could see this being a big time sink.

So I'd probably need to keep things in MT until a Distributions release with the update.

If one uses one of the simple standard measures Lebesgue and CountingMeasure as base measure, other packages such as MeasureTheory could perform computations with respect to other base measures as well - they just have to define the density of these standard measures with respect to their custom measures

That often doesn't work. For example, say you have a finite measure on a curve in ℝ². This is singular wrt Lebesgue measure on ℝ², and counting measure is singular wrt it. Things like this can be expressed using Bijectors, and to get the measure right you usually need to work with a pushforward of Lebesgue measure.

devmotion commented 3 years ago

Part of my motivation in moving away from Distributions was the difficulty in getting changes through, and I could see this being a big time sink.

I try to merge and release without too much delay, last month 12 PRs were merged and I released 5 new versions: https://github.com/JuliaStats/Distributions.jl/pulse/monthly

That often doesn't work. For example, say you have a finite measure on a curve in ℝ². This is singular wrt Lebesgue measure on ℝ², and counting measure is singular wrt it. Things like this can be expressed using Bijectors, and to get the measure right you usually need to work with a pushforward of Lebesgue measure.

Sure, I know that it does not cover all cases :slightly_smiling_face: My main point was just that the base measure heuristics that are currently defined in MeasureTheory/Base could be defined in Distributions directly, and hence all computations that you can currently perform in MeasureTheory would still be possible without you having to reason about base measures of distributions in Distributions.

cscherrer commented 3 years ago

I try to merge and release without too much delay, last month 12 PRs were merged and I released 5 new versions: https://github.com/JuliaStats/Distributions.jl/pulse/monthly

Very nice!

Sure, I know that it does not cover all cases :slightly_smiling_face: My main point was just that the base measure heuristics that are currently defined in MeasureTheory/Base could be defined in Distributions directly, and hence all computations that you can currently perform in MeasureTheory would still be possible without you having to reason about base measures of distributions in Distributions.

Ok, this is sounding promising. I think there's still some more planning to do; maybe we should move the discussion to the https://github.com/phipsgabler/LogDensity.jl repo @phipsgabler set up (thanks Philipp!)

I also want to be sure to get input from @mschauer, since he's been very active in MT and might catch some concerns I would miss :slightly_smiling_face:

phipsgabler commented 3 years ago

Closing because moved to LogDensity.jl issue.