JuliaMath / DensityInterface.jl

Interface for mathematical/statistical densities in Julia
Other
12 stars 3 forks source link

API Changes #4

Closed oschulz closed 2 years ago

codecov[bot] commented 2 years ago

Codecov Report

Merging #4 (885c390) into master (386596b) will increase coverage by 25.00%. The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##           master        #4       +/-   ##
============================================
+ Coverage   75.00%   100.00%   +25.00%     
============================================
  Files           1         2        +1     
  Lines          16        36       +20     
============================================
+ Hits           12        36       +24     
+ Misses          4         0        -4     
Impacted Files Coverage Δ
src/interface.jl 100.00% <100.00%> (+25.00%) :arrow_up:
src/interface_test.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 386596b...885c390. Read the comment docs.

oschulz commented 2 years ago

Closes #1, #3

oschulz commented 2 years ago

@devmotion and @phipsgabler, fine with you like this?

@cscherrer, could you take a quick look at my text regarding measures and MeasureTheory in the docs?

oschulz commented 2 years ago

Thanks for all the comments and suggestions, @devmotion. I hope I addressed/committed all of them, want to take a final look?

Also, should we move this to JuliaStats before releasing v0.3.0, so we don't need a v0.3.1 afterwards?

devmotion commented 2 years ago

Also, should we move this to JuliaStats before releasing v0.3.0, so we don't need a v0.3.1 afterwards?

We don't need to make a new release when it is moved to JuliaStats, unless there are any new changes in the actual implementation.

oschulz commented 2 years ago

We don't need to make a new release when it is moved to JuliaStats, unless there are any new changes in the actual implementation.

Well, it's always cleaner to have the link in the docs point at the actual location. :-)

mschauer commented 2 years ago

Let me understand this a bit. Do you plan to support densities of discrete distributions? If so, how do I know in a function receiving a density object what I am looking at?

oschulz commented 2 years ago

Let me understand this a bit. Do you plan to support densities of discrete distributions?

As @devmotion outlined in #1, the code added to distributions would basically look like this (adapted to the API changes in this PR):

DensityInterface.hasdensity(::Distribution) = true
DensityInterface.logdensityof(d::Distribution, x) = logpdf(d, x)

The question was indeed whether this should be restricted to ContinuousDistributions. I was a bit unsure myself, but after a recent conversation with @cscherrer I think that we can support discrete distributions as well, with a solid theoretical foundation: For a discrete distribution, the implicit base measure will be a counting measure, leading to a density with the values of the discrete PMF at x = 0, 1, 2, ... and zero elsewhere. Since this is already the current behavior of logpdf, all seems fine.

cscherrer commented 2 years ago

Hi @oschulz ,

As we've talked about, densities are always relative to some base measure, whether or not this is made explicit. In MeasureTheory, you can always ask about the basemeasure, and in Distributions you can look at the... I think it's called VariateType? Anyway, there's a type parameter.

A potential issue does come up if you want to be able to pass a generic function (like a likelihood, or a posterior with the base measure information thrown away) and have that treated as a density, namely that it's not clear where this base measure information comes from. This may be what @mschauer is referring to.

mschauer commented 2 years ago

Maybe even easier: the ability to represent discrete and continuous measures doesn’t help so much if I have no way of querying which of the two is given to my function.

oschulz commented 2 years ago

the ability to represent discrete and continuous measures doesn’t help so much if I have no way of querying which of the two is given to my function.

Very true. I guess it depends on the use case: With Bayesian inference, having a likelihood that's just defined with logdensityof wouldn't be a problem, since the prior distribution should/will provide that information. If the prior is a product, some dimensions may be continuous and other discrete, even. But even in such a case, I think it would be nice if we can calculate the log posterior density value via a simple generic logdensityof(likelihood, x) + logdensityof(prior, x). So supporting discrete distributions would be very useful for generic code (since prior would often be a Distribution). That doesn't mean the the inference algorithm won't need to look at other properties of the prior (most importantly it's support, resp. space it lives on) as well.

oschulz commented 2 years ago

As far as this PR is concerned - any objections to merging as is and tagging a new release (and continue the question of discrete distributions on the yet-to-do Distributions PR)?

We could think about adding another query-the-density function that yields information about the support/domain. I think that this should be optional though (returning missing) by default: In a Bayesian case, for example, it would be enough if the prior supports it. And many use cases will be happy with just logdensityof in general (e.g. because they live on a continuous space by definition).

So we could easily add a supportof/domainof or similar function later on. But then, that should probably go in a separate package anyway: it would be very useful in general, far beyond densities. This has been explored a bit (I know of DomainSets.jl), but to my knowledge there's no ecosystem-wide consensus or lightweight API definition yet (sure would be nice to have though).

mschauer commented 2 years ago
struct LogFuncDensity{F}
    _log_f::F
end

doesn't allow to define such a function which retrieves the measure _log_f is a log-density for or even only the support. Maybe you do want to warm up to the idea, that a function and a reference measure need to be packed together into a generic container/interface for densities, with one way out: to agree that the density is always continuous. But yeah, let's move on from this PR to a more global place to keep momentum.

devmotion commented 2 years ago

As far as I understand, LogFuncDensity is just the default fallback of logfuncdensity and probably the best/only thing you can do if the reference measure is not provided. You are free to implement logfuncdensity in a different way with information about the reference measure for functions f for which it can be obtained from f. Eg., you could add definitions for f::Base.Fix1{typeof(Distributions.logpdf),<:Distributions.Distribution} or f::Base.Fix1{typeof(MeasureTheory.logdensity),<:MeasureTheory.AbstractMeasure}.

mschauer commented 2 years ago

You approach is to pass densities around without the context information (the reference measure) which makes them meaningful objects, and without a clear idea how add that. You can do that, I just have sympathy with us for the pain that design decision will bring later on. It's not even clear to me what you do for scalar and multivariate densities.

oschulz commented 2 years ago

You approach is to pass densities around without the context information (the reference measure) which makes them meaningful objects, and without a clear idea how add that. You can do that, I just have sympathy with us for the pain that design decision will bring later on. It's not even clear to me what you do for scalar and multivariate densities.

DensityInterface.logdensityof is about establishing a standard name for a function that we already have as get_me_the_log_density(entity, x) in several packages (with different names), to increase interoperability/composability.

We can of course develop other traits that retrieve information like support/domain, etc. (and I think we should). But I think these should be separate, not all combined in a single function. Not only does it keep things more flexible to have them separate, the information is also typically not consumed at the same point. The support/domain of a function/density/something-that-has-a-density is a global property, so I'll usually only query it once. But logdensityof(d, x) I will typically have to call very frequently.

Take a sample-mean approach, as a very primitive example: I need the support/domain to instantiate the right kind process that will generate evaluation points. Then I'll generate a large set of them. Then I call logdensityof for each - I don't need domain/support information at that point.

mschauer commented 2 years ago

By analogy: It's like having only

issorted(v)

instead of

issorted(v, lt=isless, order::Ordering=Forward)

and if you call issorted and get true you would only know that it is "somehow sorted", but not according to which order/ direction and no way to find out.

cscherrer commented 2 years ago

By analogy...

Exactly. So in this analogy, the idea here seems to be that v should "know" its ordering. Is that right @oschulz ?

EDIT: So "v knows its ordering" maps to "logdensityof(d, x) means d knows its base measure, support, etc"

devmotion commented 2 years ago

I don't think this analogy is completely correct. As mentioned in some of the discussion, we need to be able to support both im- and explicit reference measures since we want to support use cases in e.g. AbstractPPL where the reference measure can't be stated or derived statically. And in many use cases it is just fine to work with an implicit or default reference measure (and not even reason about a reference measure).

However, the interface still allows you to specify lt and order - as part of v. E.g., by using basemeasure for AbstractMeasures or explicitly stating the involved measures in a RadonNikodymDerivative(mu, nu) density type.

cscherrer commented 2 years ago

AbstractPPL where the reference measure can't be stated or derived statically

Really interesting point, I'll need to think more about this

mschauer commented 2 years ago

I think issorted is actually better behaved than logdensityof because there is, by convention, a standard order in Julia, exactly lt=isless and order::Ordering=Forward, but there is no standard interpretation of logdensityof, it could be either log probability density function or log probability mass function.

oschulz commented 2 years ago

There's clearly a lot of potential for future additional traits here or in other packages, but as far as DensityInterface.logdensityof is concerned, any objections to tagging and releasing now?

I think once this sees more use in the wild, we'll certainly come up with more ideas anyway. :-)

oschulz commented 2 years ago

it could be either log probability density function or log probability mass function.

The idea behind logdensityof is that it will work for both (like Distributions.logodf) and that the base measure will be implied by the argument. So the application will either "not care" (everything continous, Lebesque measure), or infer it from the argument directly or via other trait-functions.

oschulz commented 2 years ago

So in this analogy, the idea here seems to be that v should "know" its ordering. Is that right @oschulz

Yes, that would be the idea.

oschulz commented 2 years ago

LGTM, thanks a lot @oschulz!

Likewise thanks to all of you! I happy that so many people got involved in this, so this can really become a "community standard" (within it's limitations, like the implicit base measure).

Maybe you can fix the two typos that I commented on?

Done. :-)

Otherwise I'm happy with it, and we can always improve and update later 🙂

I fully agree! I think we have something very usable now for a fairly large set use cases - certainly other use cases will need to go beyond a two-argument logdensityof or will require additional traits. It'll certainly not the the end of the story, but I hope this will provide an important step forward.

oschulz commented 2 years ago

The plan is to move this package to JuliaStats, so it can really be an open community package.

oschulz commented 2 years ago

Here we go: https://github.com/JuliaRegistries/General/pull/48157 :-)

@devmotion, shall I prepare a PR for Distributions, once v0.3.0 is registered?

devmotion commented 2 years ago

Sure, this would be good! 👍

oschulz commented 2 years ago

Ok, will do (I'll CC everyone here on the PR).

oschulz commented 2 years ago

Here we go: JuliaStats/Distributions.jl#1416