ReactiveBayes / ExponentialFamily.jl

ExponentialFamily.jl is a Julia package that extends the functionality of Distributions.jl by providing a collection of exponential family distributions and customized implementations.
https://reactivebayes.github.io/ExponentialFamily.jl/
MIT License

Change _plogpdf function to accept four arguments #199

Closed · ismailsenoz closed this 2 months ago

bvdmitri commented 2 months ago

Could you please provide a description for the PR? At first glance, it’s not clear why this change is necessary. Additionally, please include tests that were failing before this change and are passing afterward. Thanks!

ismailsenoz commented 2 months ago

This is needed because many of the failing projections fail due to the presence of factorial, gamma, beta, etc. functions in the log-partition and base measure. Because these functions are numerically challenging, tests in ExponentialFamilyProjection either fail or display strange behaviour, such as an extra factor of 2 when projecting into Chisq. I noticed a similar issue when writing tests for projecting into Poisson: when the lambda parameter of the Poisson is large, the projection is wrong due to the factorial in the base measure. When the projection is done with the unnormalized density, these problems disappear. Therefore, the test suite in ExponentialFamilyProjection needs to be implemented with a _plogpdf that removes all the base measure and log-partition factors. In the current implementation of _plogpdf only the log partition is excluded. By excluding the base measure as well, we can correctly project into Chisq and Poisson. This is why the 4th argument is needed.
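
For context, the exponential family log-density decomposes as logpdf(x) = log h(x) + η'T(x) - A(η), where h is the base measure and A is the log partition. A minimal sketch of a log-density evaluator that takes those two terms as explicit arguments (the name plogpdf_sketch and its argument order are illustrative only, not the actual signature introduced by this PR):

using ExponentialFamily, Distributions

# Hedged sketch (not the PR's implementation): the caller supplies the log base
# measure and log partition terms explicitly and can fix either of them to 0.
function plogpdf_sketch(ef, x, logbasemeasure_term, logpartition_term)
    η = getnaturalparameters(ef)
    T = sufficientstatistics(ef, x)
    return sum(η[i] * T[i] for i in eachindex(η)) + logbasemeasure_term - logpartition_term
end

ef = convert(ExponentialFamilyDistribution, Poisson(5.0))
plogpdf_sketch(ef, 3, log(basemeasure(ef, 3)), logpartition(ef))  # ≈ logpdf(ef, 3)
plogpdf_sketch(ef, 3, 0.0, 0.0)                                   # unnormalized: only η'T(x)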

Nimrais commented 2 months ago

@ismailsenoz @bvdmitri I will add a small test for this functionality

ismailsenoz commented 2 months ago

AFAIK none of the tests in ExponentialFamily were failing, and there were no explicit tests for the _plogpdf functions beforehand anyway. This function was implicitly tested through all logpdf and pdf calls.

Nimrais commented 2 months ago

Yeah, you are right, but if it is supposed to be used elsewhere, this interface turns from a hidden optimisation for big containers into public functionality.

_plogpdf was initially added just to compute logpdf faster for big containers.
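
For context, the speed-up comes from the fact that the log partition A(η) depends only on the natural parameters and not on the data, so for a container of points it only needs to be computed once. A rough illustration of the batching idea using the package's exported accessors (not the actual internal code of _plogpdf):

using ExponentialFamily, Distributions

# Rough illustration: A(η) is data-independent and hoisted out of the loop,
# while the base measure and η'T(x) are computed per point.
function batched_logpdf_sketch(ef, xs)
    η = getnaturalparameters(ef)
    A = logpartition(ef)  # shared across the whole container
    return map(xs) do x
        T = sufficientstatistics(ef, x)
        log(basemeasure(ef, x)) + sum(η[i] * T[i] for i in eachindex(η)) - A
    end
end

batched_logpdf_sketch(convert(ExponentialFamilyDistribution, Poisson(5.0)), 0:10)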

bvdmitri commented 2 months ago

I see, that clarifies it for me. Your description makes sense, and to avoid any confusion in the future, it would be beneficial to open a PR with a comprehensive explanation of the change. It would also be helpful to mention that the existing tests should cover this modification.

However, I’m still unclear about how this change would assist with projection, as we don’t seem to call logpdf or _plogpdf anywhere during the projection iterations. Could you elaborate on the exact suggestion for ExponentialFamilyProjection?

Perhaps you refer to the logpdf which is used in the tests inside the f function, but it’s important to note that in general the function f, which is passed to the project_to method, is arbitrary, and we cannot manipulate or adjust what it does internally.

Nimrais commented 2 months ago

It's helping from the perspective of the target function.

If you try to project x -> logpdf(Poisson, x) where the Poisson distribution has a very large natural parameter, the projection fails to recover the correct value. The base measure contains a factorial in its computation, and small and large natural parameters become indistinguishable.

However, if you project x -> log(x) * η, it works fine and correctly recovers the η parameter.

The proposal is to use unnormalized densities as the target function inside the projection tests to make them more stable, at least for distributions whose logpdf is numerically problematic to compute, such as Chisq, Poisson, and Gamma.
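
For concreteness, the two kinds of target function look roughly like this for Chisq (a minimal sketch using the ExponentialFamilyProjection API that appears later in this thread; the behaviour noted for each target is taken from the numbers posted below, not re-run here):

using ExponentialFamily, ExponentialFamilyProjection, Distributions

dist = Chisq(100.0)
ef   = convert(ExponentialFamilyDistribution, dist)
η    = getnaturalparameters(ef)[1]   # for Chisq this is ν / 2 - 1 = 49

# Target built from the full logpdf: base measure and log partition included.
target_full         = (x) -> logpdf(dist, x)

# Unnormalized target keeping only the natural-parameter part η * T(x), with T(x) = log(x).
target_unnormalized = (x) -> η * log(x)

# project_to(ProjectedTo(Chisq), target_full)          # reported below to land far from ν = 100
# project_to(ProjectedTo(Chisq), target_unnormalized)  # reported below to recover ν ≈ 100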

ismailsenoz commented 2 months ago

The test_projection_convergence setup in the exponential family projection tests includes the following line:

targetfn = let distribution = distribution
    (x) -> logpdf(distribution, x)
end

For some distributions, such as normals, this works and actually improves the projection, because the base measure of a normal is just a constant. However, for Poisson, Chisq, etc., this line makes the tests fail, because the logpdf includes a factorial and the tests with large parameters do not pass.

It is not related to the method whatsoever. It is just related to the tests.

Nimrais commented 2 months ago

Perhaps you refer to the logpdf which is used in the tests inside the f function, but it’s important to note that in general the function f, which is passed to the project_to method, is arbitrary, and we cannot manipulate or adjust what it does internally.

Yes, unfortunately we cannot manipulate what the user provides us, but we can decide what we are testing. And @ismailsenoz's point is that it does not make much sense to test projection into Poisson or Chisq through their logpdf; it just gives unreasonable values.

bvdmitri commented 2 months ago

The proposal is to use unnormalized densities as the target function inside projection procedures for tests to make them more stable

I understand the proposal for the tests, but how would you practically implement this during actual inference? In practice, f is arbitrary and can perform various operations internally, such as calling logpdf for a Poisson distribution. Since we cannot manipulate f, this doesn’t seem like a real solution for practical scenarios, even if it makes tests pass.

it does not make much sense to test projection into Poisson or Chisq through their logpdf; it just gives unreasonable values

But this is precisely what will happen during real inference.

but we can decide what we are testing

True, but we shouldn't simplify our tests just to ensure they pass. There are many ways to set up the tests: for example, adjusting the stepsize to a small value and increasing the number of iterations to an unreasonably large value, or even using BigFloat for better stability. In my opinion, tests should reflect real-world conditions closely and not just aim to pass.

If tests for Poisson only pass when we use a very specific usage of _plogpdf, but we cannot realistically ensure this in any practical scenario, then the computation is fundamentally flawed. A passing test would only create a false sense of stability.

ismailsenoz commented 2 months ago

The problem with the test suite is that it tries to check whether the projection gets more accurate with an increasing number of samples and iterations. In order to test this property properly, we need to ensure that the projection itself is correct.

In practice we cannot control what kind of function is passed. But in practice nobody is going to project a normal onto a normal or a Poisson onto a Poisson either. So the fact that some of the projection tests pass does not mean they are useful. If someone tries to project an unnormalized density involving factorials or gamma functions, that is a problem, but it is rarely the case.

True, but we shouldn't simplify our tests just to ensure they pass. There are many ways to set up the tests: for example, adjusting the stepsize to a small value and increasing the number of iterations to an unreasonably large value, or even using BigFloat for better stability. In my opinion, tests should reflect real-world conditions closely and not just aim to pass.

We are not trying to simplify the tests. We are actually trying to show that these tests are not really testing what you intend them to test unless you ensure that the projection is correct. I do not think there is a remedy for the generic case. If a user really wants to pass target functions that involve numerically unstable subroutines, we cannot do much about that.

Nimrais commented 2 months ago

I understand the proposal for the tests, but how would you practically implement this during actual inference? In practice, f is arbitrary and can perform various operations internally, such as calling logpdf for a Poisson distribution. Since we cannot manipulate f, this doesn’t seem like a real solution for practical scenarios, even if it makes tests pass.

It depends on the use case. In ReactiveMP, for example, if a node is inside the exponential family, we can use _plogpdf to send a message with the log base measure and log partition terms fixed to 0. In theory, these terms should not change the result, so there is no reason to compute them.
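
A rough sketch of that idea, written with the package's exported accessors rather than ReactiveMP's actual message code (node_message_logdensity is a name invented here purely for illustration):

using ExponentialFamily

# Illustrative only: a node known to be in the exponential family evaluates the
# outgoing message with the log base measure and log partition fixed to 0,
# leaving only the η'T(x) term.
function node_message_logdensity(ef::ExponentialFamilyDistribution, x)
    η = getnaturalparameters(ef)
    T = sufficientstatistics(ef, x)
    return sum(η[i] * T[i] for i in eachindex(η))  # log h(x) and A(η) deliberately dropped
end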

For arbitrary use cases there is no cure. If a user provides a numerically poorly implemented target function, we cannot recover anything reasonable from it. However, we can include this example in our documentation to show users that one way works and another does not. It's important to emphasize that this is not a problem with the method itself, but rather with the target function. Users should be careful about what they provide.

bvdmitri commented 2 months ago

Thanks for the explanations. It's good that you've observed this behaviour, and it's great that you're testing these scenarios. I would also try to investigate further (examples are below):


import SpecialFunctions: loggamma
import IrrationalConstants: logtwo   # logtwo == log(2)
using ExponentialFamily, ExponentialFamilyProjection, Distributions

# Stable log base measure for Chisq: the base measure is exp(-x / 2).
logbasemeasure(ef::ExponentialFamilyDistribution{Chisq}, x) = -x / 2

# Full logpdf assembled by hand: log h(x) + η'T(x) - A(η).
function logpdf_chisq(ef::ExponentialFamilyDistribution{Chisq}, x)
    η = getnaturalparameters(ef)
    _statistics = sufficientstatistics(ef, x)
    _logpartition = logpartition(ef, x)
    _logbasemeasure = logbasemeasure(ef, x)
    return _logbasemeasure + ExponentialFamily._scalarproduct(Chisq, η, _statistics) - _logpartition
end

# Same, but with the log base measure term dropped.
function logpdf_chisq_without_logbasemeasure(ef::ExponentialFamilyDistribution{Chisq}, x)
    η = getnaturalparameters(ef)
    _statistics = sufficientstatistics(ef, x)
    _logpartition = logpartition(ef, x)
    return ExponentialFamily._scalarproduct(Chisq, η, _statistics) - _logpartition
end

# Same, but with the log partition term dropped.
function logpdf_chisq_without_logpartition(ef::ExponentialFamilyDistribution{Chisq}, x)
    η = getnaturalparameters(ef)
    _statistics = sufficientstatistics(ef, x)
    _logbasemeasure = logbasemeasure(ef, x)
    return _logbasemeasure + ExponentialFamily._scalarproduct(Chisq, η, _statistics)
end

# Only the η'T(x) term.
function logpdf_chisq_without_logbasemeasure_and_logpartition(ef::ExponentialFamilyDistribution{Chisq}, x)
    η = getnaturalparameters(ef)
    _statistics = sufficientstatistics(ef, x)
    return ExponentialFamily._scalarproduct(Chisq, η, _statistics)
end

# Full logpdf with the log partition computed in BigFloat precision.
function logpdf_chisq_with_big_floats(ef::ExponentialFamilyDistribution{Chisq}, x)
    η = getnaturalparameters(ef)
    (η1,) = BigFloat.(ExponentialFamily.unpack_parameters(Chisq, η))
    _statistics = sufficientstatistics(ef, x)
    o = one(η1)
    _logpartition = loggamma(BigFloat(η1 + o)) + (η1 + o) * logtwo
    _logbasemeasure = -x / 2
    return _logbasemeasure + ExponentialFamily._scalarproduct(Chisq, η, _statistics) - _logpartition
end

dist = Chisq(100.0)
ef = convert(ExponentialFamilyDistribution, dist)

f1 = (x) -> logpdf(ef, x)
f2 = (x) -> logpdf_chisq(ef, x)
f3 = (x) -> logpdf_chisq_without_logbasemeasure(ef, x)
f4 = (x) -> logpdf_chisq_without_logpartition(ef, x)
f5 = (x) -> logpdf_chisq_without_logbasemeasure_and_logpartition(ef, x)
f6 = (x) -> logpdf_chisq_with_big_floats(ef, x)

@show project_to(ProjectedTo(Chisq), f1)  # Chisq{Float64}(ν=50.251845186537395)
@show project_to(ProjectedTo(Chisq), f2)  # Chisq{Float64}(ν=12.415111710651729)
@show project_to(ProjectedTo(Chisq), f3)  # Chisq{Float64}(ν=13.682445714425805)
@show project_to(ProjectedTo(Chisq), f4)  # Chisq{Float64}(ν=50.27226218442978)
@show project_to(ProjectedTo(Chisq), f5)  # Chisq{Float64}(ν=99.33590564858314)
@show project_to(ProjectedTo(Chisq), f6)  # Chisq{Float64}(ν=50.25184518653741)

Another question: Does this pattern repeat for all exponential family distributions that have a non-constant base measure?

Perhaps the issue is not directly related to the stability or instability of both terms but rather to the fact that the base measure is not constant, which could be influencing the projection process. Both Chisq and Poisson distributions fall into this category. If this is indeed the case, maybe we should consider adjusting the optimization function based on the specific characteristics of each projection. Just a thought.
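
For reference, a quick way to see which base measures are constant, using the package's exported basemeasure accessor (a minimal sketch; the chosen distributions and evaluation points are arbitrary):

using ExponentialFamily, Distributions

# Evaluate the base measure at two support points; a constant base measure
# (e.g. Normal) gives the same value, a non-constant one (Chisq, Poisson) does not.
for (dist, xs) in ((Normal(0.0, 1.0), (1.0, 2.0)), (Chisq(10.0), (1.0, 2.0)), (Poisson(5.0), (1, 2)))
    ef = convert(ExponentialFamilyDistribution, dist)
    println(nameof(typeof(dist)), ": h(", xs[1], ") = ", basemeasure(ef, xs[1]), ", h(", xs[2], ") = ", basemeasure(ef, xs[2]))
end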

I understand your point, and @Nimrais may be correct in suggesting that this approach works for certain nodes in ReactiveMP, though not universally. However, I have a feeling that there might be a larger issue here. It feels like we might be overlooking something or seeing a different bug. I’m just trying to verify our assumptions. I have this feeling because replacing log(basemeasure(ef, x)) with a more stable logbasemeasure(ef, x) and removing the unstable logpartition doesn’t actually improve the situation. Why? Additionally, using BigFloat doesn’t seem to improve anything either, which I would expect it to do to be honest.

bvdmitri commented 2 months ago

An extra thought: we actually use the (gradient of the) logpartition term in the projection step, which is presumably unstable for certain distributions, yet if we remove it only from the logpdf we indeed get a correct estimate. The projection still uses this term to compute the gradient, which makes me think again that the problem is not stability/instability but rather something more complex :/

bvdmitri commented 2 months ago

We can still merge this; maybe we will need it in the future, just saying. The tests are failing, though.

ismailsenoz commented 2 months ago

I can take a look at the failing tests after we merge the logbasemeasure PR, and then perhaps we can merge this one.

bvdmitri commented 2 months ago

Looks like the tests are failing due to the changed API of Aqua.jl. They changed the keyword arguments in the new version.