TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Mixed integer MLE and MAP #1590

Closed: mohamed82008 closed this issue 2 years ago

mohamed82008 commented 3 years ago

Once mixed integer optimisation is added to Nonconvex.jl, it should be relatively easy to have mixed integer MLE and MAP in Turing. This will be a major feature compared to any other PPL. The main trouble is that Distributions.logpdf may complain when the support of a distribution is discrete but the value passed in is not an integer. This can be worked around by defining our own function or by providing a kwarg to skip the check in Distributions. After I add ModelingToolkit support in Nonconvex, Nonconvex can even give you global optimality guarantees without the user having to provide any additional information.
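
For concreteness, here is a toy example (purely illustrative, not from this issue) of the kind of model this would target; maximising its joint log density over both k (integer) and μ (continuous) is a mixed integer problem:

```julia
using Turing

# Illustrative model with one integer-valued and one continuous parameter.
@model function mixed_model(y)
    k ~ Poisson(3.0)        # integer-valued parameter
    μ ~ Normal(k, 1.0)      # continuous parameter depending on k
    for i in eachindex(y)
        y[i] ~ Normal(μ, 1.0)
    end
end
```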

devmotion commented 3 years ago

may complain if the value is not an integer and the support of the distribution is discrete.

You mean, it might complain if the value is not of type Integer? Discrete distributions support values of type Real; I changed this a while ago in Distributions. They are converted to Integers internally if needed, and the logpdf of a value outside of the support is -Inf rather than an error (some discrete distributions, such as Dirac, even have arbitrary Real values in their support).
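
For reference, the behaviour described here looks roughly like this (assuming a reasonably recent Distributions.jl):

```julia
using Distributions

logpdf(Poisson(3.0), 2)      # finite value (≈ -1.5): in-support integer
logpdf(Poisson(3.0), 2.0)    # same value: Real arguments are accepted
logpdf(Poisson(3.0), 2.5)    # -Inf: outside the support, but no error
logpdf(Dirac(0.5), 0.5)      # 0.0: Dirac supports arbitrary Real values
```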

mohamed82008 commented 3 years ago

Cool, let's see. Even conversion to integer is not suitable for optimisation, because we need differentiable arithmetic functions for the logpdf. So instead of an if statement, something like x * l1 + (1 - x) * l2 should be used, where x is binary and l1 and l2 are the logpdf values at 1 and 0, respectively.
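
A minimal sketch of such a relaxed logpdf for a Bernoulli (the function name is made up for illustration):

```julia
using Distributions

# Relaxation of a Bernoulli logpdf: defined (and linear) for x ∈ [0, 1],
# and exact whenever x is exactly 0 or 1.
function relaxed_bernoulli_logpdf(d::Bernoulli, x::Real)
    l1 = logpdf(d, 1)   # logpdf at 1
    l0 = logpdf(d, 0)   # logpdf at 0
    return x * l1 + (1 - x) * l0
end

relaxed_bernoulli_logpdf(Bernoulli(0.3), 1)    # == logpdf(Bernoulli(0.3), 1)
relaxed_bernoulli_logpdf(Bernoulli(0.3), 0.4)  # also defined for fractional x
```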

devmotion commented 3 years ago

ifelse is differentiable (at least with Zygote).

mohamed82008 commented 3 years ago

Well, I don't mean differentiable in the Zygote sense. I mean it needs to be defined and differentiable for non-binary values of x, because most mixed integer optimisation algorithms rely on the continuous relaxation of the integer variables.

devmotion commented 3 years ago

This means most distributions won't work, I assume. https://github.com/JuliaStats/StatsFuns.jl/pull/106 should fix some of the issues, but Distributions.jl will always return -Inf for the logpdf of values outside of the support, so even if the underlying function is implemented in a more general way that can handle the relaxation, those definitions would never be reached.

mohamed82008 commented 3 years ago

Yes, which is why we may need different implementations that we can use in the MLE/MAP context.
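
For example, an MLE/MAP-specific implementation for a Poisson could extend the pmf formula to real-valued arguments via the gamma function; a sketch (the function name is made up, and this is not the StatsFuns/Distributions code):

```julia
using SpecialFunctions: loggamma

# Poisson log-density extended to real x ≥ 0:
# log p(x) = x * log(λ) - λ - log Γ(x + 1), which agrees with the pmf at integer x.
relaxed_poisson_logpdf(λ::Real, x::Real) = x * log(λ) - λ - loggamma(x + 1)

relaxed_poisson_logpdf(3.0, 2)    # matches logpdf(Poisson(3.0), 2)
relaxed_poisson_logpdf(3.0, 2.5)  # finite and differentiable in x
```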

torfjelde commented 3 years ago

Also, running the risk of stating the obvious here, there are several implementations which simply won't work with non-integer values; Categorical is the most obvious one. In these cases, surely you need to do something fancy, which essentially comes down to defining a new distribution, to be able to work with a continuous relaxation anyway?

Sounds cool though:)

mohamed82008 commented 3 years ago

Yeah, with categorical distributions a one-hot encoding would make more sense, because the ordering of the categories is misleading. With a one-hot encoding, a continuous relaxation is also possible because it's just a weighted sum of the logpdfs.
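
A sketch of that one-hot relaxation for a Categorical (function name made up; w is a probability-simplex vector standing in for the one-hot encoding):

```julia
using Distributions

# Weighted sum of per-category logpdfs; reduces to the exact logpdf when w is one-hot.
function relaxed_categorical_logpdf(d::Categorical, w::AbstractVector{<:Real})
    return sum(w[k] * logpdf(d, k) for k in 1:ncategories(d))
end

d = Categorical([0.2, 0.5, 0.3])
relaxed_categorical_logpdf(d, [0.0, 1.0, 0.0])  # == logpdf(d, 2)
relaxed_categorical_logpdf(d, [0.1, 0.7, 0.2])  # defined for relaxed weights too
```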

devmotion commented 3 years ago

If one views discrete distributions with finite support as mixtures of Dirac distributions with "one-hot weights", and their relaxation as mixtures of Dirac distributions with arbitrary weights in the probability simplex, wouldn't it be more natural to use the logarithm of the weighted sum of the pdfs instead of the weighted sum of the logpdfs in the relaxation?

mohamed82008 commented 3 years ago

It doesn't matter as long as it's correct for the exactly one-hot (not continuous) case. Using the weighted arithmetic mean of the logpdfs means we are using the weighted geometric mean of the pdfs, so it makes sense either way.
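
To make the two candidate relaxations concrete (illustrative sketch; both reduce to the exact logpdf whenever the weights are exactly one-hot):

```julia
using Distributions

# Mixture-style relaxation: logarithm of the weighted sum of pdfs.
mixture_relaxation(d, w) = log(sum(w[k] * pdf(d, k) for k in 1:ncategories(d)))
# Geometric-mean-style relaxation: weighted sum of logpdfs.
geomean_relaxation(d, w) = sum(w[k] * logpdf(d, k) for k in 1:ncategories(d))

d = Categorical([0.2, 0.5, 0.3])
mixture_relaxation(d, [0.0, 1.0, 0.0]) ≈ geomean_relaxation(d, [0.0, 1.0, 0.0])  # true: both equal logpdf(d, 2)
mixture_relaxation(d, [0.1, 0.7, 0.2]), geomean_relaxation(d, [0.1, 0.7, 0.2])   # differ away from the vertices
```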

devmotion commented 3 years ago

Sure, I know; my question was mainly about which relaxation would be the most natural/intuitive. And from a probabilistic point of view, the logarithm of the weighted sum of pdfs seemed more natural due to the interpretation as a mixture model.

mohamed82008 commented 3 years ago

The relaxation that's easier to optimise is the weighted sum of logpdfs, because it's linear in the decision variables (assuming the logpdfs are constants). Linear is better than nonlinear from an optimisation point of view.

devmotion commented 3 years ago

Haha yeah maybe I should have added that I did not care about practical issues and was only interested in mathematical "beauty" and intuition 😄

yebai commented 2 years ago

It looks like we don't need to fix anything from Turing's side. Once Nonconvex.jl supports mixed-integer optimisation, it should work fine.

mohamed82008 commented 2 years ago

It's already possible to do mixed integer optimisation in Nonconvex now. I should make a tutorial on mixed integer MLE and MAP in Turing.
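
Such a tutorial might build the relaxed objective by hand and pass it to whatever mixed-integer solver is available; a rough, library-agnostic sketch of the objective only (the toy model and all names are made up, and this is not Turing or Nonconvex API):

```julia
using Distributions

# Toy negative log joint density with one binary parameter x and one continuous parameter μ,
# using the relaxed Bernoulli term from above so the objective stays defined for x ∈ [0, 1].
function relaxed_neg_logjoint(θ, data)
    x, μ = θ
    lp  = x * logpdf(Bernoulli(0.3), 1) + (1 - x) * logpdf(Bernoulli(0.3), 0)  # prior on x
    lp += logpdf(Normal(0, 10), μ)                                             # prior on μ
    lp += sum(logpdf(Normal(μ + x, 1.0), y) for y in data)                     # likelihood
    return -lp   # minimise the negative log joint
end

relaxed_neg_logjoint([1.0, 0.5], randn(10))  # finite objective value
```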

yebai commented 2 years ago

It's already possible to do mixed integer optimisation in Nonconvex now. I should make a tutorial on mixed integer MLE and MAP in Turing.

It sounds like a nice tutorial to read!