ReactiveBayes / RxInfer.jl

Julia package for automated Bayesian inference on a factor graph with reactive message passing
MIT License
259 stars · 24 forks

Provide metrics for claimed accuracy vs Turing.jl #62

Closed: alstat closed this 1 year ago

alstat commented 1 year ago

Hi, this is in connection with JOSS#5161. I've reviewed the benchmark notebook provided. I can confirm that RxInfer.jl is much faster than Turing.jl, but I'm not sure about the accuracy claim. While the plots in the README seem to suggest that RxInfer.jl is more accurate, when I ran the notebook I found that it is not set to a specific seed, so I got different plots (see below), and subjectively RxInfer.jl and Turing.jl look close in accuracy, especially for Turing.jl's HMC(0.05, 10.0, 1000). So I suggest that you provide a metric for this claim; RMSE or MAE would do. Also, is this something that is theoretically true? I'm wondering whether it may only hold for this example, while for other models both RxInfer.jl and Turing.jl may end up in the same range of accuracy, unless something in the theory (or in Turing.jl's implementation) suggests otherwise. If it is theoretically supported, then maybe briefly explain in the README or the docs why it is more accurate (I haven't read other parts of the documentation yet, in case this is already covered); otherwise, we may caution that this has only been observed for this example and must be tested on more examples or confirmed theoretically.
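The suggested check could be as simple as computing RMSE and MAE over the posterior mean trajectories returned by each package. A minimal sketch — all variable names and values below are placeholders, not outputs of the actual benchmark notebook:

```julia
# RMSE and MAE between a ground-truth trajectory and a posterior mean estimate.
rmse(truth, est) = sqrt(sum(abs2, truth .- est) / length(truth))
mae(truth, est)  = sum(abs, truth .- est) / length(truth)

# Placeholder data: in the real benchmark these would be the simulated hidden
# states and the posterior means produced by RxInfer.jl and Turing.jl.
x_true    = [0.0, 1.0, 2.0, 3.0]
m_rxinfer = [0.1, 0.9, 2.1, 3.0]
m_turing  = [0.3, 0.7, 2.4, 2.8]

println("RxInfer RMSE = ", rmse(x_true, m_rxinfer), ", MAE = ", mae(x_true, m_rxinfer))
println("Turing  RMSE = ", rmse(x_true, m_turing),  ", MAE = ", mae(x_true, m_turing))
```

Averaging these metrics over many seeded runs would make the accuracy claim quantitative rather than visual.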

RxInfer.jl

[screenshot: RxInfer.jl posterior plot (Screen Shot 2023-02-20 at 7 18 50 PM)]

Turing.jl

[screenshot: Turing.jl posterior plot (Screen Shot 2023-02-20 at 7 18 35 PM)]
bvdmitri commented 1 year ago

Accuracy comparisons between different Bayesian inference methods are hard to carry out in practice. RxInfer implements variational Constrained Bethe Free Energy optimization, and without extra constraints (the default behavior) this is equivalent to the well-known Belief Propagation (or Sum-Product) algorithm, which is exact in this setting and therefore guaranteed to be at least as accurate as HMC. At the beginning of the README we note, however, that

As a result, in models with conjugate pairings, RxInfer.jl often beats general-purpose probabilistic programming packages in terms of computational load, speed, memory and accuracy.

and this is exactly the case of the model with conjugate pairings.

HMC is only asymptotically exact: it can be very accurate given an infinite number of samples, but such a comparison does not really make sense, because RxInfer returns a result for this example on the scale of milliseconds, while even a roughly comparable result from Turing takes seconds to minutes.

With extra constraints, e.g. structured factorization constraints or mean-field approximations, RxInfer generally still produces "more accurate" results than HMC and is still faster. For more information and the theoretical underpinnings I would refer to this paper, which RxInfer essentially implements under the hood: https://www.mdpi.com/1099-4300/23/7/807. I also run comparisons against other models, with an average MSE metric, here: https://arxiv.org/abs/2112.13251.

In a few words: yes, for conjugate models it is theoretically true, but in reality we cannot compare with Turing.jl on all non-conjugate models. Turing is a generic probabilistic programming package; RxInfer is not generic (yet :)) and focuses on a specific set of models. We support limited non-conjugacy, but we concentrate on state-space models and do not support many models that Turing supports. It is also not really about Turing.jl, but about HMC and sampling-based methods in general. They may (and will) give very accurate results given an infinite number of samples, but that is practically not very helpful for the applications we are aiming at (real-time inference on low-power devices).

It is interesting, however, that you indeed get reasonably good HMC results depending on the seed. I would argue that such a strong dependence on the seed is a bad property in practice, but of course we can fix a relatively good seed for HMC. A really nice property of the way we perform inference is that it is consistent: we don't have any hyperparameters for this example (nor for many other examples in the documentation). I chose some standard hyperparameters for HMC from Turing's documentation, because I'm not a big expert in HMC hyperparameter tuning.
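Fixing the seed for the HMC runs only takes Julia's standard `Random` module. A small sketch of the reproducibility this buys; the `rand(3)` calls stand in for an actual stochastic sampling run:

```julia
using Random

Random.seed!(42)   # fix the global RNG before sampling
a = rand(3)        # stand-in for a seeded HMC run
Random.seed!(42)   # re-seeding with the same value...
b = rand(3)        # ...reproduces exactly the same draws
a == b             # true: the run is now deterministic
```

Seeding the notebook this way would make the README plots reproducible across reviewers' machines.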

The emphasis of the example in the README is to draw people's attention and show fast inference for this specific conjugate state-space model. It turned out that the accuracy is also better, but there is no room there to dive into the theoretical differences with HMC. We may give references to the papers, of course, or simply remove the accuracy claims.

If you think that this example may mislead other users, we can improve the wording in the README so that readers do not conclude that we are always better than Turing and HMC. Would it be enough to improve the wording and give references to the papers above? I don't think we have much space in the README to dive into theoretical claims and differences with HMC (after all, HMC is not the only available method to compare with). I am against removing the accuracy comparison altogether, because one could equally run HMC with a single sample and call it "fast".

bvdmitri commented 1 year ago

Another reference is https://www.softwareimpacts.com/article/S2665-9638(22)00042-2/pdf, together with the repository containing the experiments, https://github.com/biaslab/ReactiveMPPaperExperiments (which includes the model from the README plus two more).

alstat commented 1 year ago

Hi @bvdmitri, thank you for these details and the references! I agree that it is not a straightforward comparison, but as you suggest, we can just improve the wording in the README; maybe adding one more sentence to the current statement would do, something like:

RxInfer.jl not only beats general-purpose Bayesian inference methods in computational load, speed, and scalability, but also provides more accurate results for various complex problems. This is especially true for conjugate models, see [add references here]. Check out our examples!

Or suggest better wording if you have it.

bvdmitri commented 1 year ago

See #66

alstat commented 1 year ago

Thank you, @bvdmitri! Looks good to me.