ReactiveBayes / RxInfer.jl

Julia package for automated Bayesian inference on a factor graph with reactive message passing
MIT License

A forecasting/predicting example would be useful #15

Closed: SebastianCallh closed this issue 10 months ago

SebastianCallh commented 1 year ago

Hi!

I read through your examples, and let me say they are very nice. I especially appreciate all the visualisations. However, I noticed there are no examples of forecasting/predicting on new data once the model is fit. The GP regression example does forecasting by working with data of type Array{Union{Float64,Missing}}, but I suspect this reduces performance.

What is the recommended way to run a fitted model on new data?

ismailsenoz commented 1 year ago

Hi @SebastianCallh! It's good to hear that you are enjoying the examples. You are correct that there are no forecasting examples besides the GP regression example. Indeed, there is no unique solution to the forecasting problem within the RxInfer framework; supplying missing values for the observations is just one way to obtain predictions. There is no single recommended way to run a fitted model on new data, as it depends heavily on the model, the inference procedure, etc. However, we will upload more forecasting examples to illustrate the possible options and update the documentation accordingly. Thanks for pointing out the issue.
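The idea of supplying missing observations to obtain predictions can be illustrated in the simplest linear-Gaussian setting. The sketch below is plain Julia, not RxInfer's API: a Kalman filter for a hypothetical local-level (random-walk) model in which a missing observation skips the update step, so the estimate at that step is just the model's prediction. The function name, the model, and the default values of m0, v0, q and r are assumptions chosen for illustration.

```julia
# Minimal sketch (plain Julia, not RxInfer): Kalman filter for a local-level model
#   x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
#   y_t = x_t + v_t,      v_t ~ N(0, r)
# A `missing` observation skips the update step, so the output is a prediction.
function kalman_local_level(y::AbstractVector{<:Union{Missing,Real}};
                            m0 = 0.0, v0 = 1e6, q = 1.0, r = 1.0)
    m, v = float(m0), float(v0)
    means = Float64[]
    vars = Float64[]
    for yt in y
        # Predict: propagate the previous estimate through the random-walk transition
        v += q
        if !ismissing(yt)
            # Update: combine the prediction with the observed value
            k = v / (v + r)
            m += k * (yt - m)
            v *= (1 - k)
        end
        push!(means, m)
        push!(vars, v)
    end
    return means, vars
end

# Three observations followed by three steps to forecast
y = [1.0, 1.2, 0.9, missing, missing, missing]
means, vars = kalman_local_level(y)
# means[4:6] repeat the last filtered mean; vars[4:6] grow by q at each step
```

The same function handles fitting and forecasting in one pass, which is the appeal of the missing-value approach; the cost is that the input vector has a Union element type.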

I am curious what makes you think that data of type Array{Union{Float64,Missing}} will reduce performance. We discussed it among ourselves but could not figure out why this would reduce performance. It would be great if you could share why you suspect a performance reduction so we can address it.

SebastianCallh commented 1 year ago

Thanks for the response, @ismailsenoz. You are of course right that posterior predictions are problem dependent, particularly when it comes to plotting. In my experience it is fairly common for PPLs to offer utilities for posterior (and of course prior) predictive sampling, so I guess that's where I'm coming from with my question. Without such utilities, one might have to re-implement the mechanics of the model outside of the @model function, which duplicates work.
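Posterior predictive sampling, as mentioned above, means drawing parameters from the posterior and then simulating new observations from the likelihood. A minimal plain-Julia sketch, assuming a conjugate model chosen purely for illustration (y_i ~ N(θ, σ2) with known σ2 and prior θ ~ N(μ0, τ02)); the function names and default values are hypothetical, and this is not an RxInfer utility.

```julia
using Random, Statistics

# Posterior over θ for y_i ~ N(θ, σ2) with prior θ ~ N(μ0, τ02), σ2 known.
# Returns the posterior mean μn and posterior variance τn2.
function normal_posterior(y; μ0 = 0.0, τ02 = 10.0, σ2 = 1.0)
    prec = 1 / τ02 + length(y) / σ2
    τn2 = 1 / prec
    μn = τn2 * (μ0 / τ02 + sum(y) / σ2)
    return μn, τn2
end

# Posterior predictive sampling: draw θ from the posterior, then y_new given θ.
function posterior_predictive(rng::AbstractRNG, μn, τn2, σ2, nsamples::Integer)
    θ = μn .+ sqrt(τn2) .* randn(rng, nsamples)
    return θ .+ sqrt(σ2) .* randn(rng, nsamples)
end

rng = MersenneTwister(42)
μn, τn2 = normal_posterior([1.0, 2.0, 3.0])
ynew = posterior_predictive(rng, μn, τn2, 1.0, 100_000)
# mean(ynew) should be close to μn, and var(ynew) close to τn2 + 1.0
```

Because the model is conjugate, the predictive distribution is also known in closed form, N(μn, τn2 + σ2), which gives a quick sanity check on the simulated samples.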

About the performance impact of Array{Union{Float64,Missing}}: I am not an expert on the Julia compiler, but I am fairly certain Union{Float64,Missing} causes each value in the array to be boxed, so operations on them have to follow pointers everywhere. See the example below for what I mean. Of course, I have not checked the RxInfer source code, so perhaps this does not apply here.

julia> using BenchmarkTools

julia> a = vcat(missing, rand(9999));

julia> b = rand(10000);

julia> @btime sum(a)
  16.372 μs (0 allocations: 0 bytes)
missing

julia> @btime sum(skipmissing(a))
  8.686 μs (7 allocations: 112 bytes)
4982.93969474356

julia> @btime sum(b)
  983.278 ns (1 allocation: 16 bytes)
4966.982509542625
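
For reference, and hedged since nothing in this thread measures it: Julia stores arrays whose element type is a small Union of isbits types, such as Union{Missing, Float64}, inline with a per-element type tag rather than as boxed values. Independently of how the data is stored, a common pattern is to convert it back to a concretely typed vector once no missing values remain. The helper `concretize` below is a hypothetical name (not part of RxInfer or Base); it is a minimal sketch that uses only Base functions (`ismissing`, `nonmissingtype`, `convert`).

```julia
# Hypothetical helper: once no `missing` values remain, convert to a concretely
# typed vector (e.g. Vector{Float64}) before running numeric code on it.
function concretize(x::AbstractVector)
    any(ismissing, x) && throw(ArgumentError("input still contains missing values"))
    T = nonmissingtype(eltype(x))   # Union{Missing, Float64} -> Float64
    return convert(Vector{T}, x)
end

a = vcat(missing, rand(9999))   # eltype(a) is Union{Missing, Float64}
b = concretize(a[2:end])        # a Vector{Float64} with the same 9999 values
```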

ismailsenoz commented 1 year ago

Thanks for illustrating your concern and warning us about a potential performance issue. Currently, ReactiveMP (the inference engine of RxInfer) does not perform any operation on a missing value. Also, the message update rules for the factor nodes in the model need to be extended to return missing, as done in the GP regression example. Your point is valid when a user defines a rule that operates directly on an Array{Union{Float64,Missing}}; in that case there will be a performance decrease, as you pointed out. By default, ReactiveMP does not define any rules involving missing values.

For RxInfer, the appeal of passing missing is that it unifies learning, hyper-parameter tuning, and prediction in a single inference call. I am sure we can come up with benchmarks and a better implementation of missing-value handling if we encounter performance degradation. @bvdmitri @HoangMHNguyen
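The idea that rules must be extended to handle missing can be sketched with Julia's multiple dispatch: give each rule a separate method for Missing, so the method for observed values is unaffected. This is a toy illustration in plain Julia with a hypothetical message type `GaussMsg` and hypothetical functions `msg_from_obs` and `prod_msg`; it does not use ReactiveMP's actual rule-definition machinery.

```julia
# Toy Gaussian message in mean/variance form (hypothetical, not ReactiveMP's type)
struct GaussMsg
    m::Float64
    v::Float64
end

# Toy "rule" for the node y ~ N(x, σ2): the message towards x given y.
# An observed value yields a Gaussian message; a missing one yields `missing`.
msg_from_obs(y::Real, σ2::Real) = GaussMsg(y, σ2)
msg_from_obs(::Missing, σ2::Real) = missing

# Product of two Gaussian messages (precision-weighted combination).
function prod_msg(a::GaussMsg, b::GaussMsg)
    w = 1 / a.v + 1 / b.v
    return GaussMsg((a.m / a.v + b.m / b.v) / w, 1 / w)
end
# A missing message carries no information, so the other message passes through.
prod_msg(a::GaussMsg, ::Missing) = a
prod_msg(::Missing, b::GaussMsg) = b

prior = GaussMsg(0.0, 1.0)
observed  = prod_msg(prior, msg_from_obs(2.0, 1.0))      # GaussMsg(1.0, 0.5)
predicted = prod_msg(prior, msg_from_obs(missing, 1.0))  # the prior, unchanged
```

Because the missing cases live in their own methods, the method for observed values never sees a missing value.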

We will keep this issue open to track utilities for posterior predictions and try to address it.

SebastianCallh commented 1 year ago

Thank you for explaining, and for your work on RxInfer!

albertpod commented 11 months ago

The technical part of the issue was addressed in #51; however, a prediction example is still missing.

albertpod commented 10 months ago

Closed by #184