ReactiveBayes / RxInfer.jl

Julia package for automated Bayesian inference on a factor graph with reactive message passing
MIT License

Amortized Bayesian parameter estimation example for the leaky competing accumulator #128

Closed · itsdfish closed this issue 1 year ago

itsdfish commented 1 year ago

Hello,

Thank you for putting together this package and the accompanying documentation. This package looks very nice!

I am moving a discussion with Dmitry and Albert from the Turing Slack channel here at their request. As some brief background, I am interested in performing amortized Bayesian parameter estimation similar to the Python package BayesFlow. Performing Bayesian parameter estimation with many of the models I work with is challenging because they do not have a closed-form likelihood function. Having capabilities similar to BayesFlow would be very useful, and I suspect the broader Julia community would find it useful as well.

My ultimate goal is to perform amortized Bayesian parameter estimation on models with the following characteristics:

  1. simulation-based models without a tractable likelihood function
  2. prior distribution over multiple parameters (potentially with a hierarchical structure)
  3. multivariate data with mixed discrete and continuous dimensions

In addition, it would be great to have the ability to save and reload a trained neural network, and to obtain full posterior distributions given a dataset with any number of observations.

This example from BayesFlow is a good starting point that captures these criteria quite well. The example uses the leaky competing accumulator (LCA), a multi-alternative decision-making model based on a stochastic differential equation. The basic idea is that evidence for different options (e.g., items on a restaurant menu) accumulates towards a threshold. The winner of this race determines the decision and the response time, leading to a bivariate distribution with one discrete variable and one continuous variable.
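
To make the dynamics concrete, here is a rough Euler–Maruyama sketch of the process (an illustrative simplification, not the exact implementation in SequentialSamplingModels.jl):

    # Illustrative simulation of the LCA race (simplified sketch)
    function simulate_lca(; ν=[2.5, 2.0], α=1.5, β=0.2, λ=0.1, τ=0.3, σ=1.0, Δt=0.001)
        x = zeros(length(ν))                # evidence for each alternative
        t = 0.0
        while all(x .< α)                   # race until one accumulator reaches the threshold α
            inhibition = β .* (sum(x) .- x) # lateral inhibition from the competing accumulators
            x .+= (ν .- λ .* x .- inhibition) .* Δt .+ σ * sqrt(Δt) .* randn(length(ν))
            x .= max.(x, 0.0)               # evidence is rectified at zero
            t += Δt
        end
        choice = argmax(x)                  # the winner determines the decision
        rt = t + τ                          # accumulation time plus non-decision time
        return choice, rt
    end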

Dmitry and Albert indicated that RxInfer has the capabilities listed above and were nice enough to offer help putting together an LCA example similar to the one from BayesFlow. If this is something you find useful, I would be willing to open a PR to add it to your list of examples. I started by studying the invertible neural network example. Machine learning is not my area of expertise, so some details are a little unclear to me. In this repo, I adapted the function generate_data to the LCA and the code runs. However, I think the neural network architecture needs to be modified: currently, the training data are generated from a fixed set of parameters rather than from samples from prior distributions, so I am guessing it cannot produce posterior distributions over the parameters in its current form. I'm not sure where to start, and any help you are willing to provide would be appreciated.

albertpod commented 1 year ago

Hi @itsdfish! I will have another look this week; for now, I must say that your problem involves several issues we are currently dealing with. We plan to add hierarchical priors for neural networks such as flows, but the out-of-the-box solution doesn't provide this functionality yet.

If you propose a model and run into issues, we will gladly help, but we don't have many resources to work closely on this one for now.

I will keep the issue open so someone can pick it up if time permits.

itsdfish commented 1 year ago

No problem. I completely understand. I appreciate any time you are willing to devote.

In the meantime, if a non-hierarchical version is feasible, that would be helpful too. Thanks again!

albertpod commented 1 year ago

Hi @itsdfish! The thing is that, from the RxInfer.jl perspective, adding hierarchical priors is easy once the node (such as an invertible network) accepts parameters as distributions. Unfortunately, that's not the case for now. In about one to two weeks, I plan to make a PR that will allow us to parameterize the invertible networks. It might still not be what you are looking for, but the idea is something like this:

x ~ MvNormal(priors)
A ~ MatrixNormal(M, U, V)
y = A*x
z = Flow(y)

In this sketch, you can think of A as the parameters of an invertible neural network.

albertpod commented 1 year ago

Hey @itsdfish ! Just a quick update.

Some developments on the ReactiveMP.jl branch called transfominator might be helpful for you. It adds a ContinuousTransition node with a couple of aliases, CTransition and Transfominator. The node allows performing inference for linear transformations of a multivariate Gaussian, y = Hx, with an unknown matrix H.

You can then combine it with Flow and other nonlinearities through Delta factors.

    xₜ₋₁ ~ MvNormalMeanCovariance(μxₜ₋₁, Σxₜ₋₁)
    Λ ~ Wishart(nΛ, ΣΛ)
    h ~ MvNormalMeanCovariance(μh, Σh)
    xₜ ~ ContinuousTransition(xₜ₋₁, h, Λ) where {meta = CTMeta(in_dim, out_dim)}
    yₜ ~ MvNormalMeanCovariance(B*xₜ, Q)

Note that we pass h to ContinuousTransition as the vectorized form of a matrix-normal distribution, so the dimensionality of h is in_dim*out_dim, where in_dim and out_dim are the dimensionalities of xₜ₋₁ and xₜ, respectively.
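
For intuition, and assuming a standard column-major reshape (the exact mapping CTMeta uses between h and the matrix may differ), the relationship looks roughly like this:

    # Hypothetical illustration of the vectorized-h convention (assumes a column-major reshape;
    # the exact mapping used by CTMeta may differ)
    in_dim, out_dim = 3, 2
    h = collect(1.0:(in_dim * out_dim))   # vectorized transformation, length in_dim*out_dim = 6
    H = reshape(h, out_dim, in_dim)       # out_dim × in_dim matrix
    xₜ₋₁ = randn(in_dim)
    xₜ_mean = H * xₜ₋₁                    # mean of xₜ given xₜ₋₁ and h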

As mentioned previously, you can use this node with Flow or DeltaNode, and you can stack them one after another. Additionally, you can put priors on the parameters of h as well, for example an InverseWishart prior for the covariance and a random-walk or autoregressive prior for h itself.

The main issue now is computational (if your problem works with 10-dimensional vectors, then the covariance for h becomes 100x100, which won't be fast), which is why merging this functionality into the master branch will take a while.

itsdfish commented 1 year ago

@albertpod, thank you for the update! This looks really interesting. I upgraded my working example to the transfominator branch and started to update the function invertible_neural_network based on your description above. It became clear that there are some gaps in my understanding of the new workflow.

For example, I'm not quite sure how to generate the training data in generate_data for priors over the two parameters alpha and tau. Should I add an outer loop that samples from the priors of alpha and tau like so? https://github.com/itsdfish/RxInferSandbox.jl/blob/16738534c2177891fde2bcc6bd4e19bf3e20eafd/examples/lca_example.jl#L33

In the function invertible_neural_network, I am not sure how to define μh and Σh in h ~ MvNormalMeanCovariance(μh, Σh), and I think there are other factors that will depend on the form of the training data. Would you be able to provide some guidance or open a PR to the repo above so that we have a working example (which could, of course, be added to your documentation)?

albertpod commented 1 year ago

@itsdfish, so according to your generative model, you construct a latent space that consists of choice and rt, and then you pass it through ReactiveMP.forward() as a likelihood, which constitutes your observations. Is that really what you want? Aren't your observations actually the output of your lca function?

Let's start simple (flow aside). Let's discuss the generative model (generate data function). What are your observations? Then we will construct a probabilistic model.

itsdfish commented 1 year ago

Thanks for your help with this. I agree that starting with the generative model is a good strategy. You are correct: the data generating model produces a choice and a reaction time from a single simulation.

To make things simple, I propose setting a prior on two of the parameters: alpha, which is the decision threshold, and tau, which is an additive constant for visual encoding and motor execution time.

using Distributions 
using SequentialSamplingModels

# prior sample for decision boundary 
α = rand(Uniform(0, 3))
# prior sample for encoding/motor response time 
τ = rand(Uniform(0, 0.50))

I'll assume the other parameters are deterministic. This leaves us with the following distribution object for the LCA:

# LCA model object with β, λ, ν, Δt, and σ fixed, and α and τ sampled from a prior distribution
dist = LCA(; α, β=0.20, λ=0.10, ν=[2.5,2.0], Δt=.001, τ, σ=1.0)

The function rand will sample a choice (indexed as 1 or 2) and a reaction time. I'm not sure how many samples we should use for each parameter set. So I will just assume 1 sample for now:

# a single sample from the model: choice, rt 
choice, rt = rand(dist)

In summary, the data generating model outputs a choice and a reaction time. In a simple form of the model, we could have a prior on alpha and tau.
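
Pulling the pieces above together, a data-generation loop could look roughly like this (the function name and structure are just a sketch, not the exact code in my repo):

    using Distributions
    using SequentialSamplingModels

    # Sketch: sample (α, τ) from the priors, then simulate one (choice, rt) pair per parameter set
    function generate_data(n_parms)
        choices = Vector{Int}(undef, n_parms)
        rts = Vector{Float64}(undef, n_parms)
        for i in 1:n_parms
            α = rand(Uniform(0, 3))            # prior sample for the decision threshold
            τ = rand(Uniform(0, 0.50))         # prior sample for encoding/motor response time
            dist = LCA(; α, β=0.20, λ=0.10, ν=[2.5, 2.0], Δt=.001, τ, σ=1.0)
            choices[i], rts[i] = rand(dist)    # a single simulated (choice, rt) pair
        end
        return choices, rts
    end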

albertpod commented 1 year ago

Thanks @itsdfish! I now have a better feeling for what you are trying to achieve. As far as I understand, you want to approximate the likelihood induced by the LCA. I am no specialist in intractable likelihoods and their approximations, so my proposed solution might differ from what you need. First of all, I would say that you need to take ReactiveMP.forward(...) out of your data generation function.

It appears that the inference problem here is $$p(α, τ \mid y_{1:N})$$, where yᵢ is the i-th pair [choice, rt]; the problem, of course, is your intractable likelihood. I would proceed with a simple model like this at first:


ατconcat(α, τ) = vcat(α, τ) # we need this function to combine α and τ parameters and push it through the Flow

@model function invertible_neural_network(nr_samples::Int64, model)

    # initialize variables
    x     = randomvar(nr_samples)
    y_lat = randomvar(nr_samples)
    y     = datavar(Vector{Float64}, nr_samples)

    # specify prior
    α   ~ Normal(μ=3, σ²=10.0)
    τ   ~ Normal(μ=10, σ²=10.0)

    z_μ ~ ατconcat(α, τ) where {meta=Linearization()}
    z_Λ ~ Wishart(1e2, 1e4*diageye(2))

    # specify observations
    for i in 1:nr_samples

        # specify latent state
        x[i] ~ MvNormal(μ=z_μ, Λ=z_Λ)

        # specify transformed latent value
        y_lat[i] ~ Flow(x[i]) where {meta=FlowMeta(model)}

        # specify observations
        y[i] ~ MvNormal(μ=y_lat[i], Σ=tiny*diageye(2))

    end

end;

y = [[choice[i], rt[i]] for i in 1:length(choice)]

data = (y = y, )

constraints = @constraints begin
    q(z_μ, x, z_Λ) = q(z_μ)q(z_Λ)q(x)
end

fmodel         = invertible_neural_network(length(y), compiled_model)
initmarginals = (z_μ = MvNormalMeanCovariance(zeros(2), huge*diageye(2)), z_Λ = Wishart(2.0, diageye(2)))

# Inference routine
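
A sketch of the inference call itself, following the pattern of the flow example (keyword names may differ between RxInfer versions):

result = inference(
    model         = fmodel,
    data          = data,
    constraints   = constraints,
    initmarginals = initmarginals,
    iterations    = 10,
    free_energy   = true,
)

# approximate posterior over z_μ, i.e. the concatenated (α, τ) latent
q_zμ = result.posteriors[:z_μ]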

This is something to start with, and we can build on top of it. Remember to experiment with the flow model itself. As a next step, you can try changing the priors for α and τ; for example, you may want a Beta prior for α. In that case, you'd need to use CVI instead of the Linearization method I used for ατconcat.

BTW, I don't think you need the CTransition node here; I might have misled you earlier; sorry for that.

itsdfish commented 1 year ago

You are correct. The inference problem is p(α, τ | y_{1:N}), where N might be of any size. A good resource for performing this type of inference is BayesFlow: Learning Complex Stochastic Models With Invertible Neural Networks, where they describe a method for using invertible networks with a summary network.

Thanks for the guidance with invertible_neural_network, and no worries about CTransition. I updated my code accordingly. My basic understanding is that this network will allow us to learn the density of the posterior distribution of alpha and tau by starting with a base density (multivariate normal in this case) and performing a series of transformations. Assuming my understanding is correct, I am not sure why I would select a Normal or a Beta in invertible_neural_network. The most important constraint for the LCA is that alpha and tau are positive, but I am not sure how that relates to the prior in invertible_neural_network.

I was hoping to ask a few follow-up questions too. First, I was wondering what the training data should be. Currently, generate_data has a "prior" loop where the prior is sampled, and a "data" loop where samples are drawn from the model given the parameters sampled from the prior distributions. The output is an array in which the first row corresponds to choices and the second row corresponds to reaction times. I was wondering whether the neural network also needs the samples for alpha and tau.

Also, I was wondering whether you can recommend a resource for these types of models aimed at non-experts. I have been reading various sources, but it can be challenging to get a good understanding from the journal articles because they typically assume the reader is already an expert.

albertpod commented 1 year ago

Hi @itsdfish! I'm sorry for not getting back to you sooner.

I am not sure why I would select a Normal or a Beta in invertible_neural_network. The most important constraint for the LCA is that alpha and tau are positive, but I am not sure how that relates to the prior in invertible_neural_network.

Neither Normal nor Beta is a good choice in this case. You'd be better off with a Uniform or Gamma prior. Working with Gamma priors is easier in RxInfer at the moment.

Here's an example.

@model function invertible_neural_network(nr_samples::Int64, model)

    # initialize variables
    x     = randomvar(nr_samples)
    y_lat = randomvar(nr_samples)
    y     = datavar(Vector{Float64}, nr_samples)

    # specify prior

    α   ~ GammaShapeScale(100.0, 0.01)
    τ   ~ GammaShapeScale(100.0, 0.01)

    z_μ ~ ατconcat(α, τ) where {meta=CVI(StableRNG(42), 100, 200, Optimisers.Descent(0.1), RxInfer.ForwardDiffGrad(), 100, Val(true), true)}
    z_Λ ~ Wishart(3, diageye(2))

    # specify observations
    for i in 1:nr_samples

        # specify latent state
        x[i] ~ MvNormal(μ=z_μ, Λ=z_Λ)

        # specify transformed latent value
        y_lat[i] ~ Flow(x[i]) where {meta=FlowMeta(model)}

        # specify observations
        y[i] ~ MvNormal(μ=y_lat[i], Σ=tiny*diageye(2))

    end

    # return variables
    return z_μ, z_Λ, x, y_lat, y

end;

I know the CVI meta looks horrifying; see this example to understand more.

Again, your RxInfer model should represent your beliefs about the data generation process. If alpha and tau are positive, then we can put Gamma priors on them.
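
For intuition about what the Gamma prior above implies (a quick check using the shape–scale parameterization):

    using Distributions

    # GammaShapeScale(100.0, 0.01) corresponds to a Gamma with shape 100 and scale 0.01
    prior = Gamma(100.0, 0.01)
    mean(prior), var(prior)   # (1.0, 0.01): mass concentrated near 1 and strictly positive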

The output is an array in which the first row corresponds to choices and the second row corresponds to reaction times. I was wondering whether the neural network also needs the samples for alpha and tau?

I am not sure I understand the question here. Didn't you say we must infer the posterior over alpha and tau?

Also, I was wondering whether you can recommend a resource for these types of models aimed at non-experts.

Do you mean invertible nets? A description of how they work in factor graphs (the engine behind RxInfer) can be found here: https://biaslab.github.io/publication/hybrid-inference-inn/

I will try to allocate some time to read that BayesFlow paper; it seems interesting. There are people in the lab who are working on INNs, but they are on holiday. Perhaps they will be more helpful than I am ;)