Open DominiqueMakowski opened 1 year ago
One solution is to parse the DataFrame within the model specification function to create a NamedTuple. This also makes calling sample
a bit cleaner.
@model function model(data; n_col = size(unique(data.condition))[1])
data = (choice = data.choice, rt = data.rt)
min_rt = minimum(data[2])
ν ~ filldist(MvNormal(zeros(2), I * 2), n_col)
A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
τ ~ Uniform(0.0, min_rt)
data ~ LBA(; τ, A, k, ν)
return (; data, ν, A, k, τ)
end
sample(model(data), Prior(), 5000)
Also, note that filldist
is used to draw the two drifts from MVN for each of the two conditions.
Now, I know that does not really answer your question. But looking into this generated some thoughts/comments
Two things: when using Turing, I have not encountered an instance where the distribution returns a NamedTuple of size two (both choice and RT) and you would supply two (e.g. choice, rt ~ ...)
Secondly, while I have little experience creating custom distributions that can be used in Turing, the LBA.jl file seems to be missing some required implementations of a set of methods, some of which are not present from my reading. Doing so can lead to similar errors see: link to relevant thread.
let me know what you think.
Thanks!
the LBA.jl file seems to be missing some required implementations of a set of methods
Couldn't we open an issue/PR in SequentialSamplingModels (SSM) to add potentially missing methods? Do you have in mind some in particular?
Yeah! Sounds like we are on the same page. I noticed there is a suggested set in the Distribution.jl documentation. I'll open an issue for it.
I think it's going somewhere, the sampling works, but there's one critical piece of the puzzle missing: How to specify "predictors" in the model. Say I have the following base model:
using Turing
using SequentialSamplingModels
using Random
using LinearAlgebra
using DataFrames
using StatsPlots
using StatsModels
@model function model_lba(data)
data = (choice=data.choice, rt=data.rt)
min_rt = minimum(data[2])
# Priors
drift ~ filldist(MvNormal(zeros(2), I * 2), 2)
A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
tau ~ Uniform(0.0, min_rt)
# Likelihood
data ~ LBA(; τ=tau, A=A, k=k, ν=drift)
return (; data, drift, A, k, tau)
end
(note that I'm not using symbols for parameters name because I expect this could cause issues when the output is exported say in R)
And my data is like that:
# Generate data with different drifts for two conditions A vs. B
Random.seed!(254)
data1 = DataFrame(rand(LBA(ν=[3.0, 2.0], A=0.8, k=0.2, τ=0.3), 500))
data1[!, :condition] = repeat(["A"], nrow(data1))
data2 = DataFrame(rand(LBA(ν=[1.0, 1.5], A=0.7, k=0.2, τ=0.3), 500))
data2[!, :condition] = repeat(["B"], nrow(data2))
data = vcat(data1, data2)
My current intuition is to have a first data-preparation step to get the input data as a model-matrix using StatsModels:
# Get input data
f = @formula(rt ~ 1 + condition)
f = apply_schema(f, schema(f, data))
X = modelmatrix(f, data)
1000×2 Matrix{Float64}:
1.0 0.0
1.0 0.0
1.0 0.0
⋮
1.0 1.0
1.0 1.0
1.0 1.0
_, predictors = coefnames(f)
2-element Vector{String}:
"(Intercept)"
"condition: B"
The question is: how to model for instance different drift rates according to this predictor data 🤔 I can see how it is done for a regular linear model, but here I am not sure at all, do we need to specify a linear formula in the LBA() likelihood call?
# Likelihood
data ~ LBA(; τ=tau, A=A, k=k,
ν=intercept .+ x * drift)
or something like that?
Based on this issue (https://github.com/itsdfish/SequentialSamplingModels.jl/issues/27), I realize I never pressed comment here.
For each parameter, to model trial-level effects, you would indeed want to specify the parameter as a linear combination. This allows you to interpret the DDM parameters, such as mean functions, within a regression-like framework.
For example, in the case of drift: ν = β_0 + x * β_1. Now you have a drift intercept, which characterizes a stimulus-independent average rate of evidence accumulation constant added, referred to as the drift criterion or drift bias.
However, in your case, that would not be applicable because you have a reference treatment.
So overall I think folks will still have to specify the model for now.
I think that LBA, because of its double-drift parameter, is a bit more tricky to set:
using Turing
using SequentialSamplingModels
using Random
using LinearAlgebra
using Distributions
using DataFrames
using StatsPlots
using StatsModels
# Generate data with different drifts for two conditions A vs. B
Random.seed!(254)
data1 = DataFrame(rand(LBA(ν=[3.0, 2.0], A=0.8, k=0.2, τ=0.3), 500))
data1[!, :condition] = repeat(["A"], nrow(data1))
data2 = DataFrame(rand(LBA(ν=[1.0, 1.5], A=0.7, k=0.2, τ=0.3), 500))
data2[!, :condition] = repeat(["B"], nrow(data2))
data = vcat(data1, data2)
# Format input data
f = @formula(rt ~ 1 + condition)
f = apply_schema(f, schema(f, data))
_, predictors = coefnames(f)
X = modelmatrix(f, data)
# Model
@model function model_lba(data; intercept=nothing, condition=nothing, min_rt=minimum(data.rt))
# Priors for auxiliary parameters
A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
k ~ truncated(Normal(0.2, 0.2), 0.0, Inf)
tau ~ Uniform(0.0, min_rt)
# Priors over coefficients
beta_intercept ~ Normal(0, 1)
beta_condition ~ Normal(0, 0.5)
# drift rate
drift = beta_intercept + beta_condition * condition
# drift ~ filldist(MvNormal(zeros(2), I * 2), 2)
# Likelihood
data ~ LBA(; τ=tau, A=A, k=k, ν=drift)
return (; data, drift, A, k, tau)
end
Should the priors over coefficient be filldist(MvNormal))
as well? and if so, how to include that in a linear formula?
@kiante-fernandez I thought I'd first ask you as it's probably a basic issue of Turing specification.
I am testing this tutorial, and try to modify it to first make it as if I had the data stored in a dataframe. And I thought, similarly to other Turing models, to modify the input to not be a tuple of data but two vectors.
I tried the following:
I suspect this is because
LBA()
returns a tuple. But I'm not sure what to do here, I tried things like having:but it doesn't seem to be the solution. Any thoughts?