rikhuijzer opened 2 years ago
I've been thinking about a possible solution. We could probably add methods for `sample` which take a `GLMModel`. For example:
```julia
import DynamicPPL
import Turing: sample
using MCMCChains: replacenames

# Wrap the DynamicPPL model together with a mapping from sampler
# parameter names (e.g. "β[1]") to predictor names.
struct GLMModel
    model::DynamicPPL.Model
    name_mapping::Dict{String,String}
end

function sample(gm::GLMModel, sampler, n_samples)
    chns = sample(gm.model, sampler, n_samples)
    # Rename β[1], β[2], ... to the predictor names from the formula.
    updated_chns = replacenames(chns, gm.name_mapping)
    return updated_chns
end

# ... and additional methods for `sample`.
```
[...]
From the user's side, things wouldn't change much. In most cases, it would still be:
```julia
fm = @formula(...)
model = turing_model(fm, data)
chns = sample(model, NUTS(), 2_000)
```
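For illustration, here is the renaming step in isolation on a toy `Chains` object; the parameter names and the mapping below are made up for the example:

```julia
using MCMCChains: Chains, replacenames

# Toy chain: 100 samples, 2 parameters named like TuringGLM's coefficients.
chns = Chains(rand(100, 2, 1), ["β[1]", "β[2]"])

# Replace the positional names with (made-up) predictor names.
renamed = replacenames(chns, Dict("β[1]" => "β[a]", "β[2]" => "β[b]"))
```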
Just tossing out a random idea here, but what about keeping the predictor names in `β` itself? E.g., a NamedArrays.jl solution:
```julia
julia> using NamedArrays

julia> β = NamedArray(rand(3), ["a", "b", "c"], "Predictor")
3-element Named Vector{Float64}
Predictor  │
───────────┼─────────
a          │ 0.143747
b          │ 0.253579
c          │ 0.143526
```
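With names attached, a coefficient could then be looked up directly by predictor name, e.g. `β["a"]`, and summaries could show predictor names instead of positional indices.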
Ideally we could build something on top of `coefnames(::AbstractTerm)` which does that substitution for you, although it might be a bit tricky since it looks like the intercept column is manually removed during construction.
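As a minimal sketch of that direction, using toy data and only StatsModels' public `schema`/`apply_schema`/`coefnames`; the mapping construction at the end is a hypothetical helper, not existing TuringGLM code:

```julia
using DataFrames, StatsModels

df = DataFrame(y = rand(4), a = rand(4), b = rand(4))
f = @formula(y ~ 1 + a + b)

# Resolve the formula against the data to get concrete coefficient names.
ts = apply_schema(f, schema(f, df))
all_names = coefnames(ts.rhs)        # ["(Intercept)", "a", "b"]

# Drop the intercept (TuringGLM keeps it separate as α) and build the
# "β[i]" => predictor-name mapping used for renaming.
predictors = filter(!=("(Intercept)"), all_names)
name_mapping = Dict("β[$i]" => p for (i, p) in enumerate(predictors))
# Dict("β[1]" => "a", "β[2]" => "b")
```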
Yes, a `coefnames` would be great.

The intercept is always `α`. It was removed because `brms` constructs the model as fast as possible for an MCMC NUTS sampler, and this involves not using the `1` or `0` column in the model matrix.
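To make that concrete, a toy sketch with invented numbers: with the intercept kept as a separate scalar, the model matrix holds only predictor columns, and the linear predictor is computed as `α .+ X * β` instead of multiplying through a constant column:

```julia
# Toy predictor matrix: two predictor columns, no column of 1s.
X = [0.5 1.2;
     0.3 0.8;
     0.9 0.1]
α = 0.1          # intercept as its own scalar parameter
β = [2.0, -1.0]  # one coefficient per predictor column

μ = α .+ X * β   # linear predictor without an intercept column
```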
We could do the same thing we do with the `idx` for random intercepts:

```julia
julia> using CSV, DataFrames, TuringGLM

julia> cheese = CSV.read(download("https://github.com/TuringLang/TuringGLM.jl/raw/main/data/cheese.csv"), DataFrame);

julia> f = @formula(y ~ (1 | cheese) + background);

julia> m = turing_model(f, cheese);
```

The `idx` are `Dict{String1, Int64}("B" => 2, "A" => 1, "C" => 3, "D" => 4)`.
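Following that idea, one could invert `idx` to rename the per-group parameters as well. A hypothetical sketch; the parameter name `zⱼ[i]` is a placeholder assumption, not verified against TuringGLM's actual chain output:

```julia
# Invert the group => index Dict into an index => group rename mapping.
idx = Dict("B" => 2, "A" => 1, "C" => 3, "D" => 4)
group_mapping = Dict("zⱼ[$i]" => "zⱼ[$g]" for (g, i) in idx)
# Dict("zⱼ[1]" => "zⱼ[A]", "zⱼ[2]" => "zⱼ[B]", ...)

# Then, after sampling, the same renaming step as above would apply:
# chns = sample(m, NUTS(), 2_000)
# chns = replacenames(chns, group_mapping)
```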
Currently, the posterior parameters have the form `β[1], ..., β[n]`, whereas the formula used something along the lines of `y ~ a + b + c`. These are tricky to link back together, because without looking into the code I don't know which parameter belongs to which predictor.