palday closed this 3 years ago
Okay, I've updated this to use a ColumnTable (i.e. a NamedTuple of Vectors) instead of DataFrames, but left the dependencies alone for now. However, from what I'm seeing in the output of the example, something seems off: all the output values are the same:
20×5 DataFrame
 Row │ x      y        err       lower       upper
     │ Int64  Float64  Float64   Float64     Float64
─────┼────────────────────────────────────────────────
   1 │     1  0.21417  0.242361  -0.0281918  0.456531
   2 │     2  0.21417  0.242361  -0.0281918  0.456531
   3 │     3  0.21417  0.242361  -0.0281918  0.456531
   4 │     4  0.21417  0.242361  -0.0281918  0.456531
   5 │     5  0.21417  0.242361  -0.0281918  0.456531
   6 │     6  0.21417  0.242361  -0.0281918  0.456531
   7 │     7  0.21417  0.242361  -0.0281918  0.456531
   8 │     8  0.21417  0.242361  -0.0281918  0.456531
   9 │     9  0.21417  0.242361  -0.0281918  0.456531
  10 │    10  0.21417  0.242361  -0.0281918  0.456531
  11 │    11  0.21417  0.242361  -0.0281918  0.456531
  12 │    12  0.21417  0.242361  -0.0281918  0.456531
  13 │    13  0.21417  0.242361  -0.0281918  0.456531
  14 │    14  0.21417  0.242361  -0.0281918  0.456531
  15 │    15  0.21417  0.242361  -0.0281918  0.456531
  16 │    16  0.21417  0.242361  -0.0281918  0.456531
  17 │    17  0.21417  0.242361  -0.0281918  0.456531
  18 │    18  0.21417  0.242361  -0.0281918  0.456531
  19 │    19  0.21417  0.242361  -0.0281918  0.456531
  20 │    20  0.21417  0.242361  -0.0281918  0.456531
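For readers unfamiliar with the term, a ColumnTable in the Tables.jl sense is just a NamedTuple whose fields are equal-length Vectors. The sketch below builds one by hand in plain Julia; the column names and values are copied from the output above purely for illustration, and this is not the code from the PR.

```julia
# A column table: a NamedTuple whose fields are equal-length Vectors.
# Values are illustrative, taken from the printed output above.
tbl = (x = collect(1:20), y = fill(0.21417, 20))

# Columns are accessed by name, like any NamedTuple field.
xcol = tbl.x
colnames = keys(tbl)    # a tuple of Symbols

# "Adding" a column builds a new NamedTuple; the existing vectors are reused, not copied.
tbl2 = merge(tbl, (err = fill(0.242361, 20),))
```

Because this is only a NamedTuple, no DataFrames dependency is needed to construct or inspect it.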
Ah, I see what's happening: for some reason, the Set created from the reference was not comparing equal to the Set created from allcols.
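The thread does not say why the two Sets differed. One common way this happens in Julia is an element-type mismatch, for example column names stored as Symbols on one side and Strings on the other. The sketch below illustrates that assumed cause only; `reference_cols` and `allcols_str` are hypothetical names, not variables from the PR.

```julia
# Hypothetical stand-ins for the two collections of column names.
reference_cols = Set([:x, :y])      # names stored as Symbols
allcols_str    = Set(["x", "y"])    # the "same" names stored as Strings

# Sets compare equal only if their elements do, and :x != "x".
same_raw = reference_cols == allcols_str

# Normalizing both sides to Symbols makes the comparison behave as intended.
same_normalized = reference_cols == Set(Symbol(c) for c in allcols_str)
```

Here `same_raw` is `false` and `same_normalized` is `true`. Printing `eltype` of each Set is a quick way to check whether this kind of mismatch is in play.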
I think I see now why you have to match on columns: there isn't a standardized way to get the original formula from the fitted model object. But maybe we can add something like that? I think matching the formula terms themselves might make things a bit easier conceptually, but then again it could make getting "typical" values for the columns for categorical predictors trickier...
"Typical" on categorical terms actually matches the behavior in the R-ecosystem / original effects proposal: it's the weighted average of the individual contrasts / weighted average of the effects from different levels.
I had a proposal in MixedModels for keeping track of the formula better ... but it's hard to do this in a meaningful way without keeping a copy of the original (tabular) data around. (And you can actually construct a MixedModel without ever using tabular/tidy data, but it's a pain.)
I think this also hits at one of my other good intentions: giving GLM.jl some love and making the user-facing types and show methods nicer and more consistent.
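To make the "typical" value for a categorical term described above concrete: with dummy (treatment) coding and the mean as the summary function, averaging each contrast column over the observed data yields the proportion of observations in each non-reference level. The sketch below uses made-up data and hand-rolled coding; it illustrates the idea and is not the implementation in this package.

```julia
using Statistics

# Made-up categorical predictor; "a" is treated as the reference level.
g = ["a", "b", "b", "c", "c", "c"]
lvls = ["a", "b", "c"]

# Dummy (treatment) coding: one 0/1 indicator column per non-reference level.
X = [Float64(gi == lev) for gi in g, lev in lvls[2:end]]

# The "typical" value of each contrast column is its mean over the data,
# i.e. the observed proportions of levels "b" and "c".
typical = vec(mean(X; dims=1))
```

With these data, `typical` is approximately `[0.333, 0.5]`, so a prediction at the "typical" categorical value is a weighted average of the per-level effects, with weights given by how often each level occurs.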
FYI: this needs an explicit dependency on StatsBase, since StatsModels doesn't re-export StatsBase stuff (https://github.com/JuliaStats/StatsModels.jl/issues/212)