beacon-biosignals / Effects.jl

Effects Prediction for Regression Models
MIT License

Initial Functionality #1

Closed by palday 3 years ago

kleinschmidt commented 3 years ago

FYI, this needs an explicit dependency on StatsBase, since StatsModels doesn't re-export the StatsBase names (https://github.com/JuliaStats/StatsModels.jl/issues/212)

kleinschmidt commented 3 years ago

Okay, I've updated this to use ColumnTable (i.e. a NamedTuple of Vectors) instead of DataFrames, but left the dependencies alone for now. However, from what I'm seeing in the output of the example, something seems off: all the output values are the same.

20×5 DataFrame
 Row │ x      y        err       lower       upper    
     │ Int64  Float64  Float64   Float64     Float64  
─────┼────────────────────────────────────────────────
   1 │     1  0.21417  0.242361  -0.0281918  0.456531
   2 │     2  0.21417  0.242361  -0.0281918  0.456531
   3 │     3  0.21417  0.242361  -0.0281918  0.456531
   4 │     4  0.21417  0.242361  -0.0281918  0.456531
   5 │     5  0.21417  0.242361  -0.0281918  0.456531
   6 │     6  0.21417  0.242361  -0.0281918  0.456531
   7 │     7  0.21417  0.242361  -0.0281918  0.456531
   8 │     8  0.21417  0.242361  -0.0281918  0.456531
   9 │     9  0.21417  0.242361  -0.0281918  0.456531
  10 │    10  0.21417  0.242361  -0.0281918  0.456531
  11 │    11  0.21417  0.242361  -0.0281918  0.456531
  12 │    12  0.21417  0.242361  -0.0281918  0.456531
  13 │    13  0.21417  0.242361  -0.0281918  0.456531
  14 │    14  0.21417  0.242361  -0.0281918  0.456531
  15 │    15  0.21417  0.242361  -0.0281918  0.456531
  16 │    16  0.21417  0.242361  -0.0281918  0.456531
  17 │    17  0.21417  0.242361  -0.0281918  0.456531
  18 │    18  0.21417  0.242361  -0.0281918  0.456531
  19 │    19  0.21417  0.242361  -0.0281918  0.456531
  20 │    20  0.21417  0.242361  -0.0281918  0.456531

kleinschmidt commented 3 years ago

Ah, I see what's happening: for some reason the Set created from the reference was not comparing equal to the Set created from allcols, so the column match was failing.
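For illustration only, here is one way two Sets that look the same can fail to compare equal in Julia. The thread does not say what the actual cause was; the Symbol-versus-String mismatch below is an assumption, and the names reference_cols, all_cols and normalized_cols are hypothetical.

```julia
# Hypothetical column names; the real cause of the mismatch is not stated in the thread.
reference_cols = Set([:x])   # column name stored as a Symbol
all_cols = Set(["x"])        # the same name stored as a String

# Set equality compares elements with isequal, and :x is not isequal to "x".
naive_match = reference_cols == all_cols                 # false

# Normalizing both sides to Symbols makes the comparison behave as intended.
normalized_cols = Set(Symbol(c) for c in all_cols)
normalized_match = reference_cols == normalized_cols     # true

println(naive_match, " ", normalized_match)
```

A silent false from a comparison like this sends every row down the same branch, which would be consistent with identical predictions in every row.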

kleinschmidt commented 3 years ago

I think I see now why you have to match on columns: there isn't a standardized way to get the original formula from the fitted model object. But maybe we can add something like that? I think matching the formula terms themselves might make things a bit easier conceptually, but then again it could make getting "typical" values for the columns of categorical predictors trickier...

palday commented 3 years ago

"Typical" on categorical terms actually matches the behavior in the R-ecosystem / original effects proposal: it's the weighted average of the individual contrasts / weighted average of the effects from different levels.

I had a proposal in MixedModels for keeping track of the formula better ... but it's hard to do this in a meaningful way without keeping a copy of the original (tabular) data around. (And you can actually construct a MixedModel without ever using tabular/tidy data, but it's a pain.)

I think this also touches on one of my other good intentions: giving GLM.jl some love and making the user-facing types and show methods nicer and more consistent.
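As a rough sketch of the "typical" value for a categorical predictor described above: with dummy (treatment) coding, averaging each contrast column over the data gives the proportion of observations at that level, and plugging those averages into the model row yields a weighted average of the level effects. The data, level names and coefficients below are made up for illustration and are not taken from Effects.jl.

```julia
using Statistics

# Hypothetical categorical predictor; "a" is treated as the reference level.
group = ["a", "b", "b", "c", "b", "a", "c", "b"]
lvls = ["a", "b", "c"]

# Dummy (treatment) coding: one indicator column per non-reference level.
dummy = [Float64(g == l) for g in group, l in lvls[2:end]]

# "Typical" value of each contrast column: its mean over the observed data,
# which here is the proportion of observations at that level.
typical = vec(mean(dummy; dims=1))

# Hypothetical coefficients: intercept, then the effects of "b" and "c".
β = [1.0, 0.5, -0.25]

# Prediction at the typical values: intercept plus the weighted average of level effects.
typical_row = vcat(1.0, typical)
prediction = typical_row' * β

println(typical, " ", prediction)
```

Because the reference level contributes an effect of zero, the prediction is the intercept plus each level's effect weighted by how often that level occurs.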