Open DominiqueMakowski opened 1 year ago
marginaleffects has its own datagrid function, so passing a grid is definitely possible.
The purpose of marginal means over `get_predicted()` is to marginalize over other predictors (e.g., the average mean in the control group across ethnic groups). marginaleffects has several options for how to handle continuous variables when averaging.
To begin, let me introduce a conceptual difference between two functions: `predictions()` and `marginalmeans()`.
`predictions()` accepts a grid (`newdata` argument) in the form of a data frame, and it makes predictions for each row of that data frame. When calling `predictions()`, we can also use the `by` argument to average-out or "marginalize" across some variables from the model.
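To make the "predict on each grid row, then average" idea concrete, here is a minimal sketch in base R only. It does not use marginaleffects and is not a description of its internals; the grid values (`Sepal.Width` of 2 and 4) and the object name `grid` are just illustrative assumptions. The point estimates should correspond to what a `predictions()` call with `by = "Sepal.Width"` on the same grid would report.

```r
# Same model as in the examples in this thread
model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)

# Grid: every combination of two Sepal.Width values and the three species
# (illustrative values, chosen only for this sketch)
grid <- expand.grid(
  Sepal.Width = c(2, 4),
  Species = levels(iris$Species)
)

# Step 1: one prediction per grid row
grid$predicted <- predict(model, newdata = grid)

# Step 2: marginalize over Species by averaging within each Sepal.Width value
averaged <- aggregate(predicted ~ Sepal.Width, data = grid, FUN = mean)
averaged
```

Each species gets equal weight in this average, because the grid contains each species exactly once per `Sepal.Width` value.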
`marginalmeans()` is a handy shortcut function which does not accept a grid, but does three things under the hood:
1. Builds a grid of predictor values
2. Calls `predictions()` to make predictions on that grid
3. Averages those predictions across the other categorical variables
In almost all of the examples you show, there is no averaging across categorical variables going on in `estimate_means()`, so we don't actually have to call `marginalmeans()`. Calling `predictions()` works just fine. At the bottom of this post I give code to replicate the results of all your examples.
Before that, I'll give a few examples of `marginalmeans()` to illustrate the equivalences.
library(marginaleffects)
library(modelbased)

model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)
mod <- lm(mpg ~ factor(am) + factor(vs) + factor(gear), data = mtcars)

# 1 call to `marginalmeans()` is equivalent to 3 calls to `estimate_means()`
marginalmeans(mod)
estimate_means(mod, at = "am")
estimate_means(mod, at = "vs")
estimate_means(mod, at = "gear")

# Marginalize across values of `gear`. Note the `interaction` argument.
marginalmeans(mod, variables = c("am", "vs"), interaction = TRUE)
estimate_means(mod, at = c("am", "vs"))
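As a rough illustration of what a marginal mean is, the sketch below reproduces the idea by hand in base R: build a balanced grid of the three categorical predictors, predict on it, and average with equal weights. This assumes equal weighting over the balanced grid, which is my understanding of the default behavior but is not taken from the package source; the names `grid` and `mm_am` are hypothetical.

```r
mod <- lm(mpg ~ factor(am) + factor(vs) + factor(gear), data = mtcars)

# Balanced grid: all combinations of the observed levels of am, vs and gear
grid <- expand.grid(
  am = sort(unique(mtcars$am)),
  vs = sort(unique(mtcars$vs)),
  gear = sort(unique(mtcars$gear))
)

# Predict for every cell of the grid
grid$predicted <- predict(mod, newdata = grid)

# Marginal means of `am`: average over all vs/gear cells, equal weights
mm_am <- aggregate(predicted ~ am, data = grid, FUN = mean)
mm_am
```

If the equal-weighting assumption holds, the two values in `mm_am` should match the point estimates that `marginalmeans(mod)` and `estimate_means(mod, at = "am")` report for `am`.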
Now, replications of your examples:
library(modelbased)
library(marginaleffects)

model <- lm(Petal.Length ~ Sepal.Width * Species, data = iris)

estimate_means(
  model,
  fixed = "Sepal.Width")
predictions(
  model,
  newdata = datagrid(Species = unique))

estimate_means(
  model,
  at = c("Species", "Sepal.Width"), length = 2)
predictions(
  model,
  newdata = datagrid(Species = unique, Sepal.Width = c(2, 4.4)))

estimate_means(
  model,
  at = "Species=c('versicolor', 'setosa')")
predictions(
  model,
  newdata = datagrid(Species = c("versicolor", "setosa")))

estimate_means(
  model,
  at = "Sepal.Width",
  length = 5)
predictions(
  model,
  newdata = datagrid(Sepal.Width = fivenum, Species = unique),
  by = "Sepal.Width")

estimate_means(
  model,
  at = "Sepal.Width=c(2, 4)")
predictions(
  model,
  newdata = datagrid(Sepal.Width = c(2, 4), Species = unique),
  by = "Sepal.Width")
Of course, you can use the easystats version of the `datagrid()` builder and feed that to the `newdata` argument in `predictions()`. This would bring the two approaches closer in syntax.
I have some experience now in wrapping marginaleffects functions to get the desired results, so maybe we can try to continue on this issue. The major distinctions/use cases I see (but feel free to correct/add):
- `predict()`, i.e. holding non-focal terms constant at mean/reference level/mode
- `emmeans()`, i.e. mean and "weighting" factors for non-focal terms
Furthermore, for mixed models:
Once we have these different `estimate_*()` functions/options, we can easily get the contrasts or pairwise comparisons by providing the `hypothesis` argument to those function calls.
WDYT?
Okay, I made a few attempts, but the issue is that I can't seem to pass a datagrid to `marginaleffects::marginalmeans()`, which blocks me from doing things like:
Created on 2022-09-26 by the reprex package (v2.0.1)
Basically controlling at which levels of Sepal.Width I want the means. Is that the right way of doing it or should one directly use predictions on the datagrid?
@bwiernik is there a reason to use `marginalmeans()` over simply `get_predicted()` on the datagrid one wants the means of? (@vincentarelbundock but I don't wanna tag you each time I need help with some probably basic thing ^^)