WillemSleegers / tidystats-v0.3

R package to produce a tidy output file of statistical models.

What to name terms in a repeated measures ANOVA? #36

Closed WillemSleegers closed 6 years ago

WillemSleegers commented 7 years ago

Currently tidy_stats.aovlist() returns terms with the subject term prepended to the variable names.

For example:

# A tibble: 11 x 4
                 term statistic       value                          method
                <chr>     <chr>       <dbl>                           <chr>
 1       ID-Residuals        df  9.00000000 One-way repeated measures ANOVA
 2       ID-Residuals        SS 58.07800000 One-way repeated measures ANOVA
 3       ID-Residuals        MS  6.45311111 One-way repeated measures ANOVA
 4     ID:group-group        df  1.00000000 One-way repeated measures ANOVA
 5     ID:group-group        SS 12.48200000 One-way repeated measures ANOVA
 6     ID:group-group        MS 12.48200000 One-way repeated measures ANOVA
 7     ID:group-group         F 16.50088132 One-way repeated measures ANOVA
 8     ID:group-group         p  0.00283289 One-way repeated measures ANOVA
 9 ID:group-Residuals        df  9.00000000 One-way repeated measures ANOVA
10 ID:group-Residuals        SS  6.80800000 One-way repeated measures ANOVA
11 ID:group-Residuals        MS  0.75644444 One-way repeated measures ANOVA

Not the prettiest naming scheme. What would be better?

WillemSleegers commented 7 years ago

Same question for linear mixed models. An example using the sleep data:

                 term statistic value             method  type
                <chr>     <chr> <dbl>              <chr> <chr>
 1 ID-(Intercept)-(R)  variance  2.85 Linear mixed model other
 2 ID-(Intercept)-(R)        SD  1.69 Linear mixed model other
 3       Residual-(R)  variance  0.76 Linear mixed model other
 4       Residual-(R)        SD  0.87 Linear mixed model other
 5    (Intercept)-(F)  estimate  0.75 Linear mixed model other
 6    (Intercept)-(F)        SE  0.60 Linear mixed model other
 7    (Intercept)-(F)         t  1.25 Linear mixed model other
 8         group2-(F)  estimate  1.58 Linear mixed model other
 9         group2-(F)        SE  0.39 Linear mixed model other
10         group2-(F)         t  4.06 Linear mixed model other

WillemSleegers commented 7 years ago

Perhaps the following would work. We don't report the between-subjects effects when there is no between-subjects factor, and we simply use the names of the variables (in this case, group). As for the residuals, we attach them to each variable with a '-' separator. The example in the first post would then look like this:

# A tibble: 8 x 4
    term             statistic   value       method
    <chr>            <chr>       <dbl>       <chr>
 1  group            df          1.00000000  One-way repeated measures ANOVA
 2  group            SS          12.48200000 One-way repeated measures ANOVA
 3  group            MS          12.48200000 One-way repeated measures ANOVA
 4  group            F           16.50088132 One-way repeated measures ANOVA
 5  group            p           0.00283289  One-way repeated measures ANOVA
 6  group-Residual   df          9.00000000  One-way repeated measures ANOVA
 7  group-Residual   SS          6.80800000  One-way repeated measures ANOVA
 8  group-Residual   MS          0.75644444  One-way repeated measures ANOVA
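
The renaming proposed above can be sketched in base R. This is only an illustration, not code from tidystats: `rename_aov_terms()` is a hypothetical helper, and it assumes that every term has the form `<stratum>-<term>`, that neither part contains an extra '-', and that the subject-only stratum is the one without a ':' in its name.

```r
# Hypothetical helper (not part of tidystats): turn "<stratum>-<term>" labels
# into the proposed names.
rename_aov_terms <- function(terms) {
  # Split each label into its error stratum and its term
  parts   <- strsplit(terms, "-", fixed = TRUE)
  stratum <- vapply(parts, `[`, character(1), 1, USE.NAMES = FALSE)
  term    <- vapply(parts, `[`, character(1), 2, USE.NAMES = FALSE)

  # Assumption: a stratum without ":" is the subject-only (between-subjects)
  # stratum, which the proposal leaves out
  keep    <- grepl(":", stratum, fixed = TRUE)
  stratum <- stratum[keep]
  term    <- term[keep]

  # The variable is whatever follows the subject identifier in the stratum
  variable <- sub("^[^:]+:", "", stratum)

  # Residuals get the variable name prepended; other terms keep their name
  ifelse(term == "Residuals", paste0(variable, "-Residual"), term)
}

old_terms <- c("ID-Residuals", "ID:group-group", "ID:group-Residuals")
new_terms <- rename_aov_terms(old_terms)
print(new_terms)
```

A design with a real between-subjects factor would need the subject stratum to be kept, so the `keep` rule here only covers the one-way case shown in this thread.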

What's missing is whether the variable is a within-subjects variable or not. I'm not sure yet whether we should include that information.

WillemSleegers commented 7 years ago

And the mixed-model output might not be that bad, although I did move the '(F)' and '(R)' markers to the start of the term. For example:

                       term statistic       value             method       type
                      <chr>     <chr>       <dbl>              <chr>      <chr>
 1 (R)-scenario-(Intercept)  variance  216.771753 Linear mixed model hypothesis
 2 (R)-scenario-(Intercept)        SD   14.723171 Linear mixed model hypothesis
 3  (R)-subject-(Intercept)  variance 3367.734394 Linear mixed model hypothesis
 4  (R)-subject-(Intercept)        SD   58.032184 Linear mixed model hypothesis
 5             (R)-Residual  variance  637.046569 Linear mixed model hypothesis
 6             (R)-Residual        SD   25.239781 Linear mixed model hypothesis
 7          (F)-(Intercept)  estimate  202.588095 Linear mixed model hypothesis
 8          (F)-(Intercept)        SE   24.645978 Linear mixed model hypothesis
 9          (F)-(Intercept)         t    8.219925 Linear mixed model hypothesis
10          (F)-attitudepol  estimate  -19.692160 Linear mixed model hypothesis
11          (F)-attitudepol        SE    5.545759 Linear mixed model hypothesis
12          (F)-attitudepol         t   -3.550851 Linear mixed model hypothesis

WillemSleegers commented 7 years ago

But I'm not sure we need to put whether a term is random or fixed in the term name. We could instead add a 'notes' column that contains that information. The output of some of the statistical tests already has a notes column.

WillemSleegers commented 6 years ago

Or, I suppose, we could add a column called 'group' to indicate whether the term belongs to the fixed or the random category.

Since I also added support for descriptives, whose output has a group column, this may not be a terrible idea.
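
As a rough sketch of that idea, the '(R)-' and '(F)-' prefixes from the earlier mixed-model output could be moved into a separate group column. `split_effect_prefix()` is a hypothetical helper written for this illustration, and the labels "random" and "fixed" are assumed values; the thread does not say what tidystats actually stores in the column.

```r
# Hypothetical helper (not part of tidystats): move the "(R)-" / "(F)-" prefix
# of each term into a separate 'group' column.
split_effect_prefix <- function(terms) {
  # Assumption: every term starts with "(R)-" or "(F)-"
  stopifnot(all(grepl("^\\((R|F)\\)-", terms)))

  data.frame(
    group = ifelse(startsWith(terms, "(R)-"), "random", "fixed"),
    term  = sub("^\\((R|F)\\)-", "", terms),
    stringsAsFactors = FALSE
  )
}

terms <- c("(R)-scenario-(Intercept)", "(R)-Residual",
           "(F)-(Intercept)", "(F)-attitudepol")
result <- split_effect_prefix(terms)
print(result)
```

With the category in its own column, the term names stay short and rows can be filtered by group without parsing the term string.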

WillemSleegers commented 6 years ago

Since we added a group column, this is now solved.