briandk / granovaGG

Bob Pruzek and Jim Helmreich's implementation of Elemental Graphics for Analysis of Variance

granovagg.ds should provide printed output #127

Closed briandk closed 13 years ago

briandk commented 13 years ago

I've now added printed output for granovagg.ds that closely matches that of granova.ds. Compare:

# Classic granova
> granova.ds(anorexia.sub)
            Summary Stats
n                  17.000
mean(x)            83.229
mean(y)            90.494
mean(D=x-y)        -7.265
SD(D)               7.157
ES(D)              -1.015
r(x,y)              0.538
r(x+y,d)           -0.546
LL 95%CI          -10.945
UL 95%CI           -3.585
t(D-bar)           -4.185
df.t               16.000
pval.t              0.001
# granovaGG version
> granovagg.ds(anorexia.sub, conf.level = 0.99) -> p
                              Summary Statistics
n                                         17.000
Prewt mean                                83.229
Postwt mean                               90.494
mean(D = Prewt - Postwt)                  -7.265
SD(D)                                      7.157
Effect Size                               -1.015
r(Prewt, Postwt)                           0.538
r(Prewt + Postwt, D)                      -0.546
Lower 99% CI Treatment Effect            -12.335
Upper 99% CI Treatment Effect             -2.194
t (D-bar)                                 -4.185
df.t                                      16.000
p-value (t-test)                           0.001
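
The interval and test rows in both printouts can be checked by hand from three of the printed values (n, mean(D), and SD(D)). The sketch below assumes the usual paired-sample t formulas; `ci_for_mean_difference` is a hypothetical helper written for this check, not a function from granova or granovaGG.

```r
# Values copied from the printed summaries above (rounded to 3 decimals)
n      <- 17
mean_d <- -7.265
sd_d   <- 7.157

se_d        <- sd_d / sqrt(n)   # standard error of the mean difference
t_stat      <- mean_d / se_d    # compare with the t(D-bar) row
effect_size <- mean_d / sd_d    # compare with the ES(D) / Effect Size row

# Hypothetical helper: two-sided, t-based confidence interval for mean(D)
ci_for_mean_difference <- function(mean_d, se_d, df, conf.level = 0.95) {
  crit <- qt(1 - (1 - conf.level) / 2, df)
  c(lower = mean_d - crit * se_d, upper = mean_d + crit * se_d)
}

ci95 <- ci_for_mean_difference(mean_d, se_d, df = n - 1, conf.level = 0.95)
ci99 <- ci_for_mean_difference(mean_d, se_d, df = n - 1, conf.level = 0.99)

# Two-sided p-value for the t statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

print(round(c(t = t_stat, ES = effect_size, ci95, p = p_value), 3))
print(round(ci99, 3))
```

Because the inputs are already rounded, the last digit can differ slightly from the printed output. Only the two interval limits depend on `conf.level`, which is why the 95% and 99% printouts agree on every other row.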

rmpruzek commented 13 years ago

I'm assuming that the 'Prewt' and 'Postwt' terms are the names of the input columns; if so, the labels for most statistics are fine, on the presumption that if the input n x 2 matrix (or data.frame?) does not have named columns, the defaults X and Y would be used (for the horizontal and vertical axes, respectively). The labels for the confidence intervals should, however, be generic: no reference to 'treatment effect'; rather, just lower and upper limits for the 'cc %' C.I. (In paradigm 1a, for example, there is no 'treatment', hence no treatment effect.) Finally, your p-value (t-test) label would be improved if (t-statistic) were used. Thank you! bp

briandk commented 13 years ago

I've now added some new functionality in response to @rmpruzek's comments. Bob, you're right: in the data above, Prewt and Postwt are pulled from the column names of the input data. The idea was to have the printed output dynamically reflect the column names, which it does.

I also just pushed a change (482de62) that ensures that, even if users pass in data without column names, default column names (x for the data plotted on the x-axis, and likewise y) will be supplied:

## Standard output when columns are named
> data(anorexia.sub)
> granovagg.ds(anorexia.sub)
                              Summary Statistics
n                                         17.000
Prewt mean                                83.229
Postwt mean                               90.494
mean(D = Prewt - Postwt)                  -7.265
SD(D)                                      7.157
Effect Size                               -1.015
r(Prewt, Postwt)                           0.538
r(Prewt + Postwt, D)                      -0.546
Lower 95% CI Treatment Effect            -10.945
Upper 95% CI Treatment Effect             -3.585
t (D-bar)                                 -4.185
df.t                                      16.000
p-value (t-test)                           0.001

## Nulling out the column names
> colnames(anorexia.sub) <- NULL
> granovagg.ds(anorexia.sub)
                              Summary Statistics
n                                         17.000
x mean                                    83.229
y mean                                    90.494
mean(D = x - y)                           -7.265
SD(D)                                      7.157
Effect Size                               -1.015
r(x, y)                                    0.538
r(x + y, D)                               -0.546
Lower 95% CI Treatment Effect            -10.945
Upper 95% CI Treatment Effect             -3.585
t (D-bar)                                 -4.185
df.t                                      16.000
p-value (t-test)                           0.001

## Reversing X and Y when columns aren't named
> granovagg.ds(anorexia.sub, revc = TRUE)
                              Summary Statistics
n                                         17.000
x mean                                    90.494
y mean                                    83.229
mean(D = x - y)                            7.265
SD(D)                                      7.157
Effect Size                                1.015
r(x, y)                                    0.538
r(x + y, D)                                0.546
Lower 95% CI Treatment Effect              3.585
Upper 95% CI Treatment Effect             10.945
t (D-bar)                                  4.185
df.t                                      16.000
p-value (t-test)                           0.001
Warning messages:
1: In is.na(cols) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(cols) : is.na() applied to non-(list or vector) of type 'NULL'

Here's a side-by-side comparison of the visual output:

[Image: side-by-side comparison of named-column output and unnamed-column output]
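
The default-name behavior described above can be sketched as follows. This is a minimal illustration, not the code from commit 482de62: `with_default_colnames` is a hypothetical helper, and it assumes two-column input. Checking `is.null()` before `is.na()` is one way to avoid calling `is.na()` on `NULL`, which is the kind of call the warnings in the last transcript point at.

```r
# Hypothetical helper (not granovaGG's actual implementation):
# supply "x" and "y" when a two-column input has no usable column names.
with_default_colnames <- function(data, defaults = c("x", "y")) {
  stopifnot(ncol(data) == 2)
  cols <- colnames(data)
  # is.null() is tested first, so is.na() never sees a NULL
  missing_names <- is.null(cols) || any(is.na(cols)) || any(cols == "")
  if (missing_names) {
    colnames(data) <- defaults
  }
  data
}

# Illustrative input (made-up numbers, not anorexia.sub)
unnamed <- matrix(c(80, 85, 78, 86, 91, 80), ncol = 2)
named   <- data.frame(Prewt = c(80, 85, 78), Postwt = c(86, 91, 80))

colnames(with_default_colnames(unnamed))
colnames(with_default_colnames(named))
```

In this sketch both names are replaced as soon as either one is missing, so a half-named input does not end up with a mix of user-supplied and default labels. Whether granovaGG makes the same choice is not stated in this thread.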

briandk commented 13 years ago

I also just fixed the printed labels for the p-value and the confidence interval.

> granovagg.ds(anorexia.sub)
                              Summary Statistics
n                                         17.000
Prewt mean                                83.229
Postwt mean                               90.494
mean(D = Prewt - Postwt)                  -7.265
SD(D)                                      7.157
Effect Size                               -1.015
r(Prewt, Postwt)                           0.538
r(Prewt + Postwt, D)                      -0.546
Lower 95% Confidence Interval            -10.945
Upper 95% Confidence Interval             -3.585
t (D-bar)                                 -4.185
df.t                                      16.000
p-value (t-statistic)                      0.001
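
For reference, generic interval labels that follow `conf.level` could be built along these lines. This is a sketch only: `ci_labels` is a hypothetical name, and the label wording is copied from the printout above rather than from the package source.

```r
# Hypothetical helper: build generic CI row labels from the confidence level
ci_labels <- function(conf.level = 0.95) {
  percent <- paste0(format(100 * conf.level), "%")
  c(
    lower = paste("Lower", percent, "Confidence Interval"),
    upper = paste("Upper", percent, "Confidence Interval")
  )
}

ci_labels()                   # labels for the default 95% level
ci_labels(conf.level = 0.99)  # labels for a 99% level
```

Keeping the percentage out of a hard-coded string means the labels stay correct when a caller passes a different `conf.level`, as in the 99% example earlier in the thread.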

rmpruzek commented 13 years ago

Brian, This is very good... again, THANKS, bob


From: Brian A. Danielak reply@reply.github.com To: rmpruzek rmpruzek@yahoo.com Sent: Tuesday, October 4, 2011 12:13 AM Subject: Re: [granovaGG] granovagg.ds should provide printed output (#127)


briandk commented 13 years ago

@WilDoane - if you can give me a quick +1 indicating you're also fine with the changes, I'll go ahead and merge this into dev.

WilDoane commented 13 years ago

I note that the X and Y values are reversed in the two cases: note their respective means, and therefore the sign on the other stats. I know you're playing with column reversals; I'm just not clear on the context. Regardless, the magnitudes appear to be correct.

WilDoane commented 13 years ago

+1 merge away.

briandk commented 13 years ago

@WilDoane - I think the intent is that "x" and "y" will only appear if no explicit column names are present when data are passed in. In such cases, the only sensible way to identify the data is to refer to what is displayed on the x and y axes. Consequently, reversing the axes SHOULD swap the two means and flip the sign of the difference-based statistics, since in a reversal you're actually swapping which data appear on which axis. You can verify that the behavior is consistent with classic granova, if you'd like. It should be.
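
To make the sign behavior concrete, here is a small sketch that computes the difference-based statistics for a two-column data set and again with the columns swapped. The data are made up for illustration (they are not anorexia.sub), and `paired_summary` is a hypothetical helper that takes D as the first column minus the second, matching the `mean(D = x - y)` label in the printouts.

```r
# Illustrative paired data (made up for this sketch)
pairs <- data.frame(
  x = c(80, 85, 78, 90, 88),
  y = c(86, 91, 80, 97, 89)
)

# Hypothetical helper: D is column 1 minus column 2
paired_summary <- function(data) {
  d <- data[[1]] - data[[2]]
  test <- t.test(d)  # one-sample t-test on the differences
  c(
    mean_d      = mean(d),
    sd_d        = sd(d),
    effect_size = mean(d) / sd(d),
    r_xy        = cor(data[[1]], data[[2]]),
    r_sum_d     = cor(data[[1]] + data[[2]], d),
    t           = unname(test$statistic),
    p           = test$p.value
  )
}

original <- paired_summary(pairs)
reversed <- paired_summary(pairs[, 2:1])  # swap which column plays "x"

round(rbind(original, reversed), 3)
```

Swapping the columns negates mean(D), the effect size, r(x + y, D), and t, while SD(D), r(x, y), and the two-sided p-value are unchanged. That is the same pattern as in the `revc = TRUE` printout above.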