briandk / granovaGG

Bob Pruzek and Jim Helmreich's implementation of Elemental Graphics for Analysis of Variance

granovagg.1w should let users suppress printing of visual squares #128

Closed briandk closed 13 years ago

briandk commented 13 years ago

I've resurrected the dosqrs argument from granova.1w. Users can suppress squares by adding dosqrs = FALSE to a granovagg.1w call. Note that dosqrs still defaults to TRUE.

As an example:

# Squares will appear
> data(poison)
> granovagg.1w(poison$SurvTime, group = poison$Group, ylab = "Survival Time", dosqrs = TRUE)

# Squares will still appear, since dosqrs defaults to TRUE
> granovagg.1w(poison$SurvTime, group = poison$Group, ylab = "Survival Time")

# Squares will be suppressed
> granovagg.1w(poison$SurvTime, group = poison$Group, ylab = "Survival Time", dosqrs = FALSE)

The graphics below illustrate what it now looks like if a user suppresses the squares:

Visual comparison of suppressed and unsuppressed squares

Visual comparison of legends when squares are suppressed/unsuppressed

Visual comparison of legends

rmpruzek commented 13 years ago

If there are only two groups, then most users (and I) would prefer that the printed statistic be t, not F. Of course, this means that in the two-group case the positive square root of F is used as t, so the correct label is probably | t-statistic |, to show that the sign is not considered. Otherwise, I like your changes. bp
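The relationship described above can be checked with a minimal sketch. It uses the built-in `sleep` data purely as a stand-in two-group data set (an assumption for illustration; it is not one of the granova data sets), and it assumes a pooled-variance t-test, since that is the version whose square equals the one-way F.

```r
# Stand-in two-group data from base R (10 observations per group)
data(sleep)

# One-way ANOVA via lm(); with two groups the overall F has 1 numerator df
fit <- lm(extra ~ group, data = sleep)
f_stat <- unname(summary(fit)$fstatistic["value"])

# Independent-samples t-test with pooled variance (var.equal = TRUE)
x1 <- sleep$extra[sleep$group == "1"]
x2 <- sleep$extra[sleep$group == "2"]
t_stat <- unname(t.test(x1, x2, var.equal = TRUE)$statistic)

# The sign of t depends on group order, so compare |t| with sqrt(F)
abs_t <- abs(t_stat)
all.equal(abs_t, sqrt(f_stat))
```

Because the sign of t only reflects which group is listed first, labelling the displayed value as | t-statistic | avoids implying a direction.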

briandk commented 13 years ago

@rmpruzek, I'm not getting agreement that the positive square root of the F-statistic is the t-statistic. Obviously it should be, but my code isn't showing that:

> library(granova)
> data(anorexia.sub)
> astack <- stack(anorexia.sub)
> lm1 <- lm(values ~ ind, data = astack)
> summary(lm1)

Call:
lm(formula = values ~ ind, data = astack)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.294  -2.454   1.106   4.004  11.106 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   90.494      1.689  53.578  < 2e-16 ***
indPrewt      -7.265      2.389  -3.041  0.00467 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 6.964 on 32 degrees of freedom
Multiple R-squared: 0.2242, Adjusted R-squared:   0.2 
F-statistic:  9.25 on 1 and 32 DF,  p-value: 0.004673 

> t.test(anorexia.sub[, 1], anorexia.sub[, 2], paired = TRUE)

    Paired t-test

data:  anorexia.sub[, 1] and anorexia.sub[, 2] 
t = -4.1849, df = 16, p-value = 0.0007003
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -10.94471  -3.58470 
sample estimates:
mean of the differences 
              -7.264706 

/cc @Wildoane

rmpruzek commented 13 years ago

Brian, in the t.test call, paired = FALSE must be set. Try it; you should get agreement then. bob

rmpruzek commented 13 years ago

Brian, because your heading cited .1w, I answered w/ that in mind. But it is not clear to me why you use lm() AND t.test w/ paired = TRUE; there's the rub. bob

briandk commented 13 years ago

Bob,

The error was mine. I got confused because we use a paired t-test in granovagg.ds (line #138) and in granova.ds (line #110). I forgot that the reason we use paired for .ds is that it's fundamentally a dependent sample analysis.
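To illustrate why a paired test cannot be expected to match the one-way F, here is a small sketch on simulated data (the numbers are made up and are not the anorexia.sub values). A paired t-test is a one-sample test on the within-pair differences with n - 1 degrees of freedom, whereas the independent-samples test that corresponds to the one-way layout has 2n - 2.

```r
set.seed(1)

# Simulated before/after scores for 17 subjects (illustrative values only)
pre  <- rnorm(17, mean = 83, sd = 5)
post <- pre + rnorm(17, mean = 7, sd = 7)

paired_t <- t.test(pre, post, paired = TRUE)
one_samp <- t.test(pre - post)                 # one-sample test on the differences
pooled_t <- t.test(pre, post, var.equal = TRUE)

# The paired test and the one-sample test on differences are the same test
same_as_differences <- isTRUE(all.equal(unname(paired_t$statistic),
                                        unname(one_samp$statistic)))

# Degrees of freedom differ between the dependent and independent analyses
df_paired <- unname(paired_t$parameter)   # n - 1
df_pooled <- unname(pooled_t$parameter)   # 2n - 2
```

With 17 pairs, the two analyses have 16 and 32 degrees of freedom, which matches the df values in the paired t-test and lm() outputs quoted earlier in the thread.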

The reason I used t.test() and lm() above was to verify that F == t^2. You're absolutely right: my error was in running the .1w t-test as a paired t-test. After re-running the code on a non-paired t-test, I've verified the expected behavior:

> data(anorexia.sub)
> astack <- stack(anorexia.sub)
> summary(lm(values ~ ind, data = astack)) # Note F = 9.25

Call:
lm(formula = values ~ ind, data = astack)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.294  -2.454   1.106   4.004  11.106 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   90.494      1.689  53.578  < 2e-16 ***
indPrewt      -7.265      2.389  -3.041  0.00467 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 6.964 on 32 degrees of freedom
Multiple R-squared: 0.2242, Adjusted R-squared:   0.2 
F-statistic:  9.25 on 1 and 32 DF,  p-value: 0.004673

> t.test(anorexia.sub[, 1], anorexia.sub[, 2])

    Welch Two Sample t-test

data:  anorexia.sub[, 1] and anorexia.sub[, 2] 
t = -3.0414, df = 25.986, p-value = 0.005324
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -12.17472  -2.35469 
sample estimates:
mean of x mean of y 
 83.22941  90.49412 

> (-3.0414)^2 # t-squared approximately equals F = 9.25
[1] 9.250114

Does that look right to you now?

rmpruzek commented 13 years ago

Brian, yes, it is spot-on correct. Good to see you wrapping this up. Thanks again, bob

briandk commented 13 years ago

@rmpruzek - the latest changes I just pushed should show a properly computed t-statistic in the two-group case. Below is example code from a two-group case (the anorexia.sub data) and the visual output confirming that we display the proper t-statistic:

> data(anorexia.sub)
> astack <- stack(anorexia.sub) # stacking the data so we can get a grouping column
> granovagg.1w(astack$values, group = astack$ind)

By-group summary statistics for your input data (ordered by group means)
   group group.mean trimmed.mean contrast variance standard.deviation group.size
2  Prewt      83.23        83.24    -3.63    25.17               5.02         17
1 Postwt      90.49        91.80     3.63    71.83               8.48         17

Below is a linear model summary of your input data

Call:
lm(formula = score ~ group, data = owp$data)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.294  -2.454   1.106   4.004  11.106 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   90.494      1.689  53.578  < 2e-16 ***
groupPrewt    -7.265      2.389  -3.041  0.00467 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 6.964 on 32 degrees of freedom
Multiple R-squared: 0.2242, Adjusted R-squared:   0.2 
F-statistic:  9.25 on 1 and 32 DF,  p-value: 0.004673 

> with(anorexia.sub, t.test(Prewt, Postwt))

    Welch Two Sample t-test

data:  Prewt and Postwt 
t = -3.0414, df = 25.986, p-value = 0.005324
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -12.17472  -2.35469 
sample estimates:
mean of x mean of y 
 83.22941  90.49412 

And here's a comparison of what it looks like when dosqrs goes from TRUE to FALSE

comparison of visual output for two-group granovagg.1w case

briandk commented 13 years ago

I have two more questions on this issue:

  1. Can we change the name of the dosqrs argument to print.squares, which is more informative and easier to type?
  2. For the two-group case, should we output a t-test summary instead of the current model summary?

cc: @rmpruzek, @Wildoane
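For the first question, one common way to rename an argument without breaking existing calls is to keep the old name temporarily and warn when it is used. The sketch below is hypothetical: `plot_one_way` is a stand-in rather than the real granovagg.1w, and the thread does not say whether the old dosqrs name was kept for backward compatibility.

```r
# Hypothetical stand-in for granovagg.1w(); only the argument handling is shown
plot_one_way <- function(data, print.squares = TRUE, dosqrs = NULL) {
  # Honor the old argument name if a caller still supplies it, but warn
  if (!is.null(dosqrs)) {
    warning("'dosqrs' is deprecated; use 'print.squares' instead.",
            call. = FALSE)
    print.squares <- dosqrs
  }
  stopifnot(is.logical(print.squares), length(print.squares) == 1L)

  # A real function would build the plot here; this sketch just returns the flag
  print.squares
}

new_style <- plot_one_way(1:10, print.squares = FALSE)
old_style <- suppressWarnings(plot_one_way(1:10, dosqrs = FALSE))
default   <- plot_one_way(1:10)
```

Both calling styles resolve to the same flag, and the default remains TRUE.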

rmpruzek commented 13 years ago

Brian, Both are good ideas. Yes to each. Thanks, bob



rmpruzek commented 13 years ago

Brian, Again, good to see. This is helpful. bob

WilDoane commented 13 years ago

Looks good to me, too.

briandk commented 13 years ago

I changed dosqrs to print.squares. I also fixed the printed output for the two-group case. Now, when there are two groups you see t-test output; when there are more than two groups you get a linear model summary.

> granovagg.1w(anorexia.sub)

By-group summary statistics for your input data (ordered by group means)
   group group.mean trimmed.mean contrast variance standard.deviation group.size
1  Prewt      83.23        83.24    -3.63    25.17               5.02         17
2 Postwt      90.49        91.80     3.63    71.83               8.48         17

Below is a t-test summary of your input data

    Welch Two Sample t-test

data:  unstacked.data[, 1] and unstacked.data[, 2] 
t = -3.0414, df = 25.986, p-value = 0.005324
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -12.17472  -2.35469 
sample estimates:
mean of x mean of y 
 83.22941  90.49412 

> with(mpg, granovagg.1w(hwy, group = manufacturer)) -> p

By-group summary statistics for your input data (ordered by group means)
        group group.mean trimmed.mean contrast variance standard.deviation group.size
8  land rover      16.50        16.50    -6.94     3.00               1.73          4
9     lincoln      17.00        17.00    -6.44     1.00               1.00          3
7        jeep      17.62        17.83    -5.82    10.55               3.25          8
3       dodge      17.95        17.70    -5.49    12.77               3.57         37
10    mercury      18.00        18.00    -5.44     1.33               1.15          4
4        ford      19.36        18.60    -4.08    11.07               3.33         25
2   chevrolet      21.89        22.00    -1.55    26.10               5.11         19
11     nissan      24.62        24.78     1.18    25.92               5.09         13
14     toyota      24.91        24.68     1.47    38.02               6.17         34
13     subaru      25.57        25.70     2.13     1.34               1.16         14
12    pontiac      26.40        26.33     2.96     1.30               1.14          5
1        audi      26.44        26.17     3.00     4.73               2.18         18
6     hyundai      26.86        26.70     3.42     4.75               2.18         14
15 volkswagen      29.22        28.18     5.78    28.26               5.32         27
5       honda      32.56        32.57     9.12     6.53               2.55          9

The following groups are likely to be overplotted
     group group.mean contrast
7     jeep      17.62    -5.82
3    dodge      17.95    -5.49
10 mercury      18.00    -5.44
11  nissan      24.62     1.18
14  toyota      24.91     1.47
12 pontiac      26.40     2.96
1     audi      26.44     3.00

Below is a linear model summary of your input data

Call:
lm(formula = score ~ group, data = owp$data)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.9118 -2.3600 -0.2911  2.0882 14.7778 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     26.44444    0.98206  26.928  < 2e-16 ***
groupchevrolet  -4.54971    1.37044  -3.320 0.001055 ** 
groupdodge      -8.49850    1.19734  -7.098 1.74e-11 ***
groupford       -7.08444    1.28796  -5.501 1.05e-07 ***
grouphonda       6.11111    1.70097   3.593 0.000404 ***
grouphyundai     0.41270    1.48473   0.278 0.781304    
groupjeep       -8.81944    1.77043  -4.982 1.28e-06 ***
groupland rover -9.94444    2.30313  -4.318 2.39e-05 ***
grouplincoln    -9.44444    2.59828  -3.635 0.000347 ***
groupmercury    -8.44444    2.30313  -3.667 0.000309 ***
groupnissan     -1.82906    1.51651  -1.206 0.229082    
grouppontiac    -0.04444    2.10628  -0.021 0.983184    
groupsubaru     -0.87302    1.48473  -0.588 0.557141    
grouptoyota     -1.53268    1.21450  -1.262 0.208298    
groupvolkswagen  2.77778    1.26783   2.191 0.029509 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 4.167 on 219 degrees of freedom
Multiple R-squared: 0.5398, Adjusted R-squared: 0.5104 
F-statistic: 18.35 on 14 and 219 DF,  p-value: < 2.2e-16 
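The two-versus-many-groups behavior shown above could be sketched roughly as follows. `summarize_one_way` is a hypothetical helper, not the package's internal code, and the `sleep` and `iris` data sets are stand-ins chosen because they ship with base R. The t-test is left at its defaults here, which gives the Welch test seen in the printed output.

```r
# Hypothetical helper: choose the printed summary based on the number of groups
summarize_one_way <- function(score, group) {
  group <- factor(group)  # also drops unused factor levels
  if (nlevels(group) == 2) {
    # Two groups: report a two-sample t-test
    t.test(score[group == levels(group)[1]],
           score[group == levels(group)[2]])
  } else {
    # More than two groups: report the linear model summary
    summary(lm(score ~ group))
  }
}

two_groups  <- summarize_one_way(sleep$extra, sleep$group)          # 2 groups
many_groups <- summarize_one_way(iris$Sepal.Length, iris$Species)   # 3 groups

class(two_groups)
class(many_groups)
```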

rmpruzek commented 13 years ago

Brian, I'd totally forgotten this point, but your thoroughness has ensured that I remember. The var.equal arg for the t.test function should be TRUE in gg.1w, not FALSE, which is the default. (FYI, I think it was a screw-up on the Core Team's part to default to FALSE, leading to the Welch test; many reasons can be cited.) But certainly for our function here we do NOT want Welch. BTW, I trust that T is sufficient for TRUE, F for FALSE. Two other small matters, but these questions come from my CRAN 1.0 version of gg.1w, not your github update:

  1. Under the CoercetoMatrix line pertaining to a user error (when multiple columns of equal length have been read in, along with a group vector), the comment ends w/ "If your data contains columns of equal numbers of observations, try re-calling granova.1w \n    on your data while setting group = NULL". Why the reference to granova.1w? I'd expect granovagg.1w.
  2. I just ran the mpg data and set the resid arg to TRUE, but the graphic does not show a residual plot on the right. I trust the new version fixes this? Thanks for everything, bob
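The var.equal point can be illustrated with a short sketch on simulated data (all values are invented for illustration). When the two groups have equal sizes, as with the 17 and 17 observations in anorexia.sub, the Welch and pooled t statistics coincide and only their degrees of freedom differ, which is why the Welch output earlier in the thread still squared to F. With unequal group sizes and unequal sample variances, only the pooled version reproduces F.

```r
set.seed(42)

# Balanced design: 17 observations per group
y_bal <- c(rnorm(17, 83, 5), rnorm(17, 90, 8.5))
g_bal <- factor(rep(c("a", "b"), each = 17))

# Unbalanced design: 10 versus 30 observations
y_unb <- c(rnorm(10, 83, 5), rnorm(30, 90, 8.5))
g_unb <- factor(rep(c("a", "b"), times = c(10, 30)))

# Return the one-way F alongside the squared pooled and Welch t statistics
compare <- function(y, g) {
  x1 <- y[g == "a"]
  x2 <- y[g == "b"]
  c(f_stat      = unname(summary(lm(y ~ g))$fstatistic["value"]),
    pooled_t_sq = unname(t.test(x1, x2, var.equal = TRUE)$statistic)^2,
    welch_t_sq  = unname(t.test(x1, x2)$statistic)^2)
}

balanced   <- compare(y_bal, g_bal)
unbalanced <- compare(y_unb, g_unb)

round(rbind(balanced, unbalanced), 4)
```

In the balanced row all three values agree; in the unbalanced row the Welch value drifts away from F while the pooled value does not.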

rmpruzek commented 13 years ago

Brian, is there one particular way you'd recommend to acquire your most recent versions of the functions gg.1w, gg.ds, and gg.contr from github? This may be as plain as the nose on my face, but I'm not certain if the way I'm going is the preferred way. Thanks, bob

briandk commented 13 years ago

@rmpruzek - as has always been the case, the latest development version of granovaGG can be obtained by following the install_github directions on the granovaGG main page:

    library(devtools)

    # If you've already installed granovaGG, then detach and remove the current local version
    detach("package:granovaGG")
    remove.packages("granovaGG", lib = .libPaths())

    # ...and then install the latest development version
    install_github(repo="granovaGG", username="briandk", branch="dev")

rmpruzek commented 13 years ago

Brian, I've identified what appears to be a problem for granovagg.ds; it concerns column names. See output: str(granovagg.ds) function (data = NULL, revc = FALSE, main = "default_granova_title", xlab = NULL, ylab = NULL, conf.level = 0.95, plot.theme = "theme_granova_ds", ...)



rmpruzek commented 13 years ago

Brian, I worked w/ the latest functions and found, which will be no surprise to you, that some of the new edits we've agreed upon have not yet been implemented in the code. The one thing that caused me some consternation concerned changing the main title and the name for the response. I got these to work after some study and trials, like this: =====>

rat.1wgg=granovagg.1w(rat[,1],jj=.6,resid=T,group=rep(1:6,ea=10),main="One-way ANOVA display showing weight-gains for six groups of rats",ylab="Weight gains of rats in grams")

Below are by-group summary statistics of your input data -- these are ok, so I exclude numerics here... group group.mean trimmed.mean contrast variance standard.deviation group.size

rat.1wgg+title(main="",ylab="") =====> The resulting graphic is fine w/ this approach. Is this what you would recommend?

BTW, I see the documentation for jj is rather opaque. I understand the idea of defaulting jj = NULL, but then we need to tell the user that, tacitly, jj = 1 is the default, that values half this size can virtually eliminate jittering, and that larger values such as jj = 1.5 or 2 (or more) might be desirable for some data sets.

Also, FYI, I found that if I changed the size of your current (small green) dot from 3/2 to 7.5/2, the grand mean dot in the middle (green) is just about right by my eye. I could add another thing or two, but I'm not trying to be comprehensive here. I have all I need for SMEP now, and will post the slide show I've done in our jointly held dropbox directory this evening. Thanks again, bob



briandk commented 13 years ago

@rmpruzek - I've put in several major updates to try to deal with the issues you raise. Since this particular issue ("letting users suppress the printing of visual squares") has been resolved, I would encourage you to offer your comments on the currently open issues here:

https://github.com/briandk/granovaGG/issues

Several things to note:

  1. I changed the code to enlarge the grand mean dot in granova.1w on October 1. I'm surprised the version you were using did not reflect the change in size from 3/2 to 2.5. You can view that commit here: https://github.com/briandk/granovaGG/commit/2e06ad951bdf0b3ce2a0101b3991e04aee79a24b
  2. There is an ongoing discussion about how to deal with handing NULL in as a parameter value for graph labels. I would very much value your input on that discussion. If you'd like to comment, please do so using the comment box at the bottom of the discussion thread here: https://github.com/briandk/granovaGG/pull/133

Also, I would strongly urge you to:

  1. Install the latest development version of granovaGG by following these directions

    library(devtools)
    
    # If you've already installed granovaGG, then detach and remove the current local version
    detach("package:granovaGG")
    remove.packages("granovaGG", lib = .libPaths())
    
    # ...and then install the latest development version
    install_github(repo="granovaGG", username="briandk", branch="dev")
  2. Include the output from sessionInfo() when posting code to reproduce an issue. When you do, it helps @Wildoane and me track exactly which version of the code you were using. Here's an example of posting sample code, with real output from the sessionInfo() command at the time I was running some code:

    data(anorexia.sub)
    granovagg.ds(anorexia.sub)
    ## Note the expected behavior, actual behavior, and problems here
    ## ...
    ## Then include the output from sessionInfo()
    > sessionInfo()
    R version 2.13.2 (2011-09-30)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
    
    locale:
    [1] C/en_US.UTF-8/C/C/C/C
    
    attached base packages:
    [1] splines   grid      stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
    [1] granovaGG_1.0.20111009210450 granova_2.0                  car_2.0-11                  
    [4] survival_2.36-9              nnet_7.3-1                   MASS_7.3-14                 
    [7] gridExtra_0.8                RColorBrewer_1.0-5           ggplot2_0.8.9               
    [10] proto_0.3-9.2                reshape_0.8.4                plyr_1.6                    
    [13] devtools_0.4                
    
    loaded via a namespace (and not attached):
    [1] RCurl_1.6-10 digest_0.5.1 tools_2.13.2