granovagg.1w should let users suppress printing of visual squares

briandk commented 13 years ago

I've resurrected the dosqrs argument from granova.1w. Users can suppress squares by adding dosqrs = FALSE to a granovagg.1w call. However, dosqrs still defaults to TRUE.

As an example:

# Squares will appear
> data(poison)
> granovagg.1w(poison$SurvTime, group = poison$Group, ylab = "Survival Time", dosqrs = TRUE)

# Squares will still appear, since dosqrs defaults to TRUE
> granovagg.1w(poison$SurvTime, group = poison$Group, ylab = "Survival Time")

# Squares will be suppressed
> granovagg.1w(poison$SurvTime, group = poison$Group, ylab = "Survival Time", dosqrs = FALSE)

The graphics below illustrate what it now looks like if a user suppresses the squares:

No graphical squares print
The F-statistic text drops lower
The legend appears slightly different
Visual comparison of suppressed/unsuppressed squares

Visual comparison of legends when squares are suppressed/unsuppressed
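
As a rough sketch of how a logical argument like dosqrs can gate optional parts of a display, the base-R function below returns the pieces that would be drawn. The function name plot_components and the component labels are assumptions for illustration only; this is not the granovaGG source.

```r
# Hypothetical sketch: a logical flag that defaults to TRUE and gates optional pieces
plot_components <- function(dosqrs = TRUE) {
  stopifnot(is.logical(dosqrs), length(dosqrs) == 1, !is.na(dosqrs))
  components <- c("scores", "group means", "grand mean")
  if (dosqrs) {
    # Illustrative labels for the two visual squares
    components <- c(components, "between-group square", "within-group square")
  }
  components
}

length(plot_components())                # 5: squares included by default
length(plot_components(dosqrs = FALSE))  # 3: squares suppressed
```

Keeping the default at TRUE means existing calls behave as before, and only users who ask for it lose the squares.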

rmpruzek commented 13 years ago

If there are only two groups, then most users (and I) would prefer that the printed statistic be t, not F. Of course this means, for the 2-group case, that the positive square root of F be used as t; so the correct label is probably | t-statistic |, to show that the sign is not considered. Otherwise, I like your changes. bp
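
One way to read this suggestion is sketched below in base R. The helper name headline_statistic and the simulated data are assumptions, not code from granovaGG; the sketch reports the positive square root of F, labelled as |t|, when there are exactly two groups, and F otherwise.

```r
# Hypothetical helper: choose which statistic to print on the graphic
headline_statistic <- function(score, group) {
  group <- factor(group)
  fit <- lm(score ~ group)
  f_value <- unname(summary(fit)$fstatistic["value"])
  if (nlevels(group) == 2) {
    # With two groups, F has 1 numerator df, so sqrt(F) is the unsigned t
    list(label = "|t|-statistic", value = sqrt(f_value))
  } else {
    list(label = "F-statistic", value = f_value)
  }
}

# Simulated data, for illustration only
set.seed(1)
two_groups   <- data.frame(score = rnorm(20), group = rep(c("a", "b"), each = 10))
three_groups <- data.frame(score = rnorm(30), group = rep(c("a", "b", "c"), each = 10))

headline_statistic(two_groups$score, two_groups$group)$label      # "|t|-statistic"
headline_statistic(three_groups$score, three_groups$group)$label  # "F-statistic"
```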

briandk commented 13 years ago

@rmpruzek, I'm not getting agreement that the positive square root of the F-statistic is the t-statistic. Obviously it should be, but my code isn't showing that:

> library(granova)
> data(anorexia.sub)
> astack <- stack(anorexia.sub)
> lm1 <- lm(values ~ ind, data = astack)
> summary(lm1)

Call:
lm(formula = values ~ ind, data = astack)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.294  -2.454   1.106   4.004  11.106 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   90.494      1.689  53.578  < 2e-16 ***
indPrewt      -7.265      2.389  -3.041  0.00467 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 6.964 on 32 degrees of freedom
Multiple R-squared: 0.2242, Adjusted R-squared:   0.2 
F-statistic:  9.25 on 1 and 32 DF,  p-value: 0.004673 

> t.test(anorexia.sub[, 1], anorexia.sub[, 2], paired = TRUE)

    Paired t-test

data:  anorexia.sub[, 1] and anorexia.sub[, 2] 
t = -4.1849, df = 16, p-value = 0.0007003
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -10.94471  -3.58470 
sample estimates:
mean of the differences 
              -7.264706

/cc @Wildoane

rmpruzek commented 13 years ago

Brian, In the t.test the paired=FALSE must be set. Try it, you should get agreement then. bob

rmpruzek commented 13 years ago

Brian, Because your heading cited .1w, I answered w/ that in mind. But it is not clear to me why you use lm( ) AND t.test w/ paired = TRUE; there's the rub. bob
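
A small base-R sketch of why paired = TRUE cannot agree with lm() here. The data are simulated (the means and spreads are arbitrary assumptions, not the anorexia.sub values). A paired test is a one-sample test on the differences, with n - 1 degrees of freedom, while the one-way model corresponds to the pooled two-sample test with 2n - 2.

```r
# Simulated pre/post scores for 17 subjects (illustrative values only)
set.seed(42)
pre  <- rnorm(17, mean = 83, sd = 5)
post <- pre + rnorm(17, mean = 7, sd = 7)

paired_t   <- t.test(pre, post, paired = TRUE)
one_sample <- t.test(pre - post)                  # the same test, written out
pooled_t   <- t.test(pre, post, var.equal = TRUE) # independent samples, pooled variance

stacked <- data.frame(values = c(pre, post),
                      ind = factor(rep(c("pre", "post"), each = 17)))
f_value <- summary(lm(values ~ ind, data = stacked))$fstatistic["value"]

all.equal(unname(paired_t$statistic), unname(one_sample$statistic))  # TRUE
all.equal(unname(pooled_t$statistic)^2, unname(f_value))             # TRUE
unname(paired_t$parameter)  # 16 degrees of freedom (n - 1)
unname(pooled_t$parameter)  # 32 degrees of freedom (2n - 2)
```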

briandk commented 13 years ago

Bob,

The error was mine. I got confused because we use a paired t-test in granovagg.ds (line #138) and in granova.ds (line #110). I forgot that the reason we use paired for .ds is that it's fundamentally a dependent sample analysis.

The reason I used t.test() and lm() above was to verify that F == t^2. You're absolutely right: my error was in running the .1w t-test as a paired t-test. After re-running the code on a non-paired t-test, I've verified the expected behavior:

> data(anorexia.sub)
> astack <- stack(anorexia.sub)
> summary(lm(values ~ ind, data = astack)) # Note F = 9.25

Call:
lm(formula = values ~ ind, data = astack)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.294  -2.454   1.106   4.004  11.106 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   90.494      1.689  53.578  < 2e-16 ***
indPrewt      -7.265      2.389  -3.041  0.00467 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 6.964 on 32 degrees of freedom
Multiple R-squared: 0.2242, Adjusted R-squared:   0.2 
F-statistic:  9.25 on 1 and 32 DF,  p-value: 0.004673

> t.test(anorexia.sub[, 1], anorexia.sub[, 2])

    Welch Two Sample t-test

data:  anorexia.sub[, 1] and anorexia.sub[, 2] 
t = -3.0414, df = 25.986, p-value = 0.005324
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -12.17472  -2.35469 
sample estimates:
mean of x mean of y 
 83.22941  90.49412 

> (-3.0414)^2 # t-squared approximately equals F = 9.25
[1] 9.250114

Does that look right to you now?

rmpruzek commented 13 years ago

Brian, Yes, it is spot-on correct. Good to see you wrapping this up. Thanks again, bob

briandk commented 13 years ago

@rmpruzek - the latest changes I just pushed should show a properly computed t-statistic in the two-group case. Below is example code from a two-group case (the anorexia.sub data) and the visual output confirming that we display the proper t-statistic:

> data(anorexia.sub)
> astack <- stack(anorexia.sub) # stacking the data so we have a grouping column
> granovagg.1w(astack$values, group = astack$ind)

By-group summary statistics for your input data (ordered by group means)
   group group.mean trimmed.mean contrast variance standard.deviation group.size
2  Prewt      83.23        83.24    -3.63    25.17               5.02         17
1 Postwt      90.49        91.80     3.63    71.83               8.48         17

Below is a linear model summary of your input data

Call:
lm(formula = score ~ group, data = owp$data)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.294  -2.454   1.106   4.004  11.106 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   90.494      1.689  53.578  < 2e-16 ***
groupPrewt    -7.265      2.389  -3.041  0.00467 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 6.964 on 32 degrees of freedom
Multiple R-squared: 0.2242, Adjusted R-squared:   0.2 
F-statistic:  9.25 on 1 and 32 DF,  p-value: 0.004673 

> with(anorexia.sub, t.test(Prewt, Postwt))

    Welch Two Sample t-test

data:  Prewt and Postwt 
t = -3.0414, df = 25.986, p-value = 0.005324
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -12.17472  -2.35469 
sample estimates:
mean of x mean of y 
 83.22941  90.49412

And here's a comparison of what it looks like when dosqrs goes from TRUE to FALSE

briandk commented 13 years ago

I have two more questions on this issue:

Can we change the name of the dosqrs argument to print.squares, which is more informative and easier to type?
For the two-group case, should we output a t-test summary instead of the current model summary?

cc: @rmpruzek, @Wildoane

rmpruzek commented 13 years ago

Brian, Both are good ideas. Yes to each. Thanks, bob

From: Brian A. Danielak reply@reply.github.com To: rmpruzek rmpruzek@yahoo.com Sent: Thursday, October 6, 2011 5:39 PM Subject: Re: [granovaGG] granovagg.1w should let users suppress printing of visual squares (#128)

rmpruzek commented 13 years ago

Brian, Again, good to see. This is helpful. bob

WilDoane commented 13 years ago

Looks good to me, too.

briandk commented 13 years ago

I changed dosqrs to print.squares. I also fixed the printed output for the two-group case. Now, when there are two groups you see t-test output; when there are more than two groups you get a linear model summary.

> granovagg.1w(anorexia.sub)

By-group summary statistics for your input data (ordered by group means)
   group group.mean trimmed.mean contrast variance standard.deviation group.size
1  Prewt      83.23        83.24    -3.63    25.17               5.02         17
2 Postwt      90.49        91.80     3.63    71.83               8.48         17

Below is a t-test summary of your input data

    Welch Two Sample t-test

data:  unstacked.data[, 1] and unstacked.data[, 2] 
t = -3.0414, df = 25.986, p-value = 0.005324
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -12.17472  -2.35469 
sample estimates:
mean of x mean of y 
 83.22941  90.49412 

> with(mpg, granovagg.1w(hwy, group = manufacturer)) -> p

By-group summary statistics for your input data (ordered by group means)
        group group.mean trimmed.mean contrast variance standard.deviation group.size
8  land rover      16.50        16.50    -6.94     3.00               1.73          4
9     lincoln      17.00        17.00    -6.44     1.00               1.00          3
7        jeep      17.62        17.83    -5.82    10.55               3.25          8
3       dodge      17.95        17.70    -5.49    12.77               3.57         37
10    mercury      18.00        18.00    -5.44     1.33               1.15          4
4        ford      19.36        18.60    -4.08    11.07               3.33         25
2   chevrolet      21.89        22.00    -1.55    26.10               5.11         19
11     nissan      24.62        24.78     1.18    25.92               5.09         13
14     toyota      24.91        24.68     1.47    38.02               6.17         34
13     subaru      25.57        25.70     2.13     1.34               1.16         14
12    pontiac      26.40        26.33     2.96     1.30               1.14          5
1        audi      26.44        26.17     3.00     4.73               2.18         18
6     hyundai      26.86        26.70     3.42     4.75               2.18         14
15 volkswagen      29.22        28.18     5.78    28.26               5.32         27
5       honda      32.56        32.57     9.12     6.53               2.55          9

The following groups are likely to be overplotted
     group group.mean contrast
7     jeep      17.62    -5.82
3    dodge      17.95    -5.49
10 mercury      18.00    -5.44
11  nissan      24.62     1.18
14  toyota      24.91     1.47
12 pontiac      26.40     2.96
1     audi      26.44     3.00

Below is a linear model summary of your input data

Call:
lm(formula = score ~ group, data = owp$data)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.9118 -2.3600 -0.2911  2.0882 14.7778 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     26.44444    0.98206  26.928  < 2e-16 ***
groupchevrolet  -4.54971    1.37044  -3.320 0.001055 ** 
groupdodge      -8.49850    1.19734  -7.098 1.74e-11 ***
groupford       -7.08444    1.28796  -5.501 1.05e-07 ***
grouphonda       6.11111    1.70097   3.593 0.000404 ***
grouphyundai     0.41270    1.48473   0.278 0.781304    
groupjeep       -8.81944    1.77043  -4.982 1.28e-06 ***
groupland rover -9.94444    2.30313  -4.318 2.39e-05 ***
grouplincoln    -9.44444    2.59828  -3.635 0.000347 ***
groupmercury    -8.44444    2.30313  -3.667 0.000309 ***
groupnissan     -1.82906    1.51651  -1.206 0.229082    
grouppontiac    -0.04444    2.10628  -0.021 0.983184    
groupsubaru     -0.87302    1.48473  -0.588 0.557141    
grouptoyota     -1.53268    1.21450  -1.262 0.208298    
groupvolkswagen  2.77778    1.26783   2.191 0.029509 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 4.167 on 219 degrees of freedom
Multiple R-squared: 0.5398, Adjusted R-squared: 0.5104 
F-statistic: 18.35 on 14 and 219 DF,  p-value: < 2.2e-16
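
The switch described above can be sketched in a few lines of base R. The function name model_summary_for is hypothetical and this is not the granovaGG implementation; it only shows the branching on the number of groups, using the built-in mtcars data.

```r
# Hypothetical helper: t-test output for two groups, linear model summary otherwise
model_summary_for <- function(score, group, var.equal = FALSE) {
  group <- factor(group)
  if (nlevels(group) == 2) {
    parts <- split(score, group)
    t.test(parts[[1]], parts[[2]], var.equal = var.equal)
  } else {
    summary(lm(score ~ group))
  }
}

class(model_summary_for(mtcars$mpg, mtcars$am))   # "htest"      (two groups)
class(model_summary_for(mtcars$mpg, mtcars$cyl))  # "summary.lm" (three groups)
```

Exposing var.equal as an argument is a design assumption of this sketch; it leaves the choice between the Welch and pooled tests to the caller.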

rmpruzek commented 13 years ago

Brian, I'd totally forgotten this point, but your thoroughness has ensured that I remember. The var.equal arg for the t.test function should be TRUE in gg.1w, not FALSE, which is the default. (FYI, I think this was a screw up on the Core Team's part to default to FALSE, leading to the Welch test; many reasons can be cited.) But certainly for our function here we do NOT want Welch. BTW, I trust that T is sufficient for TRUE, F for FALSE. Two other small matters, but these questions come from my CRAN 1.0 version of gg.1w, not your github update:

Under the CoercetoMatrix line pertaining to a user error (when multiple columns of equal length have been read in, along with a group vector), the comment ends w/ "If your data contains columns of equal numbers of observations, try re-calling granova.1w \n on your data while setting group = NULL". Why the reference to granova.1w? I'd expect granovagg.1w.
I just ran the mpg data and set the resid arg to TRUE, but the graphic does not show a residual plot on the right. I trust the new version fixes this?

Thanks for everything, bob
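
On the var.equal point, a short sketch with simulated data (all values are assumptions for illustration). With equal group sizes, as in the 17-and-17 anorexia.sub example in this thread, the Welch and pooled tests give the same t statistic and differ only in degrees of freedom and p-value, which is why the Welch output above still squares to F. With unequal group sizes the two statistics part ways.

```r
set.seed(7)
x <- rnorm(17, mean = 83, sd = 5)
y <- rnorm(17, mean = 90, sd = 8)

welch  <- t.test(x, y)                    # default: var.equal = FALSE
pooled <- t.test(x, y, var.equal = TRUE)  # classical pooled-variance t-test

# Equal group sizes: identical t, different degrees of freedom
all.equal(unname(welch$statistic), unname(pooled$statistic))  # TRUE
unname(pooled$parameter)                                      # 32
unname(welch$parameter) < 32                                  # TRUE

# Unequal group sizes: the two t statistics no longer agree
y_short <- y[1:8]
isTRUE(all.equal(unname(t.test(x, y_short)$statistic),
                 unname(t.test(x, y_short, var.equal = TRUE)$statistic)))  # FALSE
```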

rmpruzek commented 13 years ago

Brian, Is there one particular way you'd recommend to acquire your most recent versions of the functions gg.1w, gg.ds, and gg.contr from github? This may be as plain as the nose on my face, but I'm not certain if the way I'm going is the preferred way. Thanks, bob

briandk commented 13 years ago

@rmpruzek - as has always been the case, the latest development version of granovaGG can be obtained by following the install_github directions on the granovaGG main page:

    library(devtools)

    # If you've already installed granovaGG, then detach and remove the current local version
    detach("package:granovaGG")
    remove.packages("granovaGG", lib = .libPaths())

    # ...and then install the latest development version
    install_github(repo="granovaGG", username="briandk", branch="dev")
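
A more defensive variant of the same steps, offered as a sketch rather than the project's official instructions. The helper names are hypothetical. It assumes that newer devtools releases take the repository as "user/repo" with a ref argument instead of the username and branch arguments shown above, and that the dev branch still exists.

```r
# Detach a package only if it is currently attached; detach() errors otherwise
detach_if_attached <- function(pkg) {
  search_name <- paste0("package:", pkg)
  if (search_name %in% search()) {
    detach(search_name, character.only = TRUE, unload = TRUE)
    return(TRUE)
  }
  FALSE
}

# Hypothetical wrapper; defined here but not called, since it needs network access
reinstall_dev_version <- function() {
  detach_if_attached("granovaGG")
  if ("granovaGG" %in% rownames(installed.packages())) {
    remove.packages("granovaGG")
  }
  # Assumption: "user/repo" plus `ref` is the calling convention in newer devtools
  devtools::install_github("briandk/granovaGG", ref = "dev")
}
```

Calling reinstall_dev_version() in an interactive session would then perform the detach, remove, and install steps in order.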

rmpruzek commented 13 years ago

Brian, I've identified what appears to be a problem for granovagg.ds; it concerns column names. See output:

> str(granovagg.ds)
function (data = NULL, revc = FALSE, main = "default_granova_title", xlab = NULL, ylab = NULL, conf.level = 0.95, plot.theme = "theme_granova_ds", ...)
 - attr(*, "source")= chr [1:241] "function (data = NULL, revc = FALSE, main = \"default_granova_title\", " ...

> dimnames(schizzz)[2] =list(c("X X","Y Y"))
> granovagg.ds(schizzz,revc=T)
Error in parse(text = x) : <text>:1:3: unexpected symbol
1: Y Y
     ^

[The issue is that if col headings have more than one term (see below), gg.ds fails.]

> schizzz
   X X  Y Y
1  2.4 2.54
2  2.2 3.18
3  2.1 2.54
4  2.9 3.27
5  2.2 2.09
6  2.3 2.45
7  2.4 3.09
8  1.5 1.45
9  2.7 3.45
10 1.9 3.09
11 1.8 1.81
12 1.3 1.45
> class(schizzz)
[1] "data.frame"

If X X and Y Y are replaced w/ just X and Y respectively, all is well. These are the schizophrenia data found in the first part of the section on .ds in my Elemental Graphics paper (w/ Jim H.). I found this by trying to use the original labels (after renaming the file to schizzz):

> names(schizzz)
[1] "Before Treatment" "Six weeks After Trtmnt"

The original granova.ds worked w/ this data set, but the gg version does not. Your numerical results print just fine. Thought you would want to know this even before I do any more testing. Best, Bob
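
The error message suggests the column names are being handed to parse() as if they were R expressions; that reading is an assumption, since the relevant granovagg.ds code is not shown in this thread. The base-R sketch below reproduces the failure in isolation and shows two possible workarounds.

```r
# A name containing a space is not a valid R expression on its own
bare <- tryCatch(parse(text = "Y Y"), error = function(e) conditionMessage(e))
is.character(bare)           # TRUE: parse() raised an error and we kept its message

# Backticks make the same name parse as a single symbol
quoted <- parse(text = "`Y Y`")
class(quoted[[1]])           # "name"

# Alternatively, sanitize the column names before plotting
make.names(c("X X", "Y Y"))  # "X.X" "Y.Y"
```

Quoting keeps the user's labels intact for axis titles, whereas make.names() changes them, so the first option is probably the friendlier fix if it is feasible.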

From: Brian A. Danielak reply@reply.github.com To: rmpruzek rmpruzek@yahoo.com Sent: Saturday, October 8, 2011 6:59 PM Subject: Re: [granovaGG] granovagg.1w should let users suppress printing of visual squares (#128)

rmpruzek commented 13 years ago

Brian, I worked w/ the latest functions and found, which will be no surprise to you, that some of the new edits we've agreed upon have not yet been implemented in the code. The one thing that caused me some consternation concerned changing the main title and the name for the response. I got these to work after some study and trials, like this:

rat.1wgg=granovagg.1w(rat[,1],jj=.6,resid=T,group=rep(1:6,ea=10),main="One-way ANOVA display showing weight-gains for six groups of rats",ylab="Weight gains of rats in grams")

Below are by-group summary statistics of your input data -- these are ok, so I exclude numerics here...
group group.mean trimmed.mean contrast variance standard.deviation group.size

rat.1wgg+title(main="",ylab="")

The resulting graphic is fine w/ this approach. Is this what you would recommend?

BTW, I see the documentation for jj is rather opaque. I understand the idea of defaulting jj = NULL, but then we need to tell the user that, tacitly, jj = 1 is the default, that values half this size can virtually eliminate jittering, and that larger values such as jj = 1.5 or 2 (or more) might be desirable for some data sets.

Also, FYI, I found that if I changed the size of your current (small green) grand mean dot from 3/2 to 7.5/2, the dot in the middle is just about right by my eye. I could add another thing or two, but I'm not trying to be comprehensive here. I have all I need for SMEP now, and will post the slide show I've done in our jointly held dropbox directory this evening. Thanks again, bob

briandk commented 13 years ago

@rmpruzek - I've put in several major updates to try to deal with the issues you raise. Since this particular issue ("letting users suppress the printing of visual squares") has been resolved, I would encourage you to offer your comments on the currently open issues here:

https://github.com/briandk/granovaGG/issues

Several things to note:

I changed the code to enlarge the grand mean dot in granova.1w on October 1. I'm surprised the version you were using did not reflect the change in size from 3/2 to 2.5. You can view that commit here: https://github.com/briandk/granovaGG/commit/2e06ad951bdf0b3ce2a0101b3991e04aee79a24b
There is an ongoing discussion about how to deal with handing NULL in as a parameter value for graph labels. I would very much value your input on that discussion. If you'd like to comment, please do so using the comment box at the bottom of the discussion thread here: https://github.com/briandk/granovaGG/pull/133

Also, I would strongly urge you to:

Install the latest development version of granovaGG by following these directions:

library(devtools)

# If you've already installed granovaGG, then detach and remove the current local version
detach("package:granovaGG")
remove.packages("granovaGG", lib = .libPaths())

# ...and then install the latest development version
install_github(repo="granovaGG", username="briandk", branch="dev")

Include the output from sessionInfo() when posting code to reproduce an issue. When you do, it helps @Wildoane and me track exactly which version of the code you were using. Here's an example of posting sample code, with real output from the sessionInfo() command at the time I was running it:

data(anorexia.sub)
granovagg.ds(anorexia.sub)
## Note the expected behavior, actual behavior, and problems here
## ...
## Then include the output from sessionInfo()
> sessionInfo()
R version 2.13.2 (2011-09-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C/en_US.UTF-8/C/C/C/C

attached base packages:
[1] splines   grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] granovaGG_1.0.20111009210450 granova_2.0                  car_2.0-11                  
[4] survival_2.36-9              nnet_7.3-1                   MASS_7.3-14                 
[7] gridExtra_0.8                RColorBrewer_1.0-5           ggplot2_0.8.9               
[10] proto_0.3-9.2                reshape_0.8.4                plyr_1.6                    
[13] devtools_0.4                

loaded via a namespace (and not attached):
[1] RCurl_1.6-10 digest_0.5.1 tools_2.13.2
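
A minimal sketch of collecting that information as text, so it can be pasted into an issue along with the reproducing code. The helper name version_or_na is an assumption for illustration.

```r
# Collect the session details as plain text, ready to paste into an issue
session_text <- capture.output(print(sessionInfo()))
cat(session_text, sep = "\n")

# Report one package's version, or NA if it is not installed
version_or_na <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(packageVersion(pkg))
  } else {
    NA_character_
  }
}

version_or_na("stats")      # version of a base package, always available
version_or_na("granovaGG")  # NA unless granovaGG is installed locally
```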

granovagg.1w should let users suppress printing of visual squares #128
