Closed briandk closed 13 years ago
If there are only two groups, then most users (and I) would prefer that the printed statistic be t, not F. Of course, in the two-group case this means the positive square root of F is used as t, so the correct label is probably |t-statistic|, to show that the sign is not considered. Otherwise, I like your changes. bp
@rmpruzek, I'm not getting agreement that the positive square root of the F-statistic is the t-statistic. Obviously it should be, but my code isn't showing that:
> library(granova)
> data(anorexia.sub)
> astack <- stack(anorexia.sub)
> lm1 <- lm(values ~ ind, data = astack)
> summary(lm1)
Call:
lm(formula = values ~ ind, data = astack)
Residuals:
Min 1Q Median 3Q Max
-15.294 -2.454 1.106 4.004 11.106
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 90.494 1.689 53.578 < 2e-16 ***
indPrewt -7.265 2.389 -3.041 0.00467 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.964 on 32 degrees of freedom
Multiple R-squared: 0.2242, Adjusted R-squared: 0.2
F-statistic: 9.25 on 1 and 32 DF, p-value: 0.004673
> t.test(anorexia.sub[, 1], anorexia.sub[, 2], paired = TRUE)
Paired t-test
data: anorexia.sub[, 1] and anorexia.sub[, 2]
t = -4.1849, df = 16, p-value = 0.0007003
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.94471 -3.58470
sample estimates:
mean of the differences
-7.264706
/cc @Wildoane
Brian, In the t.test the paired=FALSE must be set. Try it, you should get agreement then. bob
Brian, Because your heading cited .1w, I answered w/ that in mind. But it is not clear to me why you use lm( ) AND t.test w/ paired = TRUE; there's the rub. bob
Bob,
The error was mine. I got confused because we use a paired t-test in granovagg.ds (line #138) and in granova.ds (line #110). I forgot that the reason we use paired for .ds is that it's fundamentally a dependent sample analysis.
The reason I used t.test() and lm() above was to verify that F == t^2. You're absolutely right: my error was in running the .1w t-test as a paired t-test. After re-running the code on a non-paired t-test, I've verified the expected behavior:
> data(anorexia.sub)
> astack <- stack(anorexia.sub)
> summary(lm(values ~ ind, data = astack)) # Note F = 9.25
Call:
lm(formula = values ~ ind, data = astack)
Residuals:
Min 1Q Median 3Q Max
-15.294 -2.454 1.106 4.004 11.106
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 90.494 1.689 53.578 < 2e-16 ***
indPrewt -7.265 2.389 -3.041 0.00467 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.964 on 32 degrees of freedom
Multiple R-squared: 0.2242, Adjusted R-squared: 0.2
F-statistic: 9.25 on 1 and 32 DF, p-value: 0.004673
> t.test(anorexia.sub[, 1], anorexia.sub[, 2])
Welch Two Sample t-test
data: anorexia.sub[, 1] and anorexia.sub[, 2]
t = -3.0414, df = 25.986, p-value = 0.005324
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.17472 -2.35469
sample estimates:
mean of x mean of y
83.22941 90.49412
> (-3.0414)^2 # t-squared approximately equals F = 9.25
[1] 9.250114
Does that look right to you now?
Brian, Yes, it is spot-on correct. Good to see you wrapping this up. Thanks again, bob
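For anyone who wants to re-check the identity without the granova data, here is a minimal sketch on simulated scores. The group labels, sample sizes, means, and seed are arbitrary assumptions for illustration, not the anorexia.sub values. It compares the one-way F from lm() with the pooled-variance t from t.test(..., var.equal = TRUE). It also shows that with equal group sizes (17 and 17, as in anorexia.sub) the Welch statistic coincides with the pooled one and only the degrees of freedom differ, which is consistent with the default t.test() call above reproducing F.

```r
# Simulated two-group data; the numbers are illustrative only.
set.seed(128)
scores <- data.frame(
  values = c(rnorm(17, mean = 83, sd = 5), rnorm(17, mean = 90, sd = 8)),
  ind    = factor(rep(c("Prewt", "Postwt"), each = 17))
)

# One-way F statistic from the linear model
f_stat <- unname(summary(lm(values ~ ind, data = scores))$fstatistic["value"])

# Pooled-variance (Student) and Welch t statistics
t_pooled <- unname(t.test(values ~ ind, data = scores, var.equal = TRUE)$statistic)
t_welch  <- unname(t.test(values ~ ind, data = scores)$statistic)

# F equals the square of the pooled t, whatever the group sizes
stopifnot(isTRUE(all.equal(f_stat, t_pooled^2)))

# With equal group sizes the Welch t matches the pooled t; only the df differ
stopifnot(isTRUE(all.equal(t_pooled, t_welch)))

# The unsigned statistic to print would be abs(t), i.e. sqrt(F)
c(F = f_stat, t_squared = t_pooled^2, abs_t = abs(t_pooled))
```

With unequal group sizes the second check would generally fail, so the agreement with the Welch output should not be relied on in general.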
@rmpruzek - the latest changes I just pushed should show a properly computed t-statistic in the two-group case. Below is example code from a two-group case (the anorexia.sub data) and the visual output confirming that we display the proper t-statistic:
> data(anorexia.sub)
> astack <- stack(anorexia.sub) # stacking the data so we can have a grouping column
> granovagg.1w(astack$values, group = astack$ind)
By-group summary statistics for your input data (ordered by group means)
group group.mean trimmed.mean contrast variance standard.deviation group.size
2 Prewt 83.23 83.24 -3.63 25.17 5.02 17
1 Postwt 90.49 91.80 3.63 71.83 8.48 17
Below is a linear model summary of your input data
Call:
lm(formula = score ~ group, data = owp$data)
Residuals:
Min 1Q Median 3Q Max
-15.294 -2.454 1.106 4.004 11.106
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 90.494 1.689 53.578 < 2e-16 ***
groupPrewt -7.265 2.389 -3.041 0.00467 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.964 on 32 degrees of freedom
Multiple R-squared: 0.2242, Adjusted R-squared: 0.2
F-statistic: 9.25 on 1 and 32 DF, p-value: 0.004673
> with(anorexia.sub, t.test(Prewt, Postwt))
Welch Two Sample t-test
data: Prewt and Postwt
t = -3.0414, df = 25.986, p-value = 0.005324
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.17472 -2.35469
sample estimates:
mean of x mean of y
83.22941 90.49412
And here's a comparison of what it looks like when dosqrs goes from TRUE to FALSE
I have two more questions on this issue:
Should we change the dosqrs argument to print.squares, which is more informative and easier to type?
cc: @rmpruzek, @Wildoane
Brian, Both are good ideas. Yes to each. Thanks, bob
From: Brian A. Danielak reply@reply.github.com To: rmpruzek rmpruzek@yahoo.com Sent: Thursday, October 6, 2011 5:39 PM Subject: Re: [granovaGG] granovagg.1w should let users suppress printing of visual squares (#128)
Reply to this email directly or view it on GitHub: https://github.com/briandk/granovaGG/pull/128#issuecomment-2315375
Brian, Again, good to see. This is helpful. bob
Looks good to me, too.
I changed dosqrs to print.squares. I also fixed the printed output for the two-group case. Now, when there are two groups you see t-test output; when there are more than two groups you get a linear model summary.
> granovagg.1w(anorexia.sub)
By-group summary statistics for your input data (ordered by group means)
group group.mean trimmed.mean contrast variance standard.deviation group.size
1 Prewt 83.23 83.24 -3.63 25.17 5.02 17
2 Postwt 90.49 91.80 3.63 71.83 8.48 17
Below is a t-test summary of your input data
Welch Two Sample t-test
data: unstacked.data[, 1] and unstacked.data[, 2]
t = -3.0414, df = 25.986, p-value = 0.005324
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.17472 -2.35469
sample estimates:
mean of x mean of y
83.22941 90.49412
> with(mpg, granovagg.1w(hwy, group = manufacturer)) -> p
By-group summary statistics for your input data (ordered by group means)
group group.mean trimmed.mean contrast variance standard.deviation group.size
8 land rover 16.50 16.50 -6.94 3.00 1.73 4
9 lincoln 17.00 17.00 -6.44 1.00 1.00 3
7 jeep 17.62 17.83 -5.82 10.55 3.25 8
3 dodge 17.95 17.70 -5.49 12.77 3.57 37
10 mercury 18.00 18.00 -5.44 1.33 1.15 4
4 ford 19.36 18.60 -4.08 11.07 3.33 25
2 chevrolet 21.89 22.00 -1.55 26.10 5.11 19
11 nissan 24.62 24.78 1.18 25.92 5.09 13
14 toyota 24.91 24.68 1.47 38.02 6.17 34
13 subaru 25.57 25.70 2.13 1.34 1.16 14
12 pontiac 26.40 26.33 2.96 1.30 1.14 5
1 audi 26.44 26.17 3.00 4.73 2.18 18
6 hyundai 26.86 26.70 3.42 4.75 2.18 14
15 volkswagen 29.22 28.18 5.78 28.26 5.32 27
5 honda 32.56 32.57 9.12 6.53 2.55 9
The following groups are likely to be overplotted
group group.mean contrast
7 jeep 17.62 -5.82
3 dodge 17.95 -5.49
10 mercury 18.00 -5.44
11 nissan 24.62 1.18
14 toyota 24.91 1.47
12 pontiac 26.40 2.96
1 audi 26.44 3.00
Below is a linear model summary of your input data
Call:
lm(formula = score ~ group, data = owp$data)
Residuals:
Min 1Q Median 3Q Max
-9.9118 -2.3600 -0.2911 2.0882 14.7778
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.44444 0.98206 26.928 < 2e-16 ***
groupchevrolet -4.54971 1.37044 -3.320 0.001055 **
groupdodge -8.49850 1.19734 -7.098 1.74e-11 ***
groupford -7.08444 1.28796 -5.501 1.05e-07 ***
grouphonda 6.11111 1.70097 3.593 0.000404 ***
grouphyundai 0.41270 1.48473 0.278 0.781304
groupjeep -8.81944 1.77043 -4.982 1.28e-06 ***
groupland rover -9.94444 2.30313 -4.318 2.39e-05 ***
grouplincoln -9.44444 2.59828 -3.635 0.000347 ***
groupmercury -8.44444 2.30313 -3.667 0.000309 ***
groupnissan -1.82906 1.51651 -1.206 0.229082
grouppontiac -0.04444 2.10628 -0.021 0.983184
groupsubaru -0.87302 1.48473 -0.588 0.557141
grouptoyota -1.53268 1.21450 -1.262 0.208298
groupvolkswagen 2.77778 1.26783 2.191 0.029509 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.167 on 219 degrees of freedom
Multiple R-squared: 0.5398, Adjusted R-squared: 0.5104
F-statistic: 18.35 on 14 and 219 DF, p-value: < 2.2e-16
Brian, I'd totally forgotten this point, but your thoroughness has ensured that I remember. The var.equal arg for the t.test function should be TRUE in gg.1w, not FALSE, which is the default. (FYI, I think this was a screw up on the Core Team's part to default FALSE, leading to the Welch test; many reasons can be cited.) But certainly for our function here we do NOT want Welch. BTW, I trust that T is sufficient for TRUE, F for FALSE. Two other small matters, but these qs. come from my CRAN 1.0 version of gg.1w, not your github update:
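To make the var.equal point concrete, here is a rough sketch of the two-group versus many-group printing rule discussed in this thread. The helper name summarize_groups is hypothetical and this is not the granovaGG source; it only illustrates choosing a pooled-variance t-test when there are exactly two groups and a linear model summary otherwise.

```r
# Hypothetical helper (not granovaGG code): pick the printed summary
# from the number of groups.
summarize_groups <- function(score, group) {
  group <- factor(group)
  if (nlevels(group) == 2) {
    # Student's t-test with pooled variance, so that t^2 equals the one-way F
    t.test(score ~ group, var.equal = TRUE)
  } else {
    summary(lm(score ~ group))
  }
}

set.seed(1)
two   <- summarize_groups(rnorm(20), rep(c("a", "b"), times = c(8, 12)))
three <- summarize_groups(rnorm(30), rep(c("a", "b", "c"), each = 10))

grepl("Welch", two$method)  # FALSE: the pooled test was used
unname(two$parameter)       # pooled df = 8 + 12 - 2 = 18
class(three)                # "summary.lm"
```

On T versus TRUE: both work in a call like this, but T is an ordinary variable that can be reassigned, while TRUE is a reserved word, so spelling out TRUE is the safer habit in package code.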
Brian, Is there one particular way you'd recommend to acquire your most recent versions of the functions gg.1w, gg.ds, and gg.contr from github? This may be as plain as the nose on my face, but I'm not certain if the way I'm going is the preferred way. Thanks, bob
@rmpruzek - as has always been the case, the latest development version of granovaGG can be obtained by following the install_github directions on the granovaGG main page:
library(devtools)
# If you've already installed granovaGG, then detach and remove the current local version
detach("package:granovaGG")
remove.packages("granovaGG", lib = .libPaths())
# ...and then install the latest development version
install_github(repo="granovaGG", username="briandk", branch="dev")
Brian, I've identified what appears to be a problem for granovagg.ds; it concerns column names. See output:
str(granovagg.ds)
function (data = NULL, revc = FALSE, main = "default_granova_title", xlab = NULL, ylab = NULL, conf.level = 0.95, plot.theme = "theme_granova_ds", ...)
dimnames(schizzz)[2] =list(c("X X","Y Y"))
granovagg.ds(schizzz,revc=T)
Error in parse(text = x) :
<text>:1:3: unexpected symbol
1: Y Y
     ^
[The issue is that if col headings have more than one term (see below), gg.ds fails.]
schizzz
   X X  Y Y
1  2.4 2.54
2  2.2 3.18
3  2.1 2.54
4  2.9 3.27
5  2.2 2.09
6  2.3 2.45
7  2.4 3.09
8  1.5 1.45
9  2.7 3.45
10 1.9 3.09
11 1.8 1.81
12 1.3 1.45
class(schizzz)
[1] "data.frame"
===============
If X X and Y Y are replaced w/ just X and Y respectively all is well. These are the schizophrenia data found in the first part of the section on .ds in my Elemental Graphics paper (w/ Jim H.). I found this by trying to use the original labels (after renaming the file to schizzz):
names(schizzz)
[1] "Before Treatment" "Six weeks After Trtmnt"
The original granova.ds worked w/ this data set, but the gg version does not. Your numerical results print just fine. Thought you would want to know this even before I do any more testing. Best, Bob
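The error message points at parse(text = x) being handed a bare column name. The thread does not show where granovagg.ds makes that call, so the following is only a base R sketch of why a name like "Y Y" fails to parse and two common ways around it: backtick-quoting the name, or normalizing names with make.names(). Which of these, if either, suits granovagg.ds is an open question.

```r
# Column names taken from the report above
col_names <- c("X X", "Y Y")

# Parsing a bare name that contains a space fails; capture the message
bare <- tryCatch(
  parse(text = col_names[2]),
  error = function(e) conditionMessage(e)
)

# A backtick-quoted name parses as a single symbol
quoted <- parse(text = paste0("`", col_names[2], "`"))

# make.names() rewrites names into syntactically valid ones
safe_names <- make.names(col_names)

is.character(bare)   # TRUE: parse() raised an error
class(quoted[[1]])   # "name"
safe_names           # "X.X" "Y.Y"
```

Backtick-quoting keeps the user's original labels available for axis titles, whereas make.names() changes them, so the first option seems closer to how the original granova.ds behaved with this data set.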
From: Brian A. Danielak reply@reply.github.com To: rmpruzek rmpruzek@yahoo.com Sent: Saturday, October 8, 2011 6:59 PM Subject: Re: [granovaGG] granovagg.1w should let users suppress printing of visual squares (#128)
Brian, I worked w/ the latest functions and found, which will be no surprise to you, that some of the new edits we've agreed upon have not yet been implemented in the code. The one thing that caused me some consternation concerned changing the main title and the name for the response. I got these to work after some study, and trials, like this: =====>
rat.1wgg=granovagg.1w(rat[,1],jj=.6,resid=T,group=rep(1:6,ea=10),main="One-way ANOVA display showing weight-gains for six groups of rats",ylab="Weight gains of rats in grams")
Below are by-group summary statistics of your input data -- these are ok, so I exclude numerics here...
group group.mean trimmed.mean contrast variance standard.deviation group.size
rat.1wgg+title(main="",ylab="")
=====> The resulting graphic is fine w/ this approach. Is this what you would recommend?
BTW, I see the documentation for jj is rather opaque. I understand the idea of defaulting jj = NULL, but then we need to tell the user that, tacitly, jj = 1 is the default, that values half this size can virtually eliminate jittering, and that larger values such as jj = 1.5 or 2 (or more) might be desirable for some data sets.
Also, FYI, I found that if I changed the size of your current (small green) dot from 3/2 to 7.5/2, the grand mean dot in the middle (green) is just about right by my eye.
I could add another thing or two, but I'm not trying to be comprehensive here. I have all I need for SMEP now, and will post the slide show I've done in our jointly held dropbox directory this evening. Thanks again, bob
@rmpruzek - I've put in several major updates to try to deal with the issues you raise. Since this particular issue ("letting users suppress the printing of visual squares") has been resolved, I would encourage you to offer your comments on the currently open issues here:
https://github.com/briandk/granovaGG/issues
Several things to note:
granova.1w on October 1. I'm surprised the version you were using did not reflect the change in size from 3/2 to 2.5. You can view that commit here: https://github.com/briandk/granovaGG/commit/2e06ad951bdf0b3ce2a0101b3991e04aee79a24b
NULL in as a parameter value for graph labels. I would very much value your input on that discussion. If you'd like to comment, please do so using the comment box at the bottom of the discussion thread here: https://github.com/briandk/granovaGG/pull/133
Also, I would strongly urge you to:
Install the latest development version of granovaGG by following these directions
library(devtools)
# If you've already installed granovaGG, then detach and remove the current local version
detach("package:granovaGG")
remove.packages("granovaGG", lib = .libPaths())
# ...and then install the latest development version
install_github(repo="granovaGG", username="briandk", branch="dev")
Include the output from sessionInfo() when posting code to reproduce an issue. When you do, it helps @Wildoane and me track exactly which version of the code you were using. Here's an example of posting sample code, with real output from the sessionInfo() command at the time I was running some code:
data(anorexia.sub)
granovagg.ds(anorexia.sub)
## Note the expected behavior, actual behavior, and problems here
## ...
## Then include the output from sessionInfo()
> sessionInfo()
R version 2.13.2 (2011-09-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C/en_US.UTF-8/C/C/C/C
attached base packages:
[1] splines grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] granovaGG_1.0.20111009210450 granova_2.0 car_2.0-11
[4] survival_2.36-9 nnet_7.3-1 MASS_7.3-14
[7] gridExtra_0.8 RColorBrewer_1.0-5 ggplot2_0.8.9
[10] proto_0.3-9.2 reshape_0.8.4 plyr_1.6
[13] devtools_0.4
loaded via a namespace (and not attached):
[1] RCurl_1.6-10 digest_0.5.1 tools_2.13.2
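If copying the console output by hand is a nuisance, the same information can be gathered programmatically. The sketch below is a hypothetical convenience helper (issue_report is not part of granovaGG or devtools) that collects a package version and the sessionInfo() text into lines ready to paste into an issue. It uses "stats" as a stand-in package so that it runs anywhere; substitute "granovaGG" when reporting against this package.

```r
# Hypothetical helper for assembling a bug report; not part of granovaGG.
issue_report <- function(pkg) {
  version <- if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    "not installed"
  }
  c(
    paste0(pkg, " version: ", version),
    "",
    # capture.output() turns the printed sessionInfo() into character lines
    utils::capture.output(print(utils::sessionInfo()))
  )
}

report <- issue_report("stats")  # "stats" stands in for "granovaGG" here
writeLines(report)
```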
I've resurrected the dosqrs argument from granova.1w. Users can suppress squares by adding dosqrs = FALSE to a granovagg.1w call. But, dosqrs still defaults to TRUE.
As an example:
The graphics below illustrate what it now looks like if a user suppresses the squares:
Visual comparison of suppressed/unsuppressed squares
Visual comparison of legends when squares are suppressed/unsuppressed