ProjectMOSAIC / mosaic

Project MOSAIC R package
http://mosaic-web.org/

chi-square goodness-of-fit with xchisq.test #713

Closed by VectorPosse 6 years ago

VectorPosse commented 6 years ago

According to the help file for xchisq.test, this should work:

xchisq.test(~ cyl, data = mtcars)

As far as I can tell, ~ cyl should be converted to tally(~ cyl, data = mtcars).

This does work:

xchisq.test(tally(~ cyl, data = mtcars), data = mtcars)

So I'm not sure where the formula conversion is failing.
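
For reference, the conversion described above amounts to tabulating the variable before running the test. A minimal sketch in base R, using table() as a stand-in for mosaic's tally() (an assumption, so that the example runs without mosaic installed):

```r
# Tabulate the number of cars for each cylinder count (4, 6, 8).
# table() is used here in place of mosaic's tally(~ cyl, data = mtcars).
counts <- table(mtcars$cyl)

# With a one-dimensional table and no p supplied, chisq.test() runs a
# goodness-of-fit test against equal probabilities for every category.
result <- chisq.test(counts)
result$statistic
```

If the formula interface worked as documented, xchisq.test(~ cyl, data = mtcars) would be expected to report the same test statistic.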

(As an aside related to https://github.com/ProjectMOSAIC/mosaic/issues/633, it would be nice if chisq.test could function the way other .test functions do without having to explain to my students why we need the extra x in this one case only. But that's a minor gripe.)

nicholasjhorton commented 6 years ago

Thanks for your note. The documentation has been fixed (see https://github.com/ProjectMOSAIC/mosaic/commit/185e6d30b15be9e15955e6019c2742de6be4e460) to note that tally(x, data) needs to be tally(~ x, data).

Note that your command can be simplified, since the data argument is redundant once tally() has produced the counts:

xchisq.test(tally(~ cyl, data = mtcars))

Regarding xchisq.test() vs. chisq.test(), I would defer to @rpruim.

VectorPosse commented 6 years ago

Hi Nick. Thanks for the response, but it really isn't the documentation that I faulted here. In fact, it looks like the help file was already correct: if x is a formula (like ~ cyl), then the tally command shouldn't be inserting an extra tilde in there, right?

Either way, the implication is that xchisq.test(~ cyl, data = mtcars) should work, and it doesn't. What am I missing?

rpruim commented 6 years ago

This appears to be a bug in the way that the default value of p was calculated. I've fixed it in beta.
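
The thread does not spell out the exact cause. One plausible mechanism, offered purely as an assumption for illustration: stats::chisq.test() defaults to p = rep(1/length(x), length(x)), and a one-sided formula has length 2, so a default computed from the formula rather than from the tallied counts would have the wrong number of entries.

```r
# Illustration only: this is an assumed mechanism, not the confirmed cause.
f <- ~ cyl                    # the formula passed as x
counts <- table(mtcars$cyl)   # the counts the test actually needs

length(f)        # a one-sided formula has two components: `~` and cyl
length(counts)   # one entry per cylinder category

# Default probabilities built from each object, using the same expression
# as the default p in stats::chisq.test().
p_from_formula <- rep(1 / length(f), length(f))
p_from_counts  <- rep(1 / length(counts), length(counts))
```

Under this assumption, p_from_formula has fewer entries than there are categories, which chisq.test() rejects because x and p must have the same number of elements.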

I also reverted the documentation change since the original was correct.

xchisq.test(~ cyl, data = mtcars)
## 
##  Chi-squared test for given probabilities
## 
## data:  x
## X-squared = 2, df = 2, p-value = 0.3
## 
##    11        7       14   
## (10.67)  (10.67)  (10.67) 
##  [0.01]   [1.26]   [1.04] 
## < 0.10>  <-1.12>  < 1.02> 
##      
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>
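
The rows of that table can be reproduced by hand. A minimal sketch in base R, assuming equal expected probabilities for the three categories (the variable names are illustrative only):

```r
# Observed counts for 4, 6, and 8 cylinders, taken from the output above.
observed <- c("4" = 11, "6" = 7, "8" = 14)

# Equal probabilities: each category expects one third of the total.
expected <- rep(sum(observed) / length(observed), length(observed))

# Per-cell contribution to X-squared and the Pearson residual.
contribution <- (observed - expected)^2 / expected
residual <- (observed - expected) / sqrt(expected)

# Test statistic and p-value on (number of categories - 1) degrees of freedom.
x_squared <- sum(contribution)
p_value <- pchisq(x_squared, df = length(observed) - 1, lower.tail = FALSE)

round(rbind(observed, expected, contribution, residual), 2)
```

Here x_squared comes out to 2.3125 and p_value to roughly 0.31, which the printed output shows rounded as 2 and 0.3.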

VectorPosse commented 6 years ago

Awesome. Thanks!

rpruim commented 6 years ago

Two comments: