OpenIntroStat / oilabs

🛑 This package has been deprecated and datasets and functionality have been moved to the openintro package
https://github.com/OpenIntroStat/openintro
Creative Commons Zero v1.0 Universal

make inference() go away #8

Closed: beanumber closed this issue 7 years ago

beanumber commented 9 years ago

See (https://github.com/beanumber/oiLabs-mosaic/issues/2)

No offense to whoever wrote this, but I think I hate inference(). It's the worst: an undocumented, magic, black box function.

I'll offer two potential solutions:

  1. Document this function thoroughly
  2. Make it go away.

I favor the latter. The question is: does this foster or impede understanding of statistical concepts? I argue the latter. Is it realistic to suggest that if you want to do inference, you can just plug into a mysterious function? Or should we be reinforcing a conceptual understanding of inference by breaking the procedures into small steps? And isn't it important that a student specify how she is doing inference?

I don't teach with this anyway. If you want to do a t-test, use t.test(). If you want to find a p-value in a normal sampling distribution, use pnorm(). If you want to find a p-value in a t-distribution, use pt(). If you want to find a p-value in a data-generated sampling distribution, use mosaic::pdata().
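For concreteness, a minimal sketch of that toolkit, using R's built-in sleep data (the last line assumes the mosaic package and some hypothetical simulation objects):

t.test(extra ~ group, data = sleep)      # two-sample t-test (Welch by default)
pnorm(1.96, lower.tail = FALSE)          # upper-tail p-value in a normal distribution
pt(2.1, df = 13, lower.tail = FALSE)     # upper-tail p-value in a t-distribution
# mosaic::pdata(~ stat, q = 2.1, data = null_dist)  # p-value from a simulated null distribution (illustrative names)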

On a slightly more esoteric level, I'm not sure that any of the functions in this package are really useful. I don't mind teaching students about functions in dplyr or mosaic because I know those packages are widely used and likely to be well-supported in the future. But the functions in this package never get used outside of these labs, so are they really necessary?

/diatribe

mine-cetinkaya-rundel commented 9 years ago

The function is undocumented in the sense that it doesn't have a help file, since we never got around to putting these functions in a package. It is documented in the lab where it's introduced, or at least it was in the original version of that lab.

Here are the reasons why I like the function:

Current problems with the function:

beanumber commented 9 years ago

OK, so first of all: on reading this over, I realize my initial post sounded far snippier than I intended, so I apologize for that.

Nevertheless, in the interest of collegial dialogue, I will press on. :-)

I agree with most of your points, and I can see how the clarification of the error messages would be helpful. I also haven't really used the function much, so forgive me if my allegations are off-base.

It's interesting that we both recognize the universality of the inferential process, but take opposite approaches towards emphasizing it. Your approach is to make one function that performs inference in a variety of common settings. My approach is to break the process down into common steps. I have no idea which is more effective! Maybe we could conduct an experiment?

Let me say a little bit more about my perspective on inference. I see it as a two-step process:

  1. Construct the sampling distribution of the test statistic under the assumptions made
  2. Locate the observed test statistic in this distribution.

I see this as the fundamental inferential process, and you might argue that a deep understanding of this is what defines "statistics". This is also a general framework that applies to any form of statistical inference (right?).

So in practice, to do step 1, you have to know what assumptions you are making and how that translates into the sampling distribution. If the sampling distribution is parametric, you have to know the parameters of that distribution. But then step 2 is easy, because it's just p*(x, params) (or 2 * p*(x, params), or at worst 2 * p*(x, params, lower.tail = FALSE)), where * is the name of the distribution (e.g. norm, t, f, etc., or data in the case of a non-parametric distribution), x is the test statistic, and params are the parameters of *. None of these pieces are very hard, and they are all necessary to specify the test correctly. With confidence intervals it's basically the same thing except you use q*().
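To make the two steps concrete for a data-generated sampling distribution, here is a minimal sketch; the normal null model, sample size, and observed statistic are illustrative assumptions:

set.seed(1)
null_stats <- replicate(10000, {         # step 1: build the null distribution
  sim <- rnorm(14)                       #   simulate a sample under H0 (mu = 0)
  mean(sim) / (sd(sim) / sqrt(14))       #   compute its test statistic
})
obs_stat <- -2.67                        # observed test statistic (assumed)
2 * mean(null_stats <= obs_stat)         # step 2: locate it (two-sided p-value)
quantile(null_stats, c(0.025, 0.975))    # the q*() analogue for an interval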

What's nice about this is that it works for any sampling distribution, even ones that you haven't programmed into inference(). Can inference() handle randomization distributions like @andrewpbray has written up here? Fisher's exact test? Inference for regression? Correlation? Are you going to update the function every time a new thing pops up?

rudeboybert commented 9 years ago

For the sake of discussion, I'm going to narrow the scope to the 8 (means/proportions) × (one/two sample) × (HT/CI) combinations of normal-distribution-based inferential situations.

In my ideal world, the use of the inference() function in labs would achieve two goals:

  1. A theoretical goal: stress the unified nature/process/framework underlying these inferential situations.
  2. An applied goal: give students a tool they can use for their own work beyond the course.

My views on how inference() fares:

Theoretical goal: While the inference() function does have unified syntax across the 8 combinations, IMO it does not stress the unified nature/process/framework. Its use is akin to the use of the boxed commands in other software packages: set the dials (in our case 3 of them), enter the data, and get the output. Furthermore, the unified nature/process/framework (the sampling distribution and the test statistic) is conflated with the results.

Applied goal: An anecdote: I had a student who wanted to use inference() for her own work. She didn't understand that inference() wasn't part of base R, didn't find it flexible, and found the documentation sparse. While the latter two issues can be alleviated with work on our part, this anecdote is illustrative of why most students won't use inference() beyond this class; they'll either use the long-established built-in tools Ben listed or some other software. If this is the case, then why use inference() to achieve the applied goal?

I feel these two goals are somewhat at odds with each other and having a two-birds-with-one-stone approach is difficult. Furthermore, I think that trying to have a single function (with ample documentation) cover all bases will only increase its bloat and opaqueness.

My proposal: we modify inference() to favor the theoretical goal, make its outputs look as unified as possible across the 8 combinations, and make its use the bulk of the labs. Then, as an appendix, we articulate how you would conduct inference in practice using t.test(), prop.test(), chisq.test(), etc. Much like how I tell students that they are going to use normal tables and draw normal curves while taking the intro class, but will never do so in practice later.

rpruim commented 9 years ago

I've not even looked at inference() so I won't comment on how it works, how it could be modified, or whether it should exist at all, but I will put in a plug for not using any functions that are not in a package (even if it is your own package), where they are properly documented and play well with the rest of the R system -- unless they are truly one-time-use. Students should expect ?inference and example(inference) to do something, and should not be copying and pasting function definitions from lab documents.

mine-cetinkaya-rundel commented 9 years ago

I think we all agree that any function should have a help file that is accessible in an expected way, and the best way to do this is by including the function in a package. That is exactly the current project for all custom functions in the OpenIntro labs, so unless the decision is to scrap inference() altogether, the documentation problem will be solved within the current project.

The comment on whether students will keep using the function beyond the course is an important one. For me, applicability beyond the intro stat course is one of the most important reasons for teaching R. So I agree that teaching tools within R that might not easily extend beyond the course might be orthogonal to that goal. But if we went down this road, one might also say students won't use mosaic in their research/work either; I don't think this concern outweighs the benefits of teaching R with a consistent syntax early on.

@beanumber I think we agree on the learning objectives for inference -- I like your two step summary. Also, inference() doesn't do all the tests you listed, but it does the ones that are in the textbook, since it was written to accompany the methods introduced in the text (a narrow view perhaps, but did the job for the labs).

Here are a few important things that I think the function does well:

  1. With every HT/CI it also plots the data and the sampling/randomization/bootstrap distribution. Base R functions like t.test() don't do this. I think that visual is important for students to say things like "Ah, the centers of these distributions were close compared to how variable they are (as seen in the side-by-side box plots), so it's not surprising that I ended up with a large p-value (as seen in the sampling distribution sketch)." I think "do a bit of EDA before inference" and "sketch your sampling distribution" are important learning goals of intro stats, so a function that by default always shows these visualizations is useful.
  2. inference() uses the same theoretical framework that is used in the textbook:
    • prop.test() gives $$\chi^2$$ statistics, not $$Z$$ scores, but in the textbook students learn to do proportion tests with $$Z$$ scores, and how $$\chi^2$$ and $$Z$$ relate to each other is generally not discussed in intro stats (a short numeric sketch follows this list).
    • ANOVA is introduced as a way to test for many means, not as a model, so using inference() we can do an ANOVA as a hypothesis test without lm(). Later in the book we discuss the relationship between a regression model and ANOVA but using inference() allows for doing an ANOVA task in the lab at the same time it's introduced in class. (This parity is important for me, and also important for their projects mentioned in (4) below.)
  3. inference() always expects variables (where each element in the vector is an observation in the sample) as input. t.test() does too, but prop.test() doesn't (it expects summary counts). The inconsistency is annoying.
  4. (This one may be specific to me, but I doubt I'm the only one with such a project.) In my course students do a data analysis project where they find and use their own dataset. Prior to using the inference() function this resulted in a lot of cases of "R gives me an error I don't understand and I don't know where to go from here", many of which were driven by data type/class issues that we don't discuss in intro stat. Obviously parsing through these is a skill (an important one), but the management of the project was getting way too overwhelming. The custom function, which checks for data types and reports custom error/warning messages, has helped in this regard.
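As a sketch of the $$\chi^2$$/$$Z$$ relationship mentioned in (2), with made-up counts: the textbook-style Z score squared matches prop.test()'s statistic once the continuity correction is turned off.

x <- 42; n <- 100; p0 <- 0.5
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)         # Z score as taught in the textbook
z^2                                                 # equals prop.test's X-squared...
prop.test(x, n, p = p0, correct = FALSE)$statistic  # ...when the continuity correction is off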

I'm open to solutions that address these goals. We might come up with our own solution, or there might already be something out there that I'm not aware of.

There is one thing inference() does that I think/thought is good, but I am not as firm on as the ones listed above: It highlights that the same problem (if certain conditions are met) can be solved via a parametric or simulation-based method. In the function the definitions of many of the arguments are the same, and you just need to flip the method switch. In some sense I think this is good. On the other hand I'm not sure if it helps students understand how the simulation-based method really works.

rpruim commented 9 years ago

Notes:

  1. I'm not convinced that plotting and inference should be coupled. The plots should generally come first, and they are not that hard to create. If you want to add plots to standard functions, you could create new versions. (See mosaic::xpnorm() or mosaic::xchisq.test() for examples of this approach.) This allows new users both to become familiar with the standard tools and to get the extra behavior you would like them to see each time. Eventually, they may wean themselves off the "extras". Another option would be to write plot methods for objects of class "htest" (see the sketch after this list).
  2. I'm not sure what distinction you are making between ANOVA being or not being a model. To say that each population has the same standard deviation and is normally distributed but may have different means sounds like a model to me. And I believe there is a package that does a z-test for proportions. It would be easy to write such a thing -- I may have done it once upon a time. Also note that prop.test() does more than test for a single proportion and that not all of the tests it can do are 1-df tests.
  3. The mosaic package fixes this.
  4. I'm not sure what sorts of data format errors you are getting. Perhaps you can open a separate issue about that.
  5. Applicable beyond the course does not mean "use no packages". R was designed around the use of packages. So whether code is in "base R" or a package is not the primary issue. Code that is not in a package should be avoided, and packages should be chosen well. If you write functions in your package that (a) are too tailored to teaching or (b) don't play well with others or (c) don't generalize well beyond the confines of the course, then the likelihood of continued use is low, even if the code is in a package.
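As a sketch of the plot-method idea in (1): an illustrative plot.htest() that, as an assumption, only handles t-test results (which carry a df parameter); what to draw is likewise an editorial choice, not an existing API.

plot.htest <- function(x, ...) {
  df <- x$parameter[["df"]]               # t.test() results carry their df
  grid <- seq(-4, 4, length.out = 200)
  plot(grid, dt(grid, df = df), type = "l",
       xlab = "t", ylab = "density", main = x$method, ...)
  abline(v = x$statistic, lty = 2)        # mark the observed statistic
}
plot(t.test(extra ~ group, data = sleep)) # dispatches to plot.htest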
rpruim commented 9 years ago

Note about the phrase "never got around to putting these in a package". This sort of code should be born in a package and incubated and developed there. Then there is no need to later "put it into a package".

andrewpbray commented 9 years ago

Sorry I'm late to the party here. It's been a good read, with some valuable thoughts.

@rpruim I think we'd all now agree about best practices for these things. This was less apparent to us in 2011 when we were first putting these things together as grad students. The motivation in moving the old code to a package on GitHub was exactly as you say: to incubate it and develop it, for which this thread is a big help.

Another motivation for this function that I'm remembering: for two-sample comparisons, the t.test() function does the more robust Welch t-test by default instead of the vanilla pooled t-test. It was a bit confusing when students would get different answers from R than when doing it "by hand".
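For example, on R's built-in sleep data, the two calls below give different p-values; var.equal = TRUE is what matches the by-hand pooled method:

t.test(extra ~ group, data = sleep)                    # Welch t-test, R's default
t.test(extra ~ group, data = sleep, var.equal = TRUE)  # pooled "vanilla" t-test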

In thinking about a single inference function versus @beanumber 's step-by-step approach, a couple thoughts come to mind. We probably all start these things with the students drawing the sampling distribution, calculating the statistic by hand, then finding a p-value. The step-by-step process follows in that same mold, with something like:

mu <- 3                    # hypothesized mean under H0
xbar <- 2.5                # observed sample mean
s <- 0.7                   # sample standard deviation
n <- 14                    # sample size
SE <- s / sqrt(n)          # standard error of the mean
stat <- (xbar - mu) / SE   # t statistic
pt(stat, df = n - 1) * 2   # two-sided p-value (stat is in the lower tail here)

While I like the cohesion with the pen-and-paper method, I have some reservations.

So right now my inclination would be to keep the inference() function, and view it as the lm() of the simpler inferential procedures. I like Randy's idea of writing methods for the graphics/diagnostics so that we have a summary.inf() and a plot.inf(), similar to lm(). Doing this, in addition to getting all the documentation in place, will be quite a bit of work, so @mine-cetinkaya-rundel, you would need to be OK with recruiting assistance if you need it.
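A hypothetical sketch of that design; all names and the one-sample t computation below are illustrative assumptions, not the actual inference() internals:

inference <- function(y, mu0 = 0) {
  n <- length(y)
  stat <- (mean(y) - mu0) / (sd(y) / sqrt(n))
  structure(list(statistic = stat, df = n - 1,
                 p.value = 2 * pt(-abs(stat), df = n - 1)),
            class = "inf")
}
summary.inf <- function(object, ...) {   # reporting, separate from computation
  cat("t =", round(object$statistic, 3),
      ", df =", object$df,
      ", p-value =", signif(object$p.value, 3), "\n")
  invisible(object)
}
plot.inf <- function(x, ...) {           # diagnostics, as plot.lm() does for lm()
  grid <- seq(-4, 4, length.out = 200)   # sketch the null t distribution
  plot(grid, dt(grid, df = x$df), type = "l",
       xlab = "t", ylab = "density", ...)
  abline(v = x$statistic, lty = 2)       # mark the observed statistic
}
summary(inference(sleep$extra, mu0 = 0)) # dispatches like summary(lm(...))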

andrewpbray commented 9 years ago

Also, @beanumber, as an acolyte of Michael Lavine, I feel the need to formally object to your characterization of statistical inference as being solely the null hypothesis significance test. Take it back, sir! Take it back!