data-8 / textbook

The textbook Computational and Inferential Thinking: The Foundations of Data Science
http://www.inferentialthinking.com

Clarify section 16.3 describing procedure for bootstrapped confidence interval of prediction #153

Open brshallo opened 3 years ago

brshallo commented 3 years ago

An approximate 95% prediction interval of scores has been constructed by taking the "middle 95%" of the predictions, that is, the interval from the 2.5th percentile to the 97.5th percentile of the predictions. The interval ranges from about 127 to about 131. -Section 16.3

Describing this as a confidence interval of the prediction would be clearer. "Prediction intervals" typically refer to the expected range for future individual observations (which would be much broader).
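
To make the distinction concrete, here is a minimal sketch in Python. It is not the textbook's code: the data are synthetic, and the names `fit_line` and `bootstrap_intervals` are hypothetical. The first interval takes the middle 95% of bootstrapped fitted values, as in the quoted passage. The second adds a resampled residual to each fitted value, which is one simple (assumed) way to approximate the range of a future individual observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: an assumption for illustration, not the textbook's data set.
n = 200
x = rng.uniform(0, 10, n)
y = 3.0 + 2.0 * x + rng.normal(0, 4.0, n)

def fit_line(x, y):
    """Return the least-squares slope and intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

def bootstrap_intervals(x, y, new_x, reps=2000, rng=rng):
    """Percentile intervals at new_x from resampled (x, y) pairs."""
    n = len(x)
    fitted = np.empty(reps)     # height of the regression line at new_x
    simulated = np.empty(reps)  # fitted value plus one resampled residual
    for i in range(reps):
        idx = rng.integers(0, n, n)  # resample rows with replacement
        slope, intercept = fit_line(x[idx], y[idx])
        fitted[i] = slope * new_x + intercept
        residuals = y[idx] - (slope * x[idx] + intercept)
        simulated[i] = fitted[i] + rng.choice(residuals)
    ci = np.percentile(fitted, [2.5, 97.5])     # interval for the prediction itself
    pi = np.percentile(simulated, [2.5, 97.5])  # interval for a new observation
    return ci, pi

ci, pi = bootstrap_intervals(x, y, new_x=5.0)
print("Confidence interval of the prediction:", ci)
print("Interval for a new individual observation:", pi)
```

With this setup the second interval should come out several times wider than the first, because it includes the scatter of individual points around the line and not only the uncertainty in the line's height.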

davidwagner commented 3 years ago

Excellent point. Thank you, and especially for suggesting replacement wording; that is helpful. We've discussed this in the past and my takeaway was that a true prediction interval would be more useful in practice but is also more work to explain and introduce, so as a compromise we are instead introducing the confidence interval of the prediction. If our language is not consistent with standard usage of those terms in stats, it makes sense to me to change it as you suggest. I am not a statistician so I don't feel well qualified to judge but I'll keep this to raise with the stats experts on the team. Thanks again.

brshallo commented 3 years ago

You are welcome.

I recently wrote a post on Simulating Prediction Intervals that walks through one potential approach. My examples are in R, but the procedure is based largely on the Python implementation I reference, whose author recently put their work into a package at saattrupdan/doubt. Also see my notes in the appendix on conformal inference.

(Maybe among those you can find something to link to as a resource, even if you don't want to cover it directly in the 2nd edition.)

davidwagner commented 3 years ago

@brshallo Thanks. Unfortunately, that looks more complicated than I'd want to explain in an intro class for frosh and sophomores with no prior experience in any of this stuff, so probably not viable for our constraints.