anhoej / qicharts2

R package: Quality improvement charts

Theoretical question about integrating infer and bootstrapping to draw confidence intervals #40

Closed: bharuch2 closed this issue 1 year ago

bharuch2 commented 1 year ago

I suspect this comes from a gap in my statistical understanding; if so, apologies in advance.

It is my understanding that the qic() function (particularly with the chart = '' argument) computes 3-sigma limits from the distribution associated with the chosen chart, as you've described in the vignette (which is beautifully written; I've referred to it many times).

I was also reading about bootstrapping here (https://moderndive.com/8-confidence-intervals.html#bootstrap-process), including an interesting discussion of the pros and cons of theory-based vs. simulation-based methods (section 8.7).

In the vignette, you generated 24 values using rnorm(24) with a seed of 19, then introduced a moderate shift with y[13:24] <- rnorm(12, mean = 2).

With an I chart (Gaussian distribution), summary(o) gives the control limits (aLCL and aUCL) as -2.11 and 4.23.
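For comparison, I-chart limits can be approximated by hand from the average moving range. This is a minimal sketch, not the qicharts2 source: it assumes the vignette data described above, uses the standard d2 = 1.128 constant for moving ranges of two consecutive points, and omits any screening of unusually large moving ranges that qic() may apply, so its output need not match -2.11 and 4.23 exactly. The names sigma_hat, lcl and ucl are illustrative.

```r
# Assumed to reproduce the vignette data
set.seed(19)
y <- rnorm(24)
y[13:24] <- rnorm(12, mean = 2)

# Simplified I chart: no screening of large moving ranges (may differ from qic())
mr        <- abs(diff(y))      # moving ranges between consecutive points
sigma_hat <- mean(mr) / 1.128  # d2 constant for ranges of two observations
cl        <- mean(y)           # centre line
lcl       <- cl - 3 * sigma_hat  # lower 3-sigma limit
ucl       <- cl + 3 * sigma_hat  # upper 3-sigma limit
c(LCL = lcl, CL = cl, UCL = ucl)
```

Note that sigma here is estimated from point-to-point variation, not from the overall standard deviation of y.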

So what I attempted was to use the bootstrap (resampling with replacement) to generate confidence limits.

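So, I did this:

library(infer)  # specify(), generate(), calculate(), get_confidence_interval()
library(dplyr)  # %>% pipe
y2 <- as.data.frame(y)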

Then,

y2 %>% specify(response = y) %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "mean")

I think this draws 1,000 resamples with replacement, and the resulting data frame contains the mean of each resample.

Next, I wanted limits corresponding to the Shewhart 3-sigma lines, so I used a 99.7% percentile interval:

y2 %>% specify(response = y) %>% generate(reps = 1000, type = "bootstrap") %>% calculate(stat = "mean") %>% get_confidence_interval(level = 0.997, type = "percentile")

This gives (0.25, 1.86).
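The infer pipeline above can be sketched in base R to make the resampling explicit. This assumes the same vignette data; the resampling seed (2024) is arbitrary, and because infer draws its resamples differently, the interval will not reproduce (0.25, 1.86) exactly. The 99.7% percentile interval is taken as the 0.15% and 99.85% quantiles of the bootstrap means.

```r
# Assumed to reproduce the vignette data
set.seed(19)
y <- rnorm(24)
y[13:24] <- rnorm(12, mean = 2)

set.seed(2024)  # arbitrary seed for the resampling step
# 1000 bootstrap resamples of y (same size, with replacement); keep each mean
boot_means <- replicate(1000, mean(sample(y, replace = TRUE)))
# Central 99.7% percentile interval of the bootstrap means
ci <- quantile(boot_means, probs = c(0.0015, 0.9985))
ci
```

Every bootstrap mean lies between min(y) and max(y), so this interval always falls inside the range of the data; it describes the mean of the 24 points rather than individual values.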

My understanding is this:

* summary(o) gives limits for the 24 points based on the Gaussian distribution assumed by the I chart: (-2.11, 4.23).
* The bootstrap method gives an interval based on the standard error of the distribution of the mean of the 24 points: (0.25, 1.86). I understood that the two should be similar.

But they clearly are not. Would you have insight into what I'm missing?
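One way to see why the two intervals differ in width: the bootstrap interval describes the mean of the 24 values, whose standard error is roughly the standard deviation divided by sqrt(24) (about 4.9), whereas I-chart limits describe individual observations. The sketch below, assuming the same vignette data and an arbitrary resampling seed, compares the two spreads; the variable names are illustrative.

```r
# Assumed to reproduce the vignette data
set.seed(19)
y <- rnorm(24)
y[13:24] <- rnorm(12, mean = 2)

set.seed(2024)  # arbitrary seed for the resampling step
boot_means <- replicate(1000, mean(sample(y, replace = TRUE)))

sd_individual <- sd(y)                    # spread of individual observations
se_boot       <- sd(boot_means)           # bootstrap standard error of the mean
se_theory     <- sd(y) / sqrt(length(y))  # textbook standard error of the mean
c(sd_individual = sd_individual, se_boot = se_boot, se_theory = se_theory)
```

The bootstrap standard error should land close to sd(y) / sqrt(24), several times smaller than the spread of the individual values.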

Thanks.

anhoej commented 1 year ago

Thank you. This is very interesting but far above my statistical understanding. I'm not a statistician, just a medical doctor :-) But I'd love to hear if you find out anything more about the use of bootstrapping methods in statistical process control.

huftis commented 1 year ago

@bharuch2, I’m a statistician, so I’ll answer. No, bootstrapping is not valid in this case. The whole bootstrap method is based on the assumption that you have a statistical distribution, i.e., that your process is stable/predictable. Much of the point of using SPC methodology is that the 3 sigma limits (in contrast to 3 SD limits) are valid even when your process is not stable/predictable. In this situation, you don’t have a statistical distribution, so the bootstrap is not valid.
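The contrast between 3-sigma limits and 3-SD limits can be illustrated with a hypothetical series whose mean shifts halfway through. In this sketch (made-up data, not from the thread), sigma is estimated from the average moving range, which is driven mainly by point-to-point variation, while the global standard deviation is inflated by the shift itself.

```r
# Hypothetical unstable process: the mean jumps from 0 to 3 after 50 points
set.seed(1)
y <- c(rnorm(50, mean = 0), rnorm(50, mean = 3))

sigma_mr <- mean(abs(diff(y))) / 1.128  # sigma from moving ranges (one big jump only)
sd_all   <- sd(y)                       # global SD, inflated by the shift
c(sigma_mr = sigma_mr, sd_all = sd_all)

# 3-sigma limits around the overall mean, using each estimate
mean(y) + c(-3, 3) * sigma_mr
mean(y) + c(-3, 3) * sd_all
```

Limits based on sd_all are much wider and can hide the shift, while the moving-range sigma stays close to the within-segment variation.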

anhoej commented 1 year ago

Thanks again, Karl Ove :)

bharuch2 commented 1 year ago

Perfect! Thank you!