anhoej / qicharts2

R package: Quality improvement charts

CL on xbar different to standard formulas #28

Closed TetraGenius closed 3 years ago

TetraGenius commented 3 years ago

Dear Jacob,

From work with my undergraduate students, it seems that the part of qic that calculates the standard deviation gives a different value from what we expect. We use the following from general statistics, for 25 samples of 17 each:

    ss <- 1:25
    for (j in 1:25) {
      scalc <- 0
      for (k in 1:17) {
        # manualSDC has 17 rows and 25 columns
        scalc <- scalc + ((manualSDC[k, j] - mean(manualSDC[, j]))^2)
      }
      ss[j] <- (scalc / 24)^0.5
    }
    ssMean <- mean(ss)  # calculates sbar

Our sbar is different from the one you calculate in line 191 of helper.functions:

    stdev <- sqrt(sum((x$y.length[base] - 1) * x$y.sd[base]^2, na.rm = TRUE) /
                    sum(x$y.length[base] - 1, na.rm = TRUE))
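Read on its own, the quoted expression is a pooled standard deviation: the square root of the subgroup variances averaged with weights n_i - 1. The following is a minimal sketch of that reading, using simulated data and hypothetical vector names (`y_sd`, `y_length`) in place of the package's internal `x$y.sd[base]` and `x$y.length[base]`.

```r
# Simulated subgroups of unequal size; the data are illustrative only
set.seed(1)
sizes <- c(15, 17, 20, 17, 12)
subgroups <- lapply(sizes, function(n) rnorm(n, mean = 10, sd = 2))

y_sd     <- vapply(subgroups, sd, numeric(1))  # per-subgroup SD (n - 1 divisor)
y_length <- sizes                               # per-subgroup sample size

# Same form as the quoted line: weight each variance by its degrees of freedom
pooled_sd <- sqrt(sum((y_length - 1) * y_sd^2) / sum(y_length - 1))

# With equal subgroup sizes the weights cancel, leaving sqrt(mean(s^2))
equal        <- replicate(25, rnorm(17, mean = 10, sd = 2))  # 17 rows, 25 columns
s_equal      <- apply(equal, 2, sd)
pooled_equal <- sqrt(sum((17 - 1) * s_equal^2) / sum(rep(17 - 1, 25)))
```

With equal sizes, `pooled_equal` is generally not identical to `mean(s_equal)`, although the two are usually close.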

I'm not quite sure how your function arrives at the SD in a single step.

Could you help me understand where the difference comes from?

Kind regards,

anhoej commented 3 years ago

Thanks for your interest in qicharts2.

There are several ways to calculate SD in control charts. qic uses the formulas provided by Montgomery. See appendix 1 in https://anhoej.github.io/qicharts2/articles/qicharts2.html

I cannot reproduce your result without a reproducible example. But do I understand correctly that you are calculating the pooled SD of all subgroups? Most authorities would deem this incorrect, as it would include both common and special cause variation.

TetraGenius commented 3 years ago

Thanks, Jacob, for your quick response. I will look at the link to see if I understand.

You are correct, I do calculate the pooled SD of my subgroups during initialisation. I understood that special variation will be removed by removing the outliers during initialisation, and then re-calculation of limits.

The book we have been using the past few years follows this approach.

TetraGenius commented 3 years ago

The formula in appendix 1 is: Xbar: $\bar{\bar{x}} \pm A_3 \bar{s}$, where $A_3$ is a constant depending on the sample size and $\bar{s}$ is the weighted sample standard deviation.

As far as I know, the weighted sample standard deviation sbar is the value I used (with constant sample size).
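As a worked illustration of the appendix formula, the sketch below computes A3 from the usual definition A3 = 3 / (c4 * sqrt(n)) and applies it to simulated data with a constant subgroup size of 17. The helper names `c4` and `a3` and the data are assumptions made for this example, not qicharts2 internals.

```r
# c4: bias-correction constant for the sample SD of n normal observations
c4 <- function(n) sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

# A3: multiplier for sbar in the Xbar chart limits
a3 <- function(n) 3 / (c4(n) * sqrt(n))

set.seed(2)
n <- 17                                  # observations per subgroup
k <- 25                                  # number of subgroups
dat <- matrix(rnorm(n * k, mean = 50, sd = 4), nrow = n, ncol = k)

xbarbar <- mean(colMeans(dat))           # grand mean (centre line)
sbar    <- mean(apply(dat, 2, sd))       # average subgroup SD

lcl <- xbarbar - a3(n) * sbar            # lower control limit
ucl <- xbarbar + a3(n) * sbar            # upper control limit
```

For n = 17, `a3(n)` comes out at roughly 0.739, in line with the value listed in standard tables of control chart constants.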

I think it is best if I put together a reproducible example to post. Maybe I will discover my mistake while doing it!

anhoej commented 3 years ago

That explains the difference. As I said, there are many ways to calculate SDs for control charts, but I haven't heard of that one ;-) The problem is that using your method you would need control limits to remove outliers, but you also need outliers to calculate control limits.

The principle behind Montgomery's (and Shewhart's) calculations is very similar to good old-fashioned analysis of variance (ANOVA), where you compare the within-subgroup SD with the between-subgroup SD. If the latter is significantly larger than the former, it is a signal of special cause variation.
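The ANOVA analogy can be made concrete with base R. In the sketch below (simulated in-control data, hypothetical variable names), the residual mean square from a one-way ANOVA on the subgroup factor is the within-subgroup variance, and the F statistic compares the between-subgroup variation against it.

```r
set.seed(3)
n <- 17   # observations per subgroup
k <- 25   # number of subgroups
d <- data.frame(
  subgroup = factor(rep(seq_len(k), each = n)),
  y        = rnorm(n * k, mean = 50, sd = 4)
)

fit <- anova(lm(y ~ subgroup, data = d))
ms_between <- fit["subgroup", "Mean Sq"]    # between-subgroup mean square
ms_within  <- fit["Residuals", "Mean Sq"]   # within-subgroup mean square
f_stat     <- ms_between / ms_within

# The within mean square equals the pooled subgroup variance
s2 <- tapply(d$y, d$subgroup, var)
pooled_var <- sum((n - 1) * s2) / (k * (n - 1))
```

A large `f_stat` would point to more variation between subgroups than the within-subgroup spread can account for, which is the same kind of signal an Xbar chart looks for.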

anhoej commented 3 years ago

I think our two previous comments crossed each other. Appendix 1 in the vignette explains the principles, but Montgomery's formulas are actually more complicated than that. He has one formula for equal sample sizes and another for varying sample sizes. qic attempts to pick the right one for the data. You could also check against the qcc package. I actually revised qicharts2 a couple of years ago to get results similar to Montgomery's examples and qcc.

TetraGenius commented 3 years ago

Dear Jacob, I have found my error. As usual, it was something I always tell my students not to do: I used the number of samples (25) in place of the number in each sample (17). I spotted it once I saw that the reproducible example was obviously correct in both methods...
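The slip is easy to see numerically. The sketch below uses simulated data in place of `manualSDC` (which was not posted) and compares the loop's divisor of 24, the number of samples minus one, with a divisor of 16, the number in each sample minus one.

```r
set.seed(4)
n_in_sample <- 17   # observations per sample (rows)
n_samples   <- 25   # number of samples (columns)

# Stand-in for manualSDC, which is not available here
m <- matrix(rnorm(n_in_sample * n_samples, mean = 50, sd = 4),
            nrow = n_in_sample, ncol = n_samples)

# Sum of squared deviations from the column mean, per sample
ss_col <- colSums(sweep(m, 2, colMeans(m))^2)

s_wrong <- sqrt(ss_col / (n_samples - 1))     # divides by 24
s_right <- sqrt(ss_col / (n_in_sample - 1))   # divides by 16, as sd() does

sbar_wrong <- mean(s_wrong)
sbar_right <- mean(s_right)
ratio <- sbar_wrong / sbar_right              # sqrt(16 / 24), about 0.816
```

The wrong divisor shrinks every subgroup SD by the same factor, so sbar and the width of the control limits come out about 18% too small.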

Sorry for wasting your time with something so trivial...

Kind regards,

TetraGenius commented 3 years ago

Also, I see you use sd and then adapt to the sample sd, which is the same formula we use (divide by n - 1). I found a web page with the weighted formulas as well; thanks.