ajdamico / convey

variance of distribution measures estimation of survey data
GNU General Public License v3.0

svywattsdec text check? #390

Closed ajdamico closed 1 year ago

ajdamico commented 1 year ago

You say <1% in this part of the book, but watts pov. gap ratio and theil(poor) are at about 4% and -3.5%. Is that expected? Does it make sense to clarify how/why the PRB below 1% only applies to watts and fgt0?

For the variance estimators, we estimate the PRB using the code below. Note that the bias is still relatively small, with absolute values of the PRB below 1%.

# estimate the variance of the component estimators
# using the empirical variance of the estimates across samples
(vartheta.popest <-
   diag(var(t(
     sapply(wattsdec.estimate.list , coef)
   ))))

##                watts                 fgt0 watts pov. gap ratio 
##         6.141434e-05         1.015260e-04         9.241259e-04 
##          theil(poor) 
##         1.108525e-03

# estimate the expected value of each component's variance estimator
# using the mean of the variance estimates across samples
(vartheta.exp <-
    rowMeans(sapply(wattsdec.estimate.list , function(z)
      diag(vcov(
        z
      )))))

##                watts                 fgt0 watts pov. gap ratio 
##         6.100902e-05         1.013968e-04         9.613831e-04 
##          theil(poor) 
##         1.070018e-03

# estimate the percentage relative bias of the variance estimators
(percentage_relative_bias <-
    100 *  (vartheta.exp / vartheta.popest - 1))

##                watts                 fgt0 watts pov. gap ratio 
##           -0.6599717           -0.1273107            4.0316155 
##          theil(poor) 
##           -3.4736907

stopifnot(abs(percentage_relative_bias[1]) < 1)
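The `stopifnot()` call above only tests the first element (`watts`). A minimal sketch that applies the same 1% tolerance to every component shows which ones actually satisfy the claim in the text. It uses the rounded values printed above as inputs (an assumption made so it runs without `wattsdec.estimate.list`):

```r
# rounded values copied from the printed output above
vartheta.popest <- c(
  "watts"                = 6.141434e-05,
  "fgt0"                 = 1.015260e-04,
  "watts pov. gap ratio" = 9.241259e-04,
  "theil(poor)"          = 1.108525e-03
)
vartheta.exp <- c(
  "watts"                = 6.100902e-05,
  "fgt0"                 = 1.013968e-04,
  "watts pov. gap ratio" = 9.613831e-04,
  "theil(poor)"          = 1.070018e-03
)

# percentage relative bias of each variance estimator
percentage_relative_bias <- 100 * (vartheta.exp / vartheta.popest - 1)

# check every component, not only the first one
below.1pct <- abs(percentage_relative_bias) < 1
print(round(percentage_relative_bias, 2))
names(which(below.1pct))   # components with |PRB| < 1%
```

Only `watts` and `fgt0` pass; `watts pov. gap ratio` (about 4%) and `theil(poor)` (about -3.5%) do not, which is the discrepancy raised in this issue.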
guilhermejacob commented 1 year ago

SSW (Särndal, Swensson and Wretman, 1992, Sec. 5.2) has a table showing the impact of the Bias Ratio ($\text{BR} [\widehat{\theta} ] = \frac{\text{B}[ \widehat{\theta} ]}{\sqrt{\text{Var}[ \widehat{\theta} ]}}$) on confidence statements. As long as BR < 20% and the normal approximation is valid, the impact is minimal.

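However, we work with something else. The PRB is not ideal for that either, but it is a good measure. The one we use is what I call the Squared Bias Component (SBC), the fraction of the MSE attributable to the squared bias: $\text{SBC}[\widehat{\theta}] = \frac{\text{B}[\widehat{\theta}]^2}{\text{Var}[\widehat{\theta}] + \text{B}[\widehat{\theta}]^2} = \frac{\text{BR}[\widehat{\theta}]^2}{1 + \text{BR}[\widehat{\theta}]^2}$. If the SBC is below 5% (or 10%), we should be fine.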
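To see what the 5% and 10% SBC thresholds mean in practice, they can be converted into bias ratios and into the approximate coverage of nominal 95% intervals. A minimal sketch, assuming the normal approximation holds; `br.fun()` and `p0.fun()` are restated from the code further down so the snippet runs on its own:

```r
# restated from the table code below so this snippet is self-contained
br.fun <- function( sbc ) sqrt( sbc / ( 1 - sbc ) )
p0.fun <- function( bratio , alpha = .05 ) {
  z.calc <- qnorm( 1 - alpha / 2 )
  ut <- pnorm( z.calc - bratio , lower.tail = FALSE ) # upper tail: P( Z > z - BR )
  lt <- pnorm( - z.calc - bratio )                    # lower tail: P( Z < -z - BR )
  1 - lt - ut                                         # coverage probability
}

# translate the SBC thresholds into bias ratios and coverage of 95% intervals
sbc.thresholds <- c( .05 , .10 )
thresholds <- cbind(
  "SBC" = sbc.thresholds ,
  "BR"  = br.fun( sbc.thresholds ) ,
  "P0"  = p0.fun( br.fun( sbc.thresholds ) , .05 )
)
round( thresholds , 4 )
```

Under these assumptions, an SBC of 5% corresponds to a BR of about 0.23 and a coverage of roughly 94.4%, while an SBC of 10% corresponds to a BR of 1/3 and a coverage of roughly 93.7%.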

The following code recreates the table in SSW:

# bias ratio (BR), squared bias component (SBC) and coverage probability (P0) functions
br.fun <- function( sbc ) sqrt( sbc / ( 1 - sbc ) )
sbc.fun <- function( br ) br^2 / ( 1 + br^2 )
p0.fun <- function( bratio , alpha = .05 ) {
  z.calc <- qnorm( 1 - alpha/2 )
  ut <- pnorm( z.calc - bratio , 0 , 1 , lower.tail = FALSE ) # upper tail: P( Z > z - BR )
  lt <- pnorm( - z.calc - bratio , 0 , 1 , lower.tail = TRUE ) # lower tail: P( Z < -z - BR )
  1 - lt - ut # coverage: probability outside both tails
}

# recreate table
br.vec <- c( 0 , .05 , .10 , .20 , .30 , .50 , 1 )
round( cbind( "BR" = br.vec , "SBC" = sbc.fun( br.vec ) , "P0" = p0.fun( br.vec , .05 ) ) , 4 )

#      BR   SBC    P0
# [1,] 0.00 0.0000 0.9500
# [2,] 0.05 0.0025 0.9497
# [3,] 0.10 0.0099 0.9489
# [4,] 0.20 0.0385 0.9454
# [5,] 0.30 0.0826 0.9396
# [6,] 0.50 0.2000 0.9209
# [7,] 1.00 0.5000 0.8299

This gives an idea of the effect of the bias on confidence statements. However, it only holds when the normal approximation is valid, i.e., when the sample size is large enough for a CLT to hold. If the normal approximation is poor, these coverage probabilities will be misleading.
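When the normal approximation does hold, the `P0` column can be checked by simulation. A minimal sketch, assuming a hypothetical estimator that is exactly normal with a known standard deviation and a bias of `bratio` standard deviations; the simulated coverage of the usual 95% interval should be close to the analytical value (about 0.9454 for BR = 0.20 in the table above):

```r
set.seed(123)

# hypothetical setup: theta.hat ~ N(theta + bias, sd.theta^2), sd.theta known
theta    <- 10
sd.theta <- 2
bratio   <- 0.20
bias     <- bratio * sd.theta
n.sim    <- 1e5

theta.hat <- rnorm(n.sim, mean = theta + bias, sd = sd.theta)

# nominal 95% interval: theta.hat +/- z * sd.theta
z.calc  <- qnorm(0.975)
covered <- abs(theta.hat - theta) <= z.calc * sd.theta

# simulated coverage vs. the analytical value used in p0.fun()
sim.coverage <- mean(covered)
analytical.coverage <- 1 -
  pnorm(z.calc - bratio, lower.tail = FALSE) -  # upper tail
  pnorm(-z.calc - bratio)                       # lower tail
c(simulated = sim.coverage, analytical = analytical.coverage)
```

Replacing `rnorm()` with a skewed distribution in the same sketch is a quick way to see how far the table can drift when the normal approximation is poor.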

guilhermejacob commented 1 year ago

One solution is to drop the PRB (it is used in the Zenga paper, I believe) and just go with the SBC. That requires some rewriting, but nothing too problematic.
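If the PRB is dropped in favour of the SBC, an empirical SBC for a variance estimator could be computed from the same Monte Carlo replicates: the squared bias of the variance estimates divided by their MSE around the reference variance. A minimal sketch; `sbc.empirical()` is a hypothetical helper, not a `convey` function:

```r
# hypothetical helper: empirical squared bias component (SBC)
# estimates: replicated estimates (e.g. variance estimates from each sample)
# target:    reference value (e.g. the empirical variance of the point estimates)
sbc.empirical <- function(estimates, target) {
  bias <- mean(estimates) - target
  mse  <- mean((estimates - target)^2)  # mean() keeps MSE = variance + bias^2 exact
  bias^2 / mse
}

# toy example with made-up numbers: bias = 1, MSE = 5/3
sbc.empirical(c(1, 2, 3), target = 1)  # 0.6
```

For the simulation in this issue, component `i` would be evaluated with something like `sbc.empirical(sapply(wattsdec.estimate.list, function(z) diag(vcov(z)))[i, ], vartheta.popest[i])`; this call is illustrative and has not been run here.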

ajdamico commented 1 year ago

so just going with this change? https://github.com/guilhermejacob/context/pull/24/commits/6193f27ad673bac80fbe906e0c3c84e18832b5e5