Closed ajdamico closed 3 years ago
for both this and #212, are there published variance estimates anywhere that we could match? otherwise the ?help and contextbook should both probably mention that they are your own calculation.. sorry if i'm misunderstanding something
You can try with the datasets here. But I don't think they will match. The Gini index is slightly different.
Take a look:
hh.data <- haven::read_stata("http://web.cas.suffolk.edu/faculty/jhaughton/hh.dta")
hh.data <- data.frame( hh.data )
hh.data <- transform( hh.data , weighti = weight * famsize )
pce.data <- haven::read_stata("http://web.cas.suffolk.edu/faculty/jhaughton/consume.dta")
pce.data <- data.frame( pce.data )
final <- merge( hh.data , pce.data , by = c( "hhcode" ))
library(survey)
#srs.des <- svydesign(ids = ~1, data = hh.data )
srs.des <- svydesign( ids = ~thana , strata = ~region , weights = ~weighti , data = final )
library(convey)
srs.des <- convey_prep( srs.des )
svyfgt( ~pcexp , srs.des , abs_thresh = 3000 , g = 0 , na.rm = TRUE ) # matches link below
svygini( ~pcexp , subset( srs.des, pcexp > 0 ) , na.rm = TRUE ) # doesn't match link below
You can check some results published in the book here. For now, I'll proceed with the decomposition-based demonstration. Ok, @ajdamico ?
hi, the more you can explain about why they don't match, the better.. thanks
This shows how SST index can be decomposed:
# load the convey package
library(convey)
# load the survey library
library(survey)
# load the vardpoor library
library(vardpoor)
# load the synthetic european union statistics on income & living conditions
data(eusilc)
# make all column names lowercase
names( eusilc ) <- tolower( names( eusilc ) )
# create poverty gap variable
eusilc <- transform( eusilc , gap_eqincome = ( 10000 - eqincome ) * ( eqincome <= 10000) )
# construct a survey.design
# using our recommended setup
des_eusilc <-
svydesign(
ids = ~ rb030 ,
strata = ~ db040 ,
weights = ~ rb050 ,
data = eusilc
)
# immediately run the convey_prep function on it
des_eusilc <- convey_prep( des_eusilc )
# FGT0
fgt0 <- svyfgt( ~eqincome , des_eusilc , abs_thresh = 10000 , g = 0 )
# FGT1 among the poor
fgt1 <- svyfgt( ~eqincome , subset( des_eusilc , eqincome <= 10000 ) , abs_thresh = 10000 , g = 1 )
# Gini of poverty gaps
gini <- svygini( ~gap_eqincome , des_eusilc )
# Sen-Shorrocks-Thon Index
sst <- svysst( ~eqincome , des_eusilc , abs_thresh = 10000 , components = TRUE )
sst.comp <- attr( sst , "components" )
# Decomposing the SST index
sst.comp$Components[1,][[1]] == fgt0[[1]]
sst.comp$Components[2,][[1]] == fgt1[[1]]
sst.comp$Components[3,][[1]] == gini[[1]]
# calculate from components
fgt0[[1]] * fgt1[[1]] * ( 1 + gini[[1]] ) == sst[[1]]
No sc It is easy to compute the se of fgt0+fgt1*(1+gini) usando a função contrastinf.
I tried to compute the variance using the function contrastinf using:
T1 = list(value=fgt0[[1]],lin=attr(fgt0,"lin")) T2 = list(value=fgt1[[1]],lin=attr(fgt1,"lin")) T3 = list(value=gini[[1]], lin= attr(gini, "lin")) list_all = list (T1=T1, T2= T2, T3=T3)
lin_SEN <- contrastinf(quote(T1*T3+T2*(1-T3)), list_all)
This should work because the composition of indices is very simple but it didn't, because the linearized variables for the gini, fgt0 and fgt1 have different lenghts. The gini is computed in a domain while the fgt0 and fgt1 are computed in the whole population.
@DjalmaPessoa , yes. This is exactly what I did inside the svysst, but using the subset.
you complete gini$lin with zeroes for the vector components in fgt0$lin and not in gini$lin?
Yes, I completed the linearized gini with 0.
gini <- NULL
gini$value <- coef(rval.gini)[[1]]
if ( length( attr( rval.gini, "lin" ) ) == length( fgt0$lin ) ) {
gini$lin <- attr( rval.gini, "lin" )
} else {
gini$lin <- 1 * ( fgt0$lin != 0 )
gini$lin[ gini$lin == 1 ] <- attr( rval.gini, "lin" )
}
This way, since gini$lin uses a subset of a (possible) subset used in fgt0, i can ensure that the lengths match. I think this makes sense. Right, @DjalmaPessoa?
Yes, but perhaps with this possibility of using a combination of indeces with contrastinf, we have to complete as it was some time ago, in each index.
For instance, the gini lin for a design subset would have the length of the whole sample size. This is done by library vardpoor when dealing with subsets. They multiply by an indicator vector and keep the original size o f vectors. Convey hinges on survey and uses design subset which change lin vector sizes.
We have to discuss about this.
1ec7d8e3c843cd7b2441ab5ff60fd5c4892af683 sets svysst as experimental
This kind of a niche function. It might be useful for examples, but nobody really uses it. I think it would be better to remove it.
please remove it from the package and the book?
Using an idea similar to that of https://github.com/DjalmaPessoa/convey/issues/212, if we could prove that FGT(0), FGT(1) and Gini (among the poor) estimates are correct, it is easy to show the SST index. This is the idea used in https://github.com/DjalmaPessoa/convey/blob/master/R/svysst.R#L168, and can be seen in this book. The variance, however, is a little bit more complicated, as it has to account for covariances. This decomposition is also done inside the
svysst
function and returned whencomponents = TRUE
.I can do this explicitly if you agree, @DjalmaPessoa and @ajdamico.