ajdamico / convey

variance of distribution measures estimation of survey data
GNU General Public License v3.0
17 stars 7 forks source link

create a replication vignette for svysst #216

Closed ajdamico closed 3 years ago

guilhermejacob commented 7 years ago

Using an idea similar to that of https://github.com/DjalmaPessoa/convey/issues/212, if we could prove that FGT(0), FGT(1) and Gini (among the poor) estimates are correct, it is easy to show the SST index. This is the idea used in https://github.com/DjalmaPessoa/convey/blob/master/R/svysst.R#L168, and can be seen in this book. The variance, however, is a little bit more complicated, as it has to account for covariances. This decomposition is also done inside the svysst function and returned when components = TRUE.

I can do this explicitly if you agree, @DjalmaPessoa and @ajdamico.

ajdamico commented 7 years ago

for both this and #212, are there published variance estimates anywhere that we could match? otherwise the ?help and contextbook should both probably mention that they are your own calculation.. sorry if i'm misunderstanding something

guilhermejacob commented 7 years ago

You can try with the datasets here. But I don't think they will match. The Gini index is slightly different.

Take a look:

hh.data <- haven::read_stata("http://web.cas.suffolk.edu/faculty/jhaughton/hh.dta")
hh.data <- data.frame( hh.data )
hh.data <- transform( hh.data , weighti = weight * famsize )

pce.data <- haven::read_stata("http://web.cas.suffolk.edu/faculty/jhaughton/consume.dta")
pce.data <- data.frame( pce.data )

final <- merge( hh.data , pce.data , by = c( "hhcode" ))

library(survey)
#srs.des <- svydesign(ids = ~1, data = hh.data )
srs.des <- svydesign( ids = ~thana , strata = ~region , weights = ~weighti , data = final )
library(convey)
srs.des <- convey_prep( srs.des )

svyfgt( ~pcexp , srs.des , abs_thresh = 3000 , g = 0 , na.rm = TRUE ) # matches link below
svygini( ~pcexp , subset( srs.des, pcexp > 0 ) , na.rm = TRUE ) # doesn't match link below

You can check some results published in the book here. For now, I'll proceed with the decomposition-based demonstration. Ok, @ajdamico ?

ajdamico commented 7 years ago

hi, the more you can explain about why they don't match, the better.. thanks

guilhermejacob commented 7 years ago

This shows how SST index can be decomposed:

# load the convey package
library(convey)

# load the survey library
library(survey)

# load the vardpoor library
library(vardpoor)

# load the synthetic european union statistics on income & living conditions
data(eusilc)

# make all column names lowercase
names( eusilc ) <- tolower( names( eusilc ) )

# create poverty gap variable
eusilc <- transform( eusilc , gap_eqincome = ( 10000 - eqincome ) * ( eqincome <= 10000) )

# construct a survey.design
# using our recommended setup
des_eusilc <- 
  svydesign( 
    ids = ~ rb030 , 
    strata = ~ db040 ,  
    weights = ~ rb050 , 
    data = eusilc
  )

# immediately run the convey_prep function on it
des_eusilc <- convey_prep( des_eusilc )

# FGT0
fgt0 <- svyfgt( ~eqincome , des_eusilc , abs_thresh = 10000 , g = 0 )

# FGT1 among the poor
fgt1 <- svyfgt( ~eqincome , subset( des_eusilc , eqincome <= 10000 ) , abs_thresh = 10000 , g = 1 )

# Gini of poverty gaps
gini <- svygini( ~gap_eqincome , des_eusilc )

# Sen-Shorrocks-Thon Index
sst <- svysst( ~eqincome , des_eusilc , abs_thresh = 10000 , components = TRUE )
sst.comp <- attr( sst , "components" )

# Decomposing the SST index
sst.comp$Components[1,][[1]] == fgt0[[1]]
sst.comp$Components[2,][[1]] == fgt1[[1]]
sst.comp$Components[3,][[1]] == gini[[1]]

# calculate from components
fgt0[[1]] * fgt1[[1]] * ( 1 + gini[[1]] ) == sst[[1]]
DjalmaPessoa commented 7 years ago

No sc It is easy to compute the se of fgt0+fgt1*(1+gini) usando a função contrastinf.

DjalmaPessoa commented 7 years ago

I tried to compute the variance using the function contrastinf using:

T1 = list(value=fgt0[[1]],lin=attr(fgt0,"lin")) T2 = list(value=fgt1[[1]],lin=attr(fgt1,"lin")) T3 = list(value=gini[[1]], lin= attr(gini, "lin")) list_all = list (T1=T1, T2= T2, T3=T3)

lin_SEN <- contrastinf(quote(T1*T3+T2*(1-T3)),    list_all)

This should work because the composition of indices is very simple but it didn't, because the linearized variables for the gini, fgt0 and fgt1 have different lenghts. The gini is computed in a domain while the fgt0 and fgt1 are computed in the whole population.

guilhermejacob commented 7 years ago

@DjalmaPessoa , yes. This is exactly what I did inside the svysst, but using the subset.

DjalmaPessoa commented 7 years ago

you complete gini$lin with zeroes for the vector components in fgt0$lin and not in gini$lin?

guilhermejacob commented 7 years ago

Yes, I completed the linearized gini with 0.

gini <- NULL
    gini$value <- coef(rval.gini)[[1]]
    if ( length( attr( rval.gini, "lin" ) ) == length( fgt0$lin ) ) {
      gini$lin <- attr( rval.gini, "lin" )
    } else {
      gini$lin <- 1 * ( fgt0$lin != 0 )
      gini$lin[ gini$lin == 1 ] <- attr( rval.gini, "lin" )
    }

This way, since gini$lin uses a subset of a (possible) subset used in fgt0, i can ensure that the lengths match. I think this makes sense. Right, @DjalmaPessoa?

DjalmaPessoa commented 7 years ago

Yes, but perhaps with this possibility of using a combination of indeces with contrastinf, we have to complete as it was some time ago, in each index.
For instance, the gini lin for a design subset would have the length of the whole sample size. This is done by library vardpoor when dealing with subsets. They multiply by an indicator vector and keep the original size o f vectors. Convey hinges on survey and uses design subset which change lin vector sizes. We have to discuss about this.

guilhermejacob commented 7 years ago

1ec7d8e3c843cd7b2441ab5ff60fd5c4892af683 sets svysst as experimental

guilhermejacob commented 3 years ago

This kind of a niche function. It might be useful for examples, but nobody really uses it. I think it would be better to remove it.

ajdamico commented 3 years ago

please remove it from the package and the book?