ajdamico / convey

variance of distribution measures estimation of survey data
GNU General Public License v3.0
17 stars 7 forks source link

any guess why `svyqsr` and `svygei` are so far off from census? `svyatk` is closer and `svygini` matches exactly.. #438

Closed ajdamico closed 8 months ago

ajdamico commented 10 months ago

looking at 2022 row of both tabs of https://www2.census.gov/programs-surveys/demo/tables/p60/279/tableA4.xlsx

after loading cps_household_design with https://guilhermejacob.github.io/context/1.4-the-current-population-survey-cps.html#the-current-population-survey-cps

Gini index of income inequality: 0.488

> svygini( ~ htotval , cps_household_design )
           gini    SE
htotval 0.48846 0.002

Household income ratios at selected percentiles

90th/10th: 12.63

> svyqsr(~htotval,cps_household_design,alpha1=0.1,alpha2=0.9)
           qsr     SE
htotval 44.668 0.9634

90th/50th: 2.90

> svyqsr(~htotval,cps_household_design,alpha1=0.5,alpha2=0.9)
           qsr     SE
htotval 2.0199 0.0237

50th/10th: 4.36

> svyqsr(~htotval,cps_household_design,alpha1=0.5,alpha2=0.1)
       qsr     SE
htotval 5.6891 0.0361

Mean logarithmic deviation of income: 0.637

> svygei( ~ htotval , subset( cps_household_design , htotval > 0 ) , epsilon = 0 )
           gei     SE
htotval 0.4849 0.0047

Theil: 0.440

> svygei( ~ htotval , subset( cps_household_design , htotval > 0 ) )
           gei     SE
htotval 0.4252 0.0052

Atkinson

e=0.25: 0.106

> svyatk( ~ htotval , subset( cps_household_design , htotval > 0 ) , epsilon = 0.25 )
           gini    SE
htotval 0.10149 0.001

e=0.50: 0.207

> svyatk( ~ htotval , subset( cps_household_design , htotval > 0 ) , epsilon = 0.5 )
           gini     SE
htotval 0.19553 0.0017

e=0.75: 0.315

> svyatk( ~ htotval , subset( cps_household_design , htotval > 0 ) , epsilon = 0.75 )
           gini     SE
htotval 0.28636 0.0021
guilhermejacob commented 10 months ago

I think that those in the table are quantiles, not quantile shares (like in svyqsr and svylorenz). About the Atkinson measures, maybe a different variable or different treatment of zero incomes?

ajdamico commented 10 months ago

i'm not sure i'm following what the calculation would be.. even if it doesn't match, could you show me code that gets a bit closer to the 12.63, 2.9, 4.36, and 0.637?

i agree atkinson and theil might be because of 1.5% of households have zero income and might be treated differently. i'm hoping to e-mail the author of this table not necessarily to match but just to confirm why we'd be close but not exact..

ajdamico commented 9 months ago

after you and i get code a little closer to their published numbers, i will probably want to compose an e-mail to matthew.unrath@census.gov & melissa.kollar@census.gov

guilhermejacob commented 9 months ago

Once you get that merge https://github.com/ajdamico/convey/pull/440, do this:

Household income ratios at selected percentiles

Some of the remaining differences might be explained by differences in the estimators used.

90th/10th: 12.63

> ( q1 <- svyquantile(~htotval,cps_household_design,quantiles=c(.1,.9), na.rm = TRUE ) )
$htotval
    quantile ci.2.5 ci.97.5        se
0.1    17064  16551   17653  278.9879
0.9   216177 212811  219120 1597.2183

attr(,"hasci")
[1] TRUE
attr(,"class")
[1] "newsvyquantile"
> coef( q1 )[[2]] / coef( q1 )[[1]]
[1] 12.6686

90th/50th: 2.90

> ( q2 <- svyquantile(~htotval,cps_household_design,quantiles=c(.5,.9), na.rm = TRUE ) )
$htotval
    quantile ci.2.5 ci.97.5        se
0.5    74282  73250   75018  447.5958
0.9   216177 212811  219120 1597.2183

attr(,"hasci")
[1] TRUE
attr(,"class")
[1] "newsvyquantile"
> coef( q2 )[[2]] / coef( q2 )[[1]]
[1] 2.910221

50th/10th: 4.36

> ( q3 <- svyquantile(~htotval,cps_household_design,quantiles=c(.1,.5), na.rm = TRUE ) )
$htotval
    quantile ci.2.5 ci.97.5       se
0.1    17064  16551   17653 278.9879
0.5    74282  73250   75018 447.5958

attr(,"hasci")
[1] TRUE
attr(,"class")
[1] "newsvyquantile"
> coef( q3 )[[2]] / coef( q3 )[[1]]
[1] 4.353141

Mean logarithmic deviation of income: 0.637

fixing zero incomes:

cps_household_design <- subset(cps_design , a_exprrp %in% 1:2)
cps_household_design <- update( cps_household_design , htotval = ifelse( htotval < 0 , NA , htotval ) )
cps_household_design <- update( cps_household_design , htotval.fix = ifelse( htotval == 0 , 1 , htotval ) )
> svygei( ~ htotval.fix , cps_household_design , epsilon = 0 , na.rm = TRUE )
                gei     SE
htotval.fix 0.63457 0.0084

Theil: 0.440

> svygei( ~ htotval.fix , cps_household_design , na.rm = TRUE )
                gei     SE
htotval.fix 0.44013 0.0051

Atkinson

e=0.25: 0.106

> svyatk( ~ htotval.fix , cps_household_design , epsilon = 0.25 , na.rm = TRUE )
            atkinson    SE
htotval.fix  0.10595 0.001

e=0.50: 0.207

> svyatk( ~ htotval.fix , cps_household_design , epsilon = 0.5 , na.rm = TRUE )
            atkinson     SE
htotval.fix  0.20737 0.0017

e=0.75: 0.315

> svyatk( ~ htotval.fix , cps_household_design , epsilon = 0.75 , na.rm = TRUE )
            atkinson     SE
htotval.fix  0.31516 0.0024
ajdamico commented 9 months ago

@guilhermejacob this is so awesome! please assign this issue to me as soon as you've built the book and confirmed your pull request also looks ok to changes there? thankssss