ajdamico / convey

variance of distribution measures estimation of survey data
GNU General Public License v3.0
for svyqsr and svylorenz, any guess why `networth` is so different from all of the other variables? #451

Closed ajdamico closed 8 months ago

ajdamico commented 8 months ago

https://guilhermejacob.github.io/context/4.3-quintile-share-ratio-svyqsr.html#real-world-examples-8

it's possible that i've coded something incorrectly..

guilhermejacob commented 8 months ago

I don't think you did, but I do think that networth is much more asymmetric than income. In fact, it even has negative values. One way to see that is to compare the svylorenz results for networth (even though it has negative values) with the others. You will see that L(.90) looks relatively similar for CPS and PNADC, in the range of .50-.70. Now for CPS, it is somewhere around .26. This would mean that the top 10% owns ~84% of the total networth. Looks a bit extreme for income/wealth, but maybe not for networth?
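To make the link between a Lorenz ordinate and a top share concrete, here is a rough sketch in Python. It is not how svylorenz is implemented in convey. The function name `lorenz_ordinate` and the toy values are made up for illustration. The share held by the top 10% is 1 - L(.90), and with negative values the curve can dip below zero at the bottom.

```python
def lorenz_ordinate(values, weights, p):
    """Share of the weighted total held by the poorest fraction p.

    Toy step-wise version for illustration only (hypothetical helper,
    not the estimator used by convey::svylorenz).
    """
    pairs = sorted(zip(values, weights))          # order units from poorest to richest
    total_weight = sum(w for _, w in pairs)
    total_value = sum(v * w for v, w in pairs)
    cum_weight = 0.0
    cum_value = 0.0
    for v, w in pairs:
        if cum_weight + w > p * total_weight:
            # take only the part of this unit needed to reach fraction p
            remaining = p * total_weight - cum_weight
            cum_value += v * remaining
            break
        cum_weight += w
        cum_value += v * w
    return cum_value / total_value


# Made-up "networth-like" data: one negative value and a long right tail.
toy_networth = [-50, 0, 10, 20, 30, 40, 50, 100, 300, 1500]
toy_weights = [1] * len(toy_networth)

l90 = lorenz_ordinate(toy_networth, toy_weights, 0.90)
l10 = lorenz_ordinate(toy_networth, toy_weights, 0.10)
print("L(.90):", l90, "top 10% share:", 1 - l90)
print("L(.10):", l10)  # negative because the poorest unit has negative networth
```

In this toy case the bottom 90% hold a quarter of the total, so the top 10% hold the rest, and L(.10) is below zero.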

ajdamico commented 8 months ago

could you say a bit more? the numbers in this section are 17, 13, 8, 6, 16, 12, -379, 23 ...and broken out by sex it's even more of an outlier. how can we explain the qsr of -2,904 ? is it possible for you to take a closer look at these two numbers? do we want this measure subsetted throughout to either >=0 or >0 with an explanation why?

guilhermejacob commented 8 months ago

We could remove extremes, but I wonder whether that makes sense. Are these informative outliers? I mean, are they the result of incorrect networth answers or are they just extreme values we expect to occur in a population?

To understand if networth is correct I'd ask if you can reproduce these results. They look quite different from ours.

If the networth variable is correctly constructed, it seems that these extreme results are of the second kind. In practice, we could remove some extreme values to get more "reasonable-looking" results. However, this means we face two kinds of problems: (a) how do we define the outliers? and (b) how does the choice of outlier treatment affect variance estimation? And this second question is even trickier.
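As a rough illustration of why a quintile share ratio can come out large and negative, here is a minimal unweighted sketch in Python. It is an assumption-laden toy, not convey's svyqsr, which works with survey weights and estimated quantiles. The function `quintile_share_ratio` and both data vectors are hypothetical. The point is only that the QSR divides the top-quintile total by the bottom-quintile total, so a bottom total that is slightly below zero flips the sign and inflates the magnitude.

```python
def quintile_share_ratio(values):
    """Unweighted toy QSR: total of the top 20% over total of the bottom 20%.

    Hypothetical helper for illustration; ignores survey weights.
    """
    ordered = sorted(values)
    k = len(ordered) // 5            # number of observations in each tail quintile
    bottom_total = sum(ordered[:k])  # can be negative or near zero for networth
    top_total = sum(ordered[-k:])
    return top_total / bottom_total


# Made-up data: an "income-like" variable that is strictly positive...
toy_income = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# ...and a "networth-like" variable with one debtor and a heavy right tail.
toy_networth = [-45, 30, 40, 44, 50, 60, 70, 80, 900, 2000]

qsr_income = quintile_share_ratio(toy_income)
qsr_networth = quintile_share_ratio(toy_networth)
print("QSR, income-like:", qsr_income)
print("QSR, networth-like:", qsr_networth)
```

Here the bottom quintile of the networth-like variable sums to a small negative number, so the ratio is negative and far larger in absolute value than the income-like one, even though no value is miscoded.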

Maybe we could point the reader to the svylorenz results and show that the differences in the Lorenz curve ordinates make sense.

ajdamico commented 8 months ago

i'll think about this more and get back to you! i'm sure networth is constructed correctly (ctrl+F for the text "this example" on this page) ..but yeah i need to look closer at how the outliers change the shape of the distribution. thanksss

ajdamico commented 8 months ago

https://www.federalreserve.gov/releases/z1/dataviz/dfa/distribute/chart/#quarter:136;series:Net%20worth;demographic:age;population:all;units:shares also

ajdamico commented 8 months ago

any edits to this revision? https://github.com/guilhermejacob/context/pull/36 thanks!