add Sen poverty index

guilhermejacob commented 7 years ago

The Sen (1976) poverty index combines FGT(0), FGT(1) and the Gini index of incomes below the poverty line, as described on page 8 here. It was one of the first poverty indices to take inequality among the poor into account.
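For reference, the composition mentioned above is usually written in its large-sample form as below. This is a sketch of the standard textbook expression, not a quotation from the linked page; the symbols H (headcount ratio, i.e. FGT(0)), I (mean income gap ratio among the poor, i.e. FGT(1)/FGT(0)) and G_p (Gini index of incomes of the poor) are notation introduced here.

```latex
S \;=\; H\,\bigl[\,I + (1 - I)\,G_p\,\bigr]
  \;=\; \mathrm{FGT}_0\,G_p \;+\; \mathrm{FGT}_1\,(1 - G_p),
\qquad I = \frac{\mathrm{FGT}_1}{\mathrm{FGT}_0}.
```

When all the poor have the same income, G_p = 0 and the index reduces to FGT(1).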

Also see: Chapter 6, p. 409 for a linearization approach.

guilhermejacob commented 7 years ago

@DjalmaPessoa , I know one of Deville's rules allows for a composition of functions. This would allow us to produce a linearization of the Gini index for incomes of the poor using mean- or median-based poverty lines. Can you help me with this?

DjalmaPessoa commented 7 years ago

I believe it is possible to get it using the function contrastinf.R from convey. We just need to form 3 lists: list_gini, list_pov and list_poormean. Each list must have 2 components: the estimate and the linearized variable. We already have these lists for the Gini and the poverty rate. For the poormean, it should not be hard to get. contrastinf.R then generates a list with the value of the index and its linearized variable. The derivative computations are made directly by contrastinf.R, which uses R's deriv function to compute formal derivatives.
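To make the idea concrete, here is a minimal base-R sketch of that kind of composition. It does not call contrastinf and does not reproduce its signature; `combine_linearized`, the component names `fgt0`, `fgt1`, `gini`, and the choice of those three components (rather than a poor-mean list) are assumptions made for illustration. It only shows how `deriv` can supply the gradient used to combine the component linearized variables.

```r
# Formal derivatives of the Sen composition, via base R's deriv().
# The component names fgt0, fgt1 and gini are illustrative assumptions.
sen_expr <- deriv(~ fgt0 * gini + fgt1 * (1 - gini), c("fgt0", "fgt1", "gini"))

# Hypothetical helper (not part of convey):
#   est: named numeric vector with elements fgt0, fgt1, gini
#   lin: numeric matrix, one row per observation, with columns named
#        fgt0, fgt1, gini holding each component's linearized variable
combine_linearized <- function(est, lin) {
  val  <- eval(sen_expr, as.list(est))  # value with a "gradient" attribute
  grad <- attr(val, "gradient")         # 1 x 3 matrix of partial derivatives
  list(
    value = as.numeric(val),
    # Linearized variable of the composite: gradient-weighted sum of the
    # component linearized variables
    lin   = as.numeric(lin[, colnames(grad), drop = FALSE] %*% t(grad))
  )
}
```

The returned list mirrors the "estimate plus linearized variable" shape described above, so its variance could then be estimated as the variance of a total of `lin` under the sampling design.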

What would be the function arguments? Do we need to define a threshold? Something like this?

svysen <- function(formula, design, thresh){

}

I believe I'll be able to do it.

guilhermejacob commented 7 years ago

@DjalmaPessoa , yes, we have to set poverty thresholds.

While for absolute thresholds the calculation is very straightforward, this doesn't seem to be the case for relative thresholds. I think we have to account for its variability inside the Gini index for incomes of the poor.
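For the absolute-threshold case, the point estimate can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `weighted_gini` and `sen_point` are hypothetical helpers (not convey functions), incomes at or below the line are treated as poor, and the Gini uses the plain weighted mean-absolute-difference formula, which may differ from the estimator convey uses in finite samples.

```r
# Weighted Gini via the mean absolute difference (illustrative formula).
weighted_gini <- function(y, w) {
  mu <- sum(w * y) / sum(w)
  abs_diff <- abs(outer(y, y, "-"))        # |y_i - y_j| for every pair
  sum(outer(w, w) * abs_diff) / (2 * sum(w)^2 * mu)
}

# Hypothetical helper: Sen index and its components for a fixed poverty line.
#   y: incomes, w: sampling weights, thresh: absolute poverty threshold
sen_point <- function(y, w, thresh) {
  poor <- y <= thresh                      # assumption: "poor" means y <= thresh
  fgt0 <- sum(w * poor) / sum(w)           # headcount ratio
  fgt1 <- sum(w * poor * (thresh - y) / thresh) / sum(w)  # poverty gap
  gini_poor <- weighted_gini(y[poor], w[poor])
  c(fgt0 = fgt0, fgt1 = fgt1, gini_poor = gini_poor,
    sen = fgt0 * gini_poor + fgt1 * (1 - gini_poor))
}

# Toy data, not taken from any survey
sen_point(y = c(2, 4, 10, 20), w = rep(1, 4), thresh = 8)
```

With a relative threshold, `thresh` itself would be estimated from the sample, which is the extra source of variability raised above.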

In my opinion, the call should look similar to svyfgt. Something like:

svysen <- function(formula, design, type_thresh = "abs", abs_thresh = NULL, percent = .60,
                   quantiles = .50, na.rm = FALSE, thresh = FALSE, ...){

}

Verma & Betti (2011) present a formula for this measure with relative thresholds, but it is not clear to me. I can't find the o() function they refer to in the linearization formula. Can you help me understand what is going on there?

DjalmaPessoa commented 7 years ago

I don't know how to do it for relative thresholds, and even for a fixed poverty line it is not straightforward to me.

DjalmaPessoa commented 7 years ago

The o() term in math gives the order of the approximation when the remaining terms are discarded. Just work with the first part; the remainder is o(1/n) (little o), meaning it goes to zero faster than 1/n as n goes to infinity. So if your sample is big, the approximation is OK.

guilhermejacob commented 7 years ago

Thank you! I'd never have figured this out on my own. I'll try this today.

guilhermejacob commented 7 years ago

I'm not really sure about the paper's approach. I'd rather keep things simple and allow only absolute thresholds, using contrastinf. This way, we ensure that the package's results are self-consistent. Testing would be much simpler, and I can work on decomposition methods, analyzing how the index varies with inequality, mean income shortfall among the poor and poverty spread. I'll follow the same idea for the SST index.

If we develop something later, I'll add it.

DjalmaPessoa commented 7 years ago

There is already a svysen in convey. Is it a different measure? What name do you suggest for this new Sen function?

DjalmaPessoa commented 7 years ago

You're really very fast. I just noticed that svysen is the function you mentioned. Is it working fine? How do the SE estimates from linearized and replicate designs compare? Excellent job!

guilhermejacob commented 7 years ago

@DjalmaPessoa , thanks!

Well, it fares relatively well. I did some tests on eusilc. With a poverty line of 10000, the linearized CV is 3% while the replicate-based CV is 1.5%. Considering that the coefficients themselves are very small, I don't think it is a major problem. With higher thresholds, the ratio between the CVs is almost the same, but the difference is much smaller: 0.2%.

You can also decompose it into FGT(0), FGT(1) and the Gini of poor incomes using components = TRUE, which also returns the variance-covariance matrix for them.

I'm still analyzing it.

guilhermejacob commented 7 years ago

There was an error in the replicate-based formula, but it is fixed now. The results are much closer.

guilhermejacob commented 7 years ago

see svysen

ajdamico / convey

add Sen poverty index #206