Colleen-ODonnell / KenyaPovertyTargetingModel

KenyaPovertyTargetingModel

To develop an economic poverty targeting model for refugees in Kenya, we built our models on survey data from the UNHCR. We first rename the variables in the dataset. Next, we create share-of-household variables from the member dataset to convert individual-level data to the household level. For example, we create variables indicating the share of household members who are a specific age or who have a specific level of education. Next, we replace outliers with NAs so that we can still use those observations in our model. We define outliers as any values more than 3 SD from the mean. Next, we create dummy variables to incorporate assets into our model. We then merge the household, food, nonfood, education, and energy datasets.
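The share-of-household and outlier steps described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: it assumes pandas, the column names (`hh_id`, `is_child`, `spend`) are hypothetical, and it treats the 3 SD rule as symmetric around the mean.

```python
import pandas as pd


def household_shares(members, hh_col, flag_cols):
    """Collapse member-level 0/1 indicators into household-level shares."""
    # The mean of a 0/1 indicator within a household equals the share of
    # members who have that characteristic.
    shares = members.groupby(hh_col)[flag_cols].mean()
    return shares.add_prefix("share_").reset_index()


def mask_outliers(df, cols, n_sd=3.0):
    """Replace values more than n_sd standard deviations from the mean with NaN."""
    out = df.copy()
    for col in cols:
        mean, sd = out[col].mean(), out[col].std()
        # Assumption: values far below the mean are masked as well as values
        # far above it. The row itself is kept; only the cell becomes NaN.
        out[col] = out[col].where((out[col] - mean).abs() <= n_sd * sd)
    return out


# Hypothetical member-level data: one row per household member.
members = pd.DataFrame({
    "hh_id": [1, 1, 2, 2, 2, 2],
    "is_child": [1, 0, 1, 1, 1, 0],
})
shares = household_shares(members, "hh_id", ["is_child"])

# Hypothetical household-level data with one extreme value.
households = pd.DataFrame({"spend": [10.0] * 20 + [1000.0]})
cleaned = mask_outliers(households, ["spend"])
```

Asset dummies could be built in the same spirit with `pd.get_dummies`, again depending on how the asset variables are coded in the survey.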

We develop descriptive statistics on the total spend by category (total spend on household, food, education and nonfood). We merge our combined dataset with the member dataset. Next, we create a total spend variable by summing the spend by category. We create descriptive statistics on expenditure per capita, as well as the natural log of expenditure per capita.
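As a rough sketch of the expenditure variables described above, the snippet below sums the category totals and takes the natural log of the per capita value. The column names (`spend_food`, `hh_size`, and so on) are hypothetical, and masking non-positive values before the log is an assumption of this sketch and is not stated in the source.

```python
import numpy as np
import pandas as pd


def add_expenditure_vars(df, spend_cols, size_col):
    """Add total spend, expenditure per capita, and its natural log."""
    out = df.copy()
    # Row-wise sum across spending categories (pandas skips NaN by default).
    out["total_spend"] = out[spend_cols].sum(axis=1)
    out["exp_per_capita"] = out["total_spend"] / out[size_col]
    # Assumption: the log is undefined for zero spend, so those cells become NaN.
    positive = out["exp_per_capita"].where(out["exp_per_capita"] > 0)
    out["log_exp_per_capita"] = np.log(positive)
    return out


# Hypothetical household-level data.
df = pd.DataFrame({
    "spend_household": [0.0, 0.0],
    "spend_food": [100.0, 0.0],
    "spend_education": [50.0, 0.0],
    "spend_nonfood": [50.0, 0.0],
    "hh_size": [4, 2],
})
spend_cols = ["spend_household", "spend_food", "spend_education", "spend_nonfood"]
result = add_expenditure_vars(df, spend_cols, "hh_size")
print(result[["exp_per_capita", "log_exp_per_capita"]].describe())
```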

In the Analysis file, we convert all of the variables to floats, as this is necessary to run the models. We develop a restricted dataset which includes demographic variables and verifiable assets, and we create descriptive statistics on it. Our dependent (y) variable is the natural log of total spend divided by household size. Our independent variables are those in the restricted set (for the restricted models). We split the data into a training set (75% of the data) and a test set (25% of the data) to train and validate the model. In the ridge model, we use 10-fold cross-validation to find the optimal alpha and fit the model. We report the coefficients from the restricted model as well as the mean squared error. We repeat this process for the lasso and elastic net models on the restricted dataset. Next, we create the unrestricted dataset, which includes demographic variables, verifiable assets, and non-verifiable assets. We remove unwanted variables (such as variables representing quantity or spend, which should not be included as independent variables). In this case, our independent variables are those in the unrestricted set. We run the lasso, ridge, and elastic net models on the unrestricted dataset, again reporting the variable coefficients.
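The ridge step could look something like the sketch below. It assumes scikit-learn, which the source does not name explicitly. The alpha grid, the random seed, and the synthetic data are all illustrative choices and are not taken from the repository.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def fit_ridge(X, y, alphas, seed=0):
    """75/25 train-test split, then ridge with alpha chosen by 10-fold CV."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    # RidgeCV evaluates every candidate alpha with 10-fold cross-validation
    # on the training set and refits using the best one.
    model = RidgeCV(alphas=alphas, cv=10).fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    return model, mse


# Synthetic stand-in for the restricted feature set and log expenditure per capita.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) + 0.1 * rng.normal(size=200)

alphas = np.logspace(-3, 3, 13)  # hypothetical search grid
model, mse = fit_ridge(X, y, alphas)
print("alpha:", model.alpha_, "test MSE:", mse)
print("coefficients:", model.coef_)
```

Under the same assumptions, the lasso and elastic net variants would swap in `LassoCV` or `ElasticNetCV` with `cv=10`, leaving the split and the error calculation unchanged.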

In the next three files, we report the error rates of our models. We run the models and calculate inclusion and exclusion error. The code can measure model accuracy at different percentile cutoffs. We compare the error rates between the unrestricted and restricted datasets to derive our conclusions.
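The source does not define inclusion and exclusion error, so the sketch below assumes the definitions commonly used in proxy means testing: exclusion error is the share of truly poor households that the model fails to target, and inclusion error is the share of targeted households that are not truly poor. It also assumes a single cutoff taken from the chosen percentile of actual log expenditure per capita and applied to both actual and predicted values. The function name and the example data are hypothetical.

```python
import numpy as np


def targeting_errors(y_true, y_pred, percentile=40):
    """Return (inclusion_error, exclusion_error) at a percentile cutoff."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumption: the poverty line is the given percentile of actual welfare.
    cutoff = np.percentile(y_true, percentile)
    poor = y_true <= cutoff        # truly poor households
    targeted = y_pred <= cutoff    # households the model would target
    # Exclusion error: poor households that are missed, as a share of the poor.
    exclusion = np.sum(poor & ~targeted) / poor.sum() if poor.any() else np.nan
    # Inclusion error: non-poor households targeted, as a share of the targeted.
    inclusion = np.sum(~poor & targeted) / targeted.sum() if targeted.any() else np.nan
    return inclusion, exclusion


# Hypothetical actual and predicted welfare values for ten households.
actual = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
predicted = [1, 2, 6, 7, 3, 6, 7, 8, 9, 10]
inclusion, exclusion = targeting_errors(actual, predicted, percentile=40)
```

Calling the function with different `percentile` values gives error rates at different cutoffs, which is one way to compare the restricted and unrestricted models.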