Closed by ld-archer 1 year ago
I have added to the Trello board (under information) a spreadsheet that outlines the calculation of EI from US variables and a methodology doc.
Here is the equation for calculating equivalent income:
=
Yc AD2*EXP((
Phy1 0*C2+
Phy2 (-0.116/1.282)*D2+
Phy3 (-0.135/1.282)*E2+
Phy4 (-0.479/1.282)*F2+
Phy5 (-0.837/1.282)*G2+
Men1 (0*H2+
Men2 (-0.14/1.282)*I2+
Men3 (-0.215/1.282)*J2+
Men4 (-0.656/1.282)*K2+
Men5 (-0.877/1.282)*L2)+
Lone1 (0*M2+
Lone2 (-0.186/1.282)*N2+
Lone3 (-0.591/1.282)*O2)+
Em1 (0*Q2+
Em2 (0.033/1.282)*R2+
Em3 (-0.283/1.282)*S2+
Em4 (-0.184/1.282)*T2+
Em5 (-0.755/1.282)*U2+
Em6 (-0.221/1.282)*V2)+
Hous1 (0*W2+
Hous2 (-0.235/1.282)*X2+
Hous3 (-0.696/1.282)*Y2)+
Safe1 (0*Z2+
Safe2 (-0.291/1.282)*AA2+
Safe3 (-0.599/1.282)*AB2)))
Yc = hh_income after outgoings and adjusting for hh_size
It was copied directly from an Excel spreadsheet, hence the poor formatting. The term on the left of the tab is the variable and factor level (e.g. Hous2 is housing_quality == 2, which is 'Yes to some' household questions), and the term on the right is the weighting applied to that specific factor level, e.g. (-0.235/1.282)*X2. Each of the Excel cell references (such as X2) is an indicator that can only be 1 or 0, and within each variable only one factor level can be 1. The equation therefore generates an exponent term that modifies the original disposable income based on the weighting for each variable and factor level combination.
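Written out as a formula, the spreadsheet calculation looks roughly like the following. The symbols are my own shorthand rather than anything from the methodology doc: \beta_{v,l} is the coefficient for level l of variable v (the numerator in each row above) and x_{v,l} is the matching 0/1 indicator cell.

```latex
EI = Y_c \cdot \exp\!\left( \sum_{v} \sum_{l} \frac{\beta_{v,l}}{1.282}\, x_{v,l} \right),
\qquad x_{v,l} \in \{0, 1\}, \qquad \sum_{l} x_{v,l} = 1 \ \text{for each } v

% Example: every variable at level 1 except housing quality at level 2 (Hous2)
EI = Y_c \cdot \exp\!\left( \frac{-0.235}{1.282} \right) \approx 0.83\, Y_c
```

Level 1 of every variable has a coefficient of 0, so a household at level 1 on everything keeps EI equal to Yc.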
To implement this in the S7EquivalentIncome module, I've created dictionaries for each variable to hold the weights for each level and used them to generate the exponent term. Testing now...
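A minimal sketch of the dictionary approach, not the actual S7EquivalentIncome code. The keys (Phy, Men, Lone, Em, Hous, Safe), the `equivalent_income` function and the `levels` argument are placeholder names for illustration; the coefficients and the 1.282 divisor are copied from the spreadsheet formula above.

```python
import math

# Divisor applied to every coefficient in the spreadsheet formula.
SCALE = 1.282

# One dictionary per variable, mapping factor level -> coefficient.
# Keys reuse the spreadsheet prefixes; they are placeholders, not the
# real column names used in the model.
WEIGHTS = {
    "Phy": {1: 0.0, 2: -0.116, 3: -0.135, 4: -0.479, 5: -0.837},
    "Men": {1: 0.0, 2: -0.14, 3: -0.215, 4: -0.656, 5: -0.877},
    "Lone": {1: 0.0, 2: -0.186, 3: -0.591},
    "Em": {1: 0.0, 2: 0.033, 3: -0.283, 4: -0.184, 5: -0.755, 6: -0.221},
    "Hous": {1: 0.0, 2: -0.235, 3: -0.696},
    "Safe": {1: 0.0, 2: -0.291, 3: -0.599},
}


def equivalent_income(yc, levels):
    """Return yc scaled by exp(sum of weight / SCALE over all variables).

    yc     -- household income after outgoings, adjusted for household size
    levels -- dict mapping each key in WEIGHTS to the observed factor level
    """
    # Looking up one level per variable plays the role of the 0/1 indicator
    # cells in the spreadsheet: only the observed level contributes.
    exponent = sum(WEIGHTS[var][levels[var]] / SCALE for var in WEIGHTS)
    return yc * math.exp(exponent)


# Every variable at level 1 gives an exponent of 0, so income is unchanged.
reference = {var: 1 for var in WEIGHTS}
print(equivalent_income(1000.0, reference))  # 1000.0

# Moving housing quality to level 2 scales income by exp(-0.235 / 1.282).
some_housing_issues = dict(reference, Hous=2)
print(equivalent_income(1000.0, some_housing_issues))
```

Keeping the weights in plain dictionaries means a missing or unexpected level raises a KeyError instead of silently contributing 0, which seems preferable while testing.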
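The mean of equivalent income is much lower than the mean of household income, which is what we would expect: almost every S7 factor level other than level 1 carries a negative weight and so reduces income (the one exception in the formula is Em2, with a small positive weight of 0.033).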
Next step is to do the calculation on the 2018 dataset and see if I can recreate values from Chris' spreadsheet.
Have tested the EI calculation function in the pipeline on Chris' original spreadsheet and we return exactly what the spreadsheet contains, so the calculation is good. Now to run some scenarios and visualise outputs.
Giving this its own issue as it should be a standalone module.
This is a deterministic calculation based on a weighted combination of the SIPHER 7 variables. We currently have 4 of the 7 in the model, although our versions of neighbourhood safety and housing quality differ slightly from their exact definitions in SIPHER 7. I will create all the variables as they are defined in SIPHER 7 for the purpose of calculating equivalent income.
This issue needs #200 to be completed first (adding the variables), then the equivalent income module.