erlris / intergen_ml

Project on intergenerational mobility as a prediction problem

Boosting residuals #8

Open jackblun opened 6 years ago

jackblun commented 6 years ago

I've done some analysis in which a linear rank-rank model is fitted first, and other models are then fitted to its residuals. I think building it up this way is quite natural.
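A minimal sketch of this two-stage idea, on synthetic data and using only the Python standard library. Everything in it is an assumption made for illustration: the variable names (`parent`, `extra`, `child`), the simulated coefficients, and the use of a second simple linear fit as a stand-in for the residual model (the thread mentions elastic net and ranger, which are not reproduced here).

```python
import random
from statistics import fmean


def fit_line(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y, pred):
    """Coefficient of determination of predictions against y."""
    my = fmean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def two_stage_r2(n=2000, seed=0):
    """Return held-out R2 for stage 1 alone and for stage 1 + stage 2."""
    rng = random.Random(seed)
    # Synthetic stand-ins (hypothetical): parent rank, one extra covariate,
    # and a child outcome that depends on both plus noise.
    parent = [rng.random() for _ in range(n)]
    extra = [rng.random() for _ in range(n)]
    child = [0.3 * p + 0.3 * z + rng.gauss(0, 0.2)
             for p, z in zip(parent, extra)]

    half = n // 2  # first half trains, second half is held out

    # Stage 1: linear fit of child outcome on parent rank.
    a1, b1 = fit_line(parent[:half], child[:half])
    resid = [c - (a1 + b1 * p) for p, c in zip(parent[:half], child[:half])]

    # Stage 2: fit a second model to the stage-1 residuals.
    a2, b2 = fit_line(extra[:half], resid)

    # Evaluate both predictors on the held-out half.
    base_pred = [a1 + b1 * p for p in parent[half:]]
    boosted_pred = [bp + a2 + b2 * z
                    for bp, z in zip(base_pred, extra[half:])]
    test_y = child[half:]
    return r_squared(test_y, base_pred), r_squared(test_y, boosted_pred)


r2_base, r2_boosted = two_stage_r2()
print(f"stage 1 R2: {r2_base:.3f}, with residual model: {r2_boosted:.3f}")
print(f"improvement: {100 * (r2_boosted - r2_base) / r2_base:.1f}%")
```

Scoring on a held-out half is a choice made for this sketch; the thread does not say how the fits were evaluated.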

The graph below shows the % improvement in R² relative to a basic rank-rank model for a variety of variables and models. For example, an elastic net model using the full set of variables gives, on average, a little over a 10% improvement in fit over the rank-rank model.
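For reference, one plausible reading of "% improvement in R²" is the relative gain over the rank-rank fit. The exact formula is not stated in the thread, so the definition below is an assumption:

```latex
\text{improvement (\%)} = 100 \times
  \frac{R^2_{\text{model}} - R^2_{\text{rank-rank}}}{R^2_{\text{rank-rank}}}
```

As a purely hypothetical illustration, a rank-rank R² of 0.10 and a model R² of 0.11 would give a 10% improvement under this definition.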

I think this approach makes sense. It would be great to see how it does on the Norwegian data, as everything is just super noisy in the UK data! I still need to add more algorithms and tune ranger a bit.

screen shot 2018-07-24 at 22 24 23

jackblun commented 6 years ago

An updated version of the above. Error bars are 1 standard deviation. It's nice that, for all models, we see on average a little over a 20% improvement in fit. The other graph shows how much of the 'total explainable variance' the income rank model accounts for: it sits at around 85%. I think this is quite interpretable and possibly useful.

screen shot 2018-07-25 at 17 59 55

screen shot 2018-07-25 at 18 01 58
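The two figures quoted here can be related with a little arithmetic, assuming (this is not stated in the thread) that 'total explainable variance' means the R² of the best-fitting model and that Δ denotes the relative improvement in R² over the rank-rank model:

```latex
\text{share explained by rank-rank}
  = \frac{R^2_{\text{rank-rank}}}{R^2_{\text{best}}}
  = \frac{1}{1 + \Delta},
\qquad
\Delta = 0.20 \;\Rightarrow\; \frac{1}{1.20} \approx 0.83
```

Under that assumption, an improvement of a little over 20% corresponds to a share in the low 80s of percent, which is roughly in line with the figure of around 85% read off the graph.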