Notebook housekeeping:
[x] Under "first steps", we don't actually do the mapping from (row, col) to spatial coordinates (x, y), right? .transform() just lists the values necessary to do that mapping?
[x] fix population density data (run all the code)
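For reference, the (row, col) → (x, y) mapping that those values parameterize is just an affine map. A minimal pure-Python sketch, assuming a north-up raster with an upper-left origin (the function and parameter names are illustrative, not rasterio's API):

```python
def rowcol_to_xy(row, col, x0, y0, dx, dy):
    """Map a raster (row, col) index to spatial (x, y) coordinates using
    the values a rasterio-style .transform stores: upper-left corner
    (x0, y0), pixel width dx, and pixel height dy (negative for a
    north-up raster). Returns the pixel's upper-left corner.
    """
    return x0 + col * dx, y0 + row * dy

# 0.1-degree pixels, upper-left corner of the raster at (-120.0, 50.0):
x, y = rowcol_to_xy(row=2, col=3, x0=-120.0, y0=50.0, dx=0.1, dy=-0.1)
```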
Our model:
[x] add docstrings
[x] add y-intercept? how do we do this?
[x] TO DO THIS^: switch to gradient descent. Include a bias term in w that is excluded from the regularizer. Also, if we switch to gradient descent, we can experiment with using the l1, l2, and no regularizers
[x] add interpretation of our model's coefficients (and y-intercept?)
[x] add text to the linear regression with regularizers sections, interpret all outcomes
[x] did i implement y-intercept correctly?
[x] INITIAL MODEL: do the discrepancies between our results and scikit-learn's make sense? (Discrepancies being differences in the sign and magnitude of the coefficients.)
[x] L1 MODEL: Why does the loss increase if lambda is large enough? Why are none of our coefficients in the l1 case exactly 0? Is it because scikit-learn uses coordinate descent? Does it apply a magnitude cutoff to set coefficients to exactly zero? We need to set a cutoff, then check qualitatively that our results match (same signs, same coefficients equal to zero) and that the two models predict similarly
[x] L2 MODEL: results are really similar to the initial model; maybe change lambda? Their results may differ slightly because they may take the sum rather than the mean of the squared errors in their loss function... we could divide lambda by n to match them
[x] discussion about which model was best amongst linear regression with no penalty, l1 penalty, and l2 penalty
[x] Try making predictions -- potential issue of multicollinearity?
[x] Then try dropping one column to deal with fact that all rows sum to 1
[x] decide what to do with RidgeRegression class
[x] poisson regression
[x] map residuals to motivate spatial lag regression?
[x] regulizer should be regularizer
[x] take the best model, do cross-validation, and calculate the average RMSE to compare to the spatial version
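A sketch of the gradient-descent fit with an intercept that is left out of the penalty (the function name `fit_ridge_gd` and the hyperparameter values are illustrative, not the notebook's actual implementation):

```python
import numpy as np

def fit_ridge_gd(X, y, lam=0.1, lr=0.05, n_iter=5000):
    """Linear regression with an intercept and an L2 penalty, fit by
    gradient descent. A column of ones is prepended so the intercept is
    just another entry of w, but it is skipped in the penalty gradient.
    Loss: mean((Xb @ w - y)**2) + lam * sum(w[1:]**2).
    """
    Xb = np.column_stack([np.ones(len(X)), X])   # intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = 2 * Xb.T @ (Xb @ w - y) / len(y)  # gradient of the MSE
        penalty = 2 * lam * w
        penalty[0] = 0.0                         # don't penalize the intercept
        w -= lr * (grad + penalty)
    return w[0], w[1:]                           # intercept, coefficients

# Noise-free sanity check: with lam=0 this should recover y = 3 + 2x.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = 3 + 2 * x[:, 0]
b0, coefs = fit_ridge_gd(x, y, lam=0.0)
```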
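On the exact-zeros question: plain (sub)gradient descent on the L1 loss almost never lands on exactly zero, whereas coordinate descent (which scikit-learn's Lasso uses) and proximal methods zero coefficients via a soft-threshold step, with no ad-hoc cutoff needed. A sketch using proximal gradient descent (ISTA); names and hyperparameters are illustrative:

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within t of zero become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_lasso_ista(X, y, lam=0.5, lr=0.05, n_iter=3000):
    """L1-penalized linear regression via proximal gradient descent (ISTA).
    Loss: mean((Xb @ w - y)**2) + lam * sum(|w[1:]|); the intercept is
    excluded from the penalty. The prox (soft-threshold) step produces
    coefficients that are exactly 0.0, not merely small.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        w = w - lr * 2 * Xb.T @ (Xb @ w - y) / len(y)  # gradient step (MSE)
        w[1:] = soft_threshold(w[1:], lr * lam)        # prox step, skip intercept
    return w[0], w[1:]

# The second feature is irrelevant, so its coefficient should be exactly 0.0.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 1 + 2 * X[:, 0]
b0, coefs = fit_lasso_ista(X, y)
```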
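On the multicollinearity point: when the features are shares that sum to 1 in every row, adding an intercept makes the design matrix rank-deficient, because the share columns sum to the intercept column; dropping any one share column restores full column rank. A small demonstration on synthetic compositional data (the Dirichlet draw is just a stand-in for the real shares):

```python
import numpy as np

rng = np.random.default_rng(2)
shares = rng.dirichlet(np.ones(4), size=100)   # each row sums to 1
Xb = np.column_stack([np.ones(100), shares])   # intercept + all 4 shares

# The 4 share columns sum to the intercept column, so rank < n_columns.
print(np.linalg.matrix_rank(Xb))       # 4, despite 5 columns

Xb_drop = Xb[:, :-1]                   # drop one share column
print(np.linalg.matrix_rank(Xb_drop))  # 4: full column rank
```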
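For the cross-validation comparison, a minimal k-fold average-RMSE sketch (the `fit`/`predict` callables and the names here are placeholders for whichever model wins):

```python
import numpy as np

def cv_avg_rmse(X, y, fit, predict, k=5, seed=0):
    """k-fold cross-validation: fit on k-1 folds, compute RMSE on the
    held-out fold, and average across the k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    rmses = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        resid = predict(model, X[test]) - y[test]
        rmses.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(rmses))

# Example with plain OLS via lstsq (intercept column added in the lambdas).
ols_fit = lambda X, y: np.linalg.lstsq(
    np.column_stack([np.ones(len(X)), X]), y, rcond=None)[0]
ols_pred = lambda w, X: np.column_stack([np.ones(len(X)), X]) @ w

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = 1 + 2 * X[:, 0] - X[:, 1]
```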