Adrianne-Fu / Orie-4741-Project


Final Peer Review - David df347 #13

Open Davidfdaf opened 2 years ago

Davidfdaf commented 2 years ago

This group used economic data, census data, and past and current house price data to project the next month's house prices at the county level in the US over the course of 10 years. The scope of this project is large and impressive.

Good things: good explanation for choosing L1 regularization as a model; great data preprocessing; good front-filling and back-filling methodology; one-hot encoding of variables; great visualizations to gain an understanding of the dataset; overall a very well-written paper.

Things to improve: I think you should think more about the time series nature of your problem.

First of all, predicting prices is hard, and it is often more desirable to predict returns or log returns. Differencing is often mean-stabilizing and the log function is often variance-stabilizing, which are important characteristics because we want to be forecasting a stationary distribution.
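To make the log-return idea concrete, here is a minimal sketch using NumPy. The function name `log_returns` and the price series are made up for illustration and are not taken from the project. The sketch takes the log of each price, differences consecutive values, and then shows that price levels can be recovered from the returns, so forecasting returns does not lose the ability to report prices.

```python
import numpy as np


def log_returns(prices):
    """Return log(p_t) - log(p_{t-1}) for a 1-D series of positive prices."""
    prices = np.asarray(prices, dtype=float)
    # Log first (variance-stabilizing), then difference (mean-stabilizing).
    return np.diff(np.log(prices))


# Hypothetical monthly prices growing 10% per month (illustrative only).
prices = np.array([100.0, 110.0, 121.0, 133.1])
r = log_returns(prices)

# Price levels can be rebuilt from the first price and the cumulative returns.
rebuilt = prices[0] * np.exp(np.cumsum(r))
```

A model trained on `r` predicts the next log return; multiplying the last known price by the exponential of that prediction gives the implied price forecast.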

Given that you are predicting prices over the course of 10 years (where housing prices in later years are so much higher), I would not use L2 regularization, as your model will prioritize correctly fitting your later data over your earlier data.

Lastly, I believe there is some information leakage going on. I can't say for certain because I can't look at your code on GitHub (a good practice for transparency), but this could be identified if you test each year separately and average the MAE. I suspect your error would be much higher. Randomly splitting your time series data to then make predictions is a very bad practice that is prone to information leakage issues.
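One way to run the per-year check is sketched below. Everything in it is an assumption for illustration: the data is synthetic, the helper names (`fit_least_squares`, `predict_least_squares`, `walk_forward_mae`) are hypothetical, and plain least squares stands in for whatever model the project actually uses. Each year is tested using only rows from earlier years for training, and the per-year MAEs are then averaged.

```python
import numpy as np


def fit_least_squares(X, y):
    """Fit ordinary least squares with an intercept (stand-in model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_least_squares(coef, X):
    """Predict with coefficients returned by fit_least_squares."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def walk_forward_mae(X, y, years):
    """Test each year on a model trained only on strictly earlier years."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    years = np.asarray(years)
    maes = {}
    # np.unique returns sorted values; the first year has no earlier data.
    for test_year in np.unique(years)[1:]:
        train = years < test_year   # past rows only, so no future leakage
        test = years == test_year
        coef = fit_least_squares(X[train], y[train])
        pred = predict_least_squares(coef, X[test])
        maes[int(test_year)] = float(np.mean(np.abs(y[test] - pred)))
    return maes, float(np.mean(list(maes.values())))


# Synthetic example: 10 years of monthly rows with two made-up features.
rng = np.random.default_rng(0)
years = np.repeat(np.arange(2010, 2020), 12)
X = rng.normal(size=(len(years), 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=len(years))

maes_by_year, mean_mae = walk_forward_mae(X, y, years)
```

Comparing `mean_mae` against the MAE from a random train/test split on the same data is a quick leakage test: a large gap suggests the random split is letting the model see the future.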

With that said, you guys did a really good job on this. Best!

Davidfdaf commented 2 years ago

Hey, I just looked at your project again and realized your code is on GitHub. I don't know how I missed it, so please disregard that comment. Also, I wanted to be sure you guys knew that I was very impressed by your work and gave you a very good grade. I really dug into the time series modeling side because I have experience with this outside of the class, so I thought I could provide constructive criticism for your learning benefit. I did not deduct any points because you did not make those considerations. Best, David