DS4PS / cpp-528-spr-2020

Course shell for CPP 528 Foundations of Data Science III for Spring 2020.
http://ds4ps.org/cpp-528-spr-2020/

LAB 06 #22

Open sunaynagoel opened 4 years ago

sunaynagoel commented 4 years ago

Lab 06 asks us to use the baseline model from Lab 05 as a starting point. Is that referring to the model I ended up with after selecting 3 meaningful variables and adding a metro fixed effect?

Note: I have already created a combined census and tax policy dataset.

Thanks

lecy commented 4 years ago

Correct, the baseline model is the model where you are predicting changes to MHV based on census tract and metro characteristics.

Once you have a model that is performing well (meaningful variables, decent fit) you can add the policy variables to see if they explain the outcome any better than community characteristics alone.
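As a rough illustration of that comparison, here is a minimal Python sketch (the course itself works in R). It fits a baseline model and a baseline-plus-policy model on simulated data and compares adjusted R-squared. Everything in it is an assumption made for illustration: the variable names (`pct_college`, `pct_vacant`, `treated`), the simulated coefficients, and the small hand-rolled OLS solver. Metro fixed effects are left out to keep it short.

```python
import random

def ols_fit(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def adjusted_r2(X, y, beta):
    """Adjusted R-squared; k counts the intercept column."""
    n, k = len(X), len(X[0])
    y_hat = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Simulated tracts (hypothetical variables and effect sizes, not course data)
random.seed(528)
n = 500
pct_college = [random.random() for _ in range(n)]
pct_vacant = [random.random() for _ in range(n)]
treated = [random.randint(0, 1) for _ in range(n)]  # stand-in policy flag
mhv_growth = [0.10 + 0.5 * c - 0.3 * v + 0.08 * t + random.gauss(0, 0.05)
              for c, v, t in zip(pct_college, pct_vacant, treated)]

# Baseline: community characteristics only
X_base = [[1.0, c, v] for c, v in zip(pct_college, pct_vacant)]
# Extended: baseline plus the policy variable
X_policy = [row + [float(t)] for row, t in zip(X_base, treated)]

beta_base = ols_fit(X_base, mhv_growth)
beta_policy = ols_fit(X_policy, mhv_growth)
adj_base = adjusted_r2(X_base, mhv_growth, beta_base)
adj_policy = adjusted_r2(X_policy, mhv_growth, beta_policy)

print(f"baseline adj. R2: {adj_base:.3f}")
print(f"with policy adj. R2: {adj_policy:.3f}")
print(f"estimated policy coefficient: {beta_policy[3]:.3f}")
```

If the policy variable carries real signal, the extended model's adjusted R-squared should exceed the baseline's. Adjusted R-squared is used here because plain R-squared never decreases when a variable is added.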

Typically you would spend a lot of time reading other studies and testing lots of models to make sure you are including all of the important variables. Since this class is about project management and not the modeling component specifically, you will not be graded on having a perfect final model.

Instead, the exercise is meant to demonstrate the typical life cycle of a project.

Each step has its own challenges, and you will never feel 100% confident that any step is 100% accurate. So we need a process that makes all of the steps transparent and also isolates each step, so that you can update the code for one step and the rest of your code will still run (fix a data step, which updates your rodeo dataset, which then updates the models).
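One way to picture that isolation is the minimal Python sketch below (again, the course itself uses R). The function names, file name, and three-row dataset are hypothetical, and the "model" is only a placeholder summary. The point is that the model step reads nothing but the rodeo file, so fixing the data step and re-running it is enough to update everything downstream.

```python
import csv
import tempfile
from pathlib import Path

def build_rodeo_dataset(raw_rows, out_path):
    """Data step: clean the raw rows and write the analysis-ready (rodeo) file."""
    # Drop records with missing or zero home values (illustrative cleaning rule)
    clean = [r for r in raw_rows if r["mhv_2000"] > 0 and r["mhv_2010"] > 0]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["tract_id", "mhv_2000", "mhv_2010"])
        writer.writeheader()
        writer.writerows(clean)

def run_model(rodeo_path):
    """Model step: depends only on the rodeo file, never on the raw data."""
    with open(rodeo_path, newline="") as f:
        rows = list(csv.DictReader(f))
    growth = [float(r["mhv_2010"]) / float(r["mhv_2000"]) - 1 for r in rows]
    # Placeholder "model": mean growth in median home value
    return {"n_tracts": len(growth), "mean_growth": sum(growth) / len(growth)}

# Made-up raw data, including one bad record
raw = [
    {"tract_id": "A", "mhv_2000": 100000, "mhv_2010": 120000},
    {"tract_id": "B", "mhv_2000": 0, "mhv_2010": 90000},  # invalid: zero value
    {"tract_id": "C", "mhv_2000": 200000, "mhv_2010": 210000},
]

workdir = Path(tempfile.mkdtemp())
rodeo_path = workdir / "rodeo.csv"

build_rodeo_dataset(raw, rodeo_path)  # step 1: data
result = run_model(rodeo_path)        # step 2: model
print(result)
```

If the cleaning rule changes later, only `build_rodeo_dataset` needs editing. Re-running both steps regenerates the rodeo file and the model results without touching the model code.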

It will often be the case that you are not the domain expert on the topic, so someone will be reviewing and revising your baseline models. You just need to document the data and package the model code so it is easy for her to focus on the specific part of the project where she can add the most value as an expert. So having the models in draft form is fine.

The goal is to make your steps transparent and the data and code well-organized. I hope that helps give some context to your question and emphasizes the point that the big picture of project management is more important than all of the details of your model for this project.

castower commented 4 years ago

Hello @cenuno @lecy,

I wanted to clarify whether we should be using the 2000 to 2010 data or the 1990 to 2000 data for the labs. I've looked back over my submissions and realized that some of them used 1990 to 2000 and some used 2000 to 2010. For the final project, which time period should I use?

Thanks! Courtney

cenuno commented 4 years ago

Hi @castower,

You are correct: for Lab 05 and Lab 06, you should be using 2000 to 2010 data. The final project uses both the 1990 to 2000 and the 2000 to 2010 time periods (since it is assembling all labs you have done for this course).

Respectfully,

Cristian

castower commented 4 years ago

Thank you @cenuno!

lecy commented 4 years ago

For context, the tutorial on descriptive analysis walks you through the 2000-2010 data. The 1990-2000 period was used in the lab so that you could replicate the steps without having to write all of the code yourself, while still familiarizing yourself with the data and exploring the distinct patterns of a different time period.

The exercise was also meant to introduce you to the puzzling observation that changes in home values differ drastically between the two decades. The period from 1960 to 2000 was an era in the US when people were fleeing central cities because of a crime epidemic, social upheaval, failing public institutions (especially schools), and race riots. Development of owner-occupied residential units was stagnant, and home values took a hit.

But as in a lot of systems, a balancing loop has kicked in, and the cost-benefit calculus of cities has tipped in their favor. Without mass transit, commuting costs grow exponentially as metro populations rise. We are just starting to understand the incredible financial, health, and social strains caused by long commute times. At the same time, proximity and density have become increasingly important in the super-charged, fast-paced creative economy. Younger generations are getting married and starting families later, leading to greater demand for work-live-play environments. Cities that had been losing tax base to the suburbs have started realizing that transit corridors and innovation districts can crowd in high-density development and high-income residents, creating fiscally sustainable development.

It's quite a fascinating transition, and it is visible in the data in many ways.