Open seansodha opened 7 years ago
Thank you for your review! We do need to change the way we draw Figure 1, perhaps by using a pairwise correlation plot rather than a scatter plot. One clarification: we are not encoding the categories as the integers 1-10. Instead, we are transforming a categorical variable that has 10 categories into 10 dummy variables. The way we wrote ZipCode[1,...,10] may have led to the misunderstanding, and we will definitely improve that.
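To make the distinction concrete, here is a minimal sketch of the transformation we mean, written in plain Python. The `one_hot` helper and the zip code values are hypothetical, chosen only for illustration; they are not taken from our pipeline or from PLUTO.

```python
def one_hot(values):
    """Turn one categorical column into one 0/1 indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical zip codes, for illustration only.
zip_codes = ["10001", "10002", "10001", "10003"]
categories, encoded = one_hot(zip_codes)

for zip_code, row in zip(zip_codes, encoded):
    print(zip_code, row)
```

Each row has exactly one 1, so no category is treated as numerically larger than another. In a linear model with an intercept, one of the indicator columns would typically be dropped to avoid perfect collinearity.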
The purpose of this project is to predict the full market value of different lots in the Manhattan area. I can certainly see where this project is useful, since real estate companies could use the model as an estimator. I think results like this would be extremely useful, especially since the group will be using the PLUTO dataset. The group has a very comprehensive understanding of all of the variables involved in the project, and I like how thorough they were in their data cleaning. The group seems to have a good understanding of where the project is heading.
There were a few things that concerned me in this project, however. Figure 1 is far too small, and it is very hard to read or to understand what it shows. Assigning dummy variables like 1->10 or 1->6 is dangerous because it implies that a house with an assignment of 2 is worth twice as much as a house with an assignment of 1. Finally, I could not tell how the group plans to determine whether their model underfits.
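On the underfitting point, one common check is to compare training and held-out error for the model against a simpler or richer alternative: if both errors are high and drop sharply once the model is made more flexible, the simpler model is probably underfitting. The sketch below is only an illustration of that idea in plain Python. The data is synthetic, and the helper names (`mse`, `fit_mean`, `fit_line`) are hypothetical; none of it comes from the group's report.

```python
def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def fit_mean(xs, ys):
    """Constant model: always predict the training mean (likely to underfit)."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares with one feature, using the closed-form solution."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Synthetic data (an assumption for illustration): a roughly linear trend.
xs = list(range(10))
ys = [100 + 50 * x + (5 if x % 2 else -5) for x in xs]

# Hold out every third point for validation.
val_idx = {2, 5, 8}
train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in val_idx]
val = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i in val_idx]
train_x, train_y = zip(*train)
val_x, val_y = zip(*val)

errors = {}
for name, fit in [("mean", fit_mean), ("line", fit_line)]:
    model = fit(train_x, train_y)
    errors[name] = (
        mse([model(x) for x in train_x], train_y),
        mse([model(x) for x in val_x], val_y),
    )

for name, (train_err, val_err) in errors.items():
    print(f"{name}: train MSE = {train_err:.1f}, validation MSE = {val_err:.1f}")
```

Here the constant model has a large error on both splits, while the line's error is much smaller on both, which is the signature of the constant model underfitting. Reporting the same two numbers for the group's model and for a more flexible one would answer the question.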