Summary

This project investigated the relationship between GDP and personal income. He found that the average personal income of a state or region can be predicted by the average GDP of that state or region. To move forward, he would like to collect data about the population of these areas.

Data Preparation

The tables that have been developed are "Average GDP by State", "Average GDP by Region", "Average Personal Income by State", "Average Personal Income by Region", "Average GDP by State Industry", "Average GDP by Regional Industry", "Average GDP and Personal Income by State", and "Average GDP and Personal Income by Region". Each table holds either average GDP or average personal income, split by state, region, state industry, or regional industry. The main tables used are the last two, which combine GDP and personal income. The data is tidy and follows all the principles of tidy data. It is also cleaned well and easy to understand.

Modeling

He tried multiple models and ended up using average GDP versus average personal income, with average GDP predicting average income. He accurately describes and interprets his models.
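For readers who want to see the model form being described, here is a minimal sketch in R. It assumes a hypothetical data frame `gdp_income` with columns `avg_gdp` and `avg_income`, filled with synthetic values. It is not the author's code or data.

```r
# Hypothetical, synthetic data standing in for the
# "Average GDP and Personal Income by State" table.
set.seed(1)
gdp_income <- data.frame(avg_gdp = runif(50, min = 20000, max = 80000))
gdp_income$avg_income <- 5000 + 0.6 * gdp_income$avg_gdp + rnorm(50, sd = 2000)

# Simple linear regression: average GDP predicts average personal income.
fit <- lm(avg_income ~ avg_gdp, data = gdp_income)

# The slope estimates the change in average income per unit of average GDP.
slope <- coef(fit)[["avg_gdp"]]
r_squared <- summary(fit)$r.squared
print(coef(fit))
print(r_squared)
```

Reporting the slope alongside R-squared gives a reader both the direction of the relationship and how much of the variation in income it explains.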

Validation

He used cross-validation for both of his models and took into account how big the split should be based on the amount of data, which is good. He has a good summary of the validation at the end, but there could have been a line after each cross-validation explaining whether that specific model had good results.
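The suggestion about a per-model verdict could look something like the sketch below. It runs k-fold cross-validation on synthetic data and prints a one-line result. The data frame `dat`, its column names, and the choice of 5 folds are all assumptions for illustration, not details taken from the portfolio.

```r
# Synthetic stand-in data; column names are hypothetical.
set.seed(42)
n <- 50
dat <- data.frame(avg_gdp = runif(n, 20000, 80000))
dat$avg_income <- 5000 + 0.6 * dat$avg_gdp + rnorm(n, sd = 2000)

# With only about 50 rows, a small number of folds keeps each
# held-out fold large enough to give a usable error estimate.
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))

# Fit on k - 1 folds, then compute RMSE on the held-out fold.
fold_rmse <- vapply(seq_len(k), function(i) {
  train <- dat[fold_id != i, ]
  test <- dat[fold_id == i, ]
  fit <- lm(avg_income ~ avg_gdp, data = train)
  pred <- predict(fit, newdata = test)
  sqrt(mean((test$avg_income - pred)^2))
}, numeric(1))

# One-line verdict for this model, printed right after its validation.
cat(sprintf("Mean %d-fold RMSE: %.0f (sd of income: %.0f)\n",
            k, mean(fold_rmse), sd(dat$avg_income)))
```

Comparing the cross-validated RMSE with the standard deviation of the response is one simple way to say whether a model had good results: an RMSE well below that spread suggests the predictor is adding real information.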

R Proficiency

Overall the code is easy to read and understand. There are helpful comments throughout the code to help you see exactly what he is doing. One thing that he could add is something to suppress some of the warnings from printing out. This is minor though.
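As a rough illustration of the warning suggestion, the sketch below shows two common options. The function `noisy` is a made-up example, and the commented `knitr` line assumes the portfolio is written in R Markdown, which this review does not confirm.

```r
# In an R Markdown document, a setup chunk can hide warnings and
# messages for every chunk (requires the knitr package):
#   knitr::opts_chunk$set(warning = FALSE, message = FALSE)

# For a single expression, base R's suppressWarnings() does the same.
noisy <- function() {
  warning("example warning")  # hypothetical warning for illustration
  42
}

result <- suppressWarnings(noisy())
print(result)
```

Suppressing warnings only in the final rendered output, after they have been read and understood, keeps the report clean without hiding real problems during development.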

Communication

Overall it is very easy to follow. The wording is easy to understand and the visualizations add more context to what he is saying. One thing that could be changed is making the table printouts interactive. Another minor thing is adding custom labels to the axes of the graphs.
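Both suggestions can be sketched in a few lines of R. This example assumes the `DT` and `ggplot2` packages, which may or may not be what the portfolio uses, and the data frame `df` holds placeholder values only.

```r
# Placeholder data for illustration only; not the author's values.
df <- data.frame(
  state = c("A", "B", "C"),
  avg_gdp = c(40000, 55000, 70000),
  avg_income = c(30000, 38000, 47000)
)

# Interactive table: DT::datatable() renders a sortable, searchable
# HTML table. Fall back to the plain data frame if DT is not installed.
tbl <- if (requireNamespace("DT", quietly = TRUE)) DT::datatable(df) else df

# Custom axis labels: labs() replaces the raw column names on the axes.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x = avg_gdp, y = avg_income)) +
    ggplot2::geom_point() +
    ggplot2::labs(x = "Average GDP", y = "Average personal income")
}
```

Interactive tables only work in HTML output, so this change fits best if the portfolio is knitted to HTML.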

Critical Thinking

It is clear that he has put thought into how we could move forward with his findings. He has ideas on how he could collect more data, and what kind of data to collect. He has also found a way that the findings could be applied and some unintended consequences that could come out of it.

Data Preparation and Modeling (20% out of 20%)

Tables follow clean and tidy guidelines. The work done in parts 1 and 2 presents the data and models in a way that is easy to understand.

Validation and Operationalization (19% out of 20%)

Validation methods are satisfactory and the findings are presented clearly. The portfolio discusses the implications of operationalizing the findings and the unintended consequences that could arise.

R Proficiency (20% out of 20%)

R code runs without any error and does not use any iterative approaches. Comments are utilized so the audience can follow along even if they have little or no programming knowledge.

Communication (19% out of 20%)

I believe the communication flowed from deliverable to deliverable and is easy to follow. Improvements could be made regarding the output of data tables to be interactive. Suppressing error and warning messages would also improve the readability of this portfolio.

Critical Thinking (20% out of 20%)

Critical analysis was used to articulate findings and their implications.

VioletInferno / DataSciencePortfolio

Final Review #2