Taudin / DSProject

Portfolio Project for CSCI 385

Final Project #11

bnleff opened 4 years ago

bnleff commented 4 years ago

Summary

This project investigated census data to predict whether a family would make 50,000 dollars or more, based on variables such as education, occupation, marital status, and race. The resulting model correctly predicted the dependent variable for 84% of cases using the independent variables. The next planned step would be to discuss the model in more detail in the context of the data itself.

Data Preparation

The data preparation in this project is phenomenal. There are plenty of type conversions and uses of factors to make the data tidy. The data is properly cleaned, and the preparation steps are implemented correctly.

Modeling

This project uses logistic regression to predict whether a family will earn more or less than 50,000 dollars a year based on variables such as education and marital status. The portfolio also accurately describes the purpose of the model. The one enhancement I suggest is to communicate the model less analytically in places. The project goes above and beyond analytically, but I would like to see more discussion interpreting the model.

Validation

Again, this project's validation is phenomenal. The model is validated in several ways, including cross-validation, performance tests, significance tests, likelihood ratio tests, and much more. The data and model are analyzed and checked thoroughly.

R Proficiency

The R proficiency shown in this project is far above what I would expect from an introductory data science class. The project demonstrates extensive knowledge of functions, statistical tests, and data preparation. I especially enjoyed the custom functions.

Communication

While the portfolio thrives analytically, I found it hard to follow in places. There are advanced statistical tests and many visualizations. As I mentioned earlier, I would like to see more explanation of the model itself, as well as a summary at the end of each portfolio. I also did not find any discussion of social implications.

Critical Thinking

I did not find any part of the project that discusses the social implications of the model or of the topic in general. I would love to see a summary at the end of p03 that introduces this topic and wraps up the project, but p03 ends abruptly after the p-value analysis.

bnleff commented 4 years ago

I just had another idea that could help. Since your projects are really long, a table of contents for navigating the portfolios would be a nice touch.

Taudin commented 4 years ago

Data Preparation and Modeling (18)

I spent a lot of time figuring out different ways to prepare the data for this project, referencing some books I had been studying for additional ideas and help.

Validation and Operationalization (15)

I would give myself this score because I did only a minimal job on operationalization in this project. I might even give myself less, as I admittedly should have spent more time on this effort.

R Proficiency (19)

Over the last year, I have spent most of my free time practicing, solving small problems, and learning how to get better at using R, lol!

Communication (15)

I have yet to provide a "story" for this project, and the bulk of my discussion within the project is technical and rather dry. I would need to improve the flow of my communication to earn a higher score.

Critical Thinking (17)

I selected variables and analyzed many of them using critical thinking skills I learned in my statistics studies. However, there are many other ethical and social implications I could discuss in depth that I barely touch on.