chuckf333 / DataScienceProject

Final Peer Review #9

Open bwalker20 opened 4 years ago

bwalker20 commented 4 years ago

Summary

This project covers baseball player statistics from 1871 through 2018. I cannot tell exactly what the research questions are, so I don't know what the project is trying to find out about baseball; I'm assuming it is just looking for interesting correlations in the data. One insight was that the average salary has risen fairly significantly over time. I was hoping to work out the conclusions from the operationalization part, but it was missing. I am not sure what the next planned step is.

Data Preparation

The tables developed are player salaries, player batting stats, average salaries per year, average home runs per year, and each team's batting stats per year. The portfolio does demonstrate tidy organization. The project also demonstrates cleaned data.

Modeling

One predictive model tries to predict the average salary of a team. The predictors were: Hits, Runs, Doubles, Triples, Home Runs. The portfolio somewhat describes the purpose of the model, but I don't think it's describing it specifically enough. It doesn't explain exactly why the model is trying to predict salaries. The portfolio somewhat describes the model, but there could be more detail about the results.

There is a second model that attempts to predict the number of years a player has played. The predictors were: average hits, average games, average RBI, average stolen bases. This model could also use more explanation, both of why it is being built and of its results.

Validation

The model for years played has been cross-validated, but the model for salaries has not. Cross-validation results were explained in general terms before the cross-validation was run, but that explanation was not really applied to the project's own results.
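To make the suggestion concrete, here is a rough sketch of how the salary model could be cross-validated with k-fold in base R. Everything in it is an assumption for illustration: the data frame `teams`, its column names (`H`, `R`, `X2B`, `X3B`, `HR`, `avg_salary`), the synthetic values, and the choice of `lm` with 5 folds. The project's actual model and tables may look different.

```r
set.seed(1)

# Synthetic team-season data. Column names mirror the predictors named in
# this review, but the values are made up and are NOT the project's data.
n <- 100
teams <- data.frame(
  H   = rnorm(n, 1400, 80),   # hits
  R   = rnorm(n, 700, 60),    # runs
  X2B = rnorm(n, 270, 25),    # doubles
  X3B = rnorm(n, 30, 8),      # triples
  HR  = rnorm(n, 160, 30)     # home runs
)
teams$avg_salary <- 1e6 + 2000 * teams$HR + 500 * teams$R + rnorm(n, 0, 2e5)

# Randomly assign each row to one of k folds.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other folds, then score on the held-out fold.
rmse <- vapply(seq_len(k), function(i) {
  train <- teams[fold != i, ]
  test  <- teams[fold == i, ]
  fit   <- lm(avg_salary ~ H + R + X2B + X3B + HR, data = train)
  sqrt(mean((test$avg_salary - predict(fit, newdata = test))^2))
}, numeric(1))

# Average held-out error across folds, in the same units as avg_salary.
mean(rmse)
```

Reporting the mean held-out RMSE next to the typical team salary would give a reader a sense of how far off the predictions are, which is the kind of project-specific interpretation I was looking for.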

R Proficiency

The R code looks good; I cannot find anything wrong with it. The only thing I could point out is the code block that binds the web-scraped data. rbind can bind all of the tables in a single call, so that would be my only suggested change. Functional programming methods were good, and no loops were used.
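As a small sketch of the rbind point, assuming the scraped pages end up as separate data frames with matching columns (the object and column names here are hypothetical, not taken from the project):

```r
# Hypothetical stand-ins for the per-page scraped tables.
page1 <- data.frame(player = c("A", "B"), hr = c(10, 22))
page2 <- data.frame(player = "C", hr = 5)
page3 <- data.frame(player = c("D", "E"), hr = c(31, 0))

# rbind() accepts any number of data frames, so one call is enough.
all_players <- rbind(page1, page2, page3)

# If the tables are already in a list (e.g. from lapply over URLs),
# do.call() passes every element to rbind() at once.
scraped_pages <- list(page1, page2, page3)
all_players_from_list <- do.call(rbind, scraped_pages)
```

Both forms give the same combined table, so it mostly comes down to whether the pages are held as separate objects or in a list.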

Communication

I think the communication is lacking a bit. The project does a good job of describing what the code is doing, but it doesn't really explain the results. The visualizations were good, but some of them could be explained more, like the ones in the model section.

Critical Thinking

There was not any operationalization done in this project. There weren't any implications mentioned in the project either.

chuckf333 commented 4 years ago

Data Preparation and Modeling (20%)

I think I had a good grasp on how to bring in the data and how to handle cleaning it up to be useful and readable.

Validation and Operationalization (10%)

I mostly understood how to use all of the functions and whatnot, but I just couldn't seem to get a full understanding of how to actually interpret some of it.

R Proficiency (20%)

I'm definitely not the best at R, as it is not very intuitive for me, but I figured out how to make it work for me, like using the sqldf package since I am much better at using SQL to parse/sort tables. I think I did well in this section.

Communication (15%)

I tried to be as explicit as possible wherever I could, but I can't very well explain everything if I don't fully understand everything. I still think I did a pretty good job communicating what was going on throughout the three sections though.

Critical Thinking (10%)

I think I had some good ideas, but I just couldn't figure out where to go with them.