Week 4 - Tidy Tuesday Commentary

I was drawn to Kyle Cuilla's take on this week's Tidy Tuesday assignment. His visualization has a lot going on, so I've chosen to focus only on his submission.

Tweet Link: (https://twitter.com/kc_analytics/status/1315754152723701760/photo/1) Code Link: (https://github.com/kcuilla/Tidy-Tuesday/blob/main/2020_41/2020_41_NCAA_Tourney.R)

Code/Tools/ approaches we have seen in class that you saw used over the week:

His code looked very similar to a lot of what we learned last week. He of course used the pipe (%>%), and I found numerous uses of mutate(), group_by(), summarize(), filter(), and arrange(), among others.
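To give a sense of how these verbs chain together, here is a minimal sketch on a made-up data frame. The column names and values are my own assumptions for illustration; they are not taken from his script.

```r
library(dplyr)

# Hypothetical toy data; names and numbers are illustrative only
tourney <- tibble(
  school = c("UConn", "UConn", "Tennessee", "Tennessee", "Stanford"),
  year   = c(2000, 2001, 2000, 2001, 2001),
  wins   = c(6, 4, 5, 3, 2)
)

summary_tbl <- tourney %>%
  filter(year >= 2000) %>%                 # keep rows from 2000 onward
  mutate(deep_run = wins >= 4) %>%         # flag seasons with 4+ wins
  group_by(school) %>%                     # one group per school
  summarize(
    total_wins = sum(wins),
    deep_runs  = sum(deep_run)
  ) %>%
  arrange(desc(total_wins))                # most wins first

print(summary_tbl)
```

Each verb takes a data frame and returns a data frame, which is why the pipe reads top to bottom like a recipe.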

Code/Tools/ approaches that you enjoyed or that surprised you that we have not seen in class

One thing I noticed right away with this dataset is that some columns, like "tourney_finish", hold character values such as "Champ", "1st", and "2nd". I figured this would have to be sorted out if we wanted to analyze it, and I was intrigued by how he did this. I learned about a new function called case_when(), which sits inside of mutate() and lets you apply multiple if_else()-style conditions to a vector. For example, he would use it to say "if tourney_finish is equal to 'Champ', then replace that with 'Champion'". He did this for all the other values in this variable. It seemed like a very neat way to recode entire vectors of data.
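A minimal sketch of this pattern is below. Only the "Champ" to "Champion" mapping comes from his code as I described it; the other replacement labels, the "Other" value, and the finish_label column name are my own assumptions for illustration.

```r
library(dplyr)

# Hypothetical input; "Other" stands in for any value without its own rule
finishes <- tibble(tourney_finish = c("Champ", "1st", "2nd", "Other"))

recoded <- finishes %>%
  mutate(
    finish_label = case_when(
      tourney_finish == "Champ" ~ "Champion",      # mapping mentioned above
      tourney_finish == "1st"   ~ "First Round",   # assumed label
      tourney_finish == "2nd"   ~ "Second Round",  # assumed label
      TRUE                      ~ tourney_finish   # fallback: keep the original value
    )
  )

print(recoded)
```

The conditions are checked in order and the first match wins, so the catch-all TRUE line goes last.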

Data visualizations (figures) that you enjoyed

I liked a few things about this visualization.

1: Anytime you have so many different factors (like 20+ schools) it can be very easy to get lost or overwhelmed. Using a subset of teams kept the colour coding clear and consistent, which I found easier to interpret.

2: I liked the minimalist approach to the graph elements.

Data Visualization (Figures) that could be improved (and how you would improve them)

Some things I would change:

1: Though I was drawn to this visualization, it does have a lot going on. I might choose to display only a few of the figures, so that it is more focused. For example, I feel the bottom right-most figure is a bit cluttered, and I'm not sure that it really adds anything to these figures.

2: I would not have gone with the off-white tint that is present here (though it's possible this was accidental? If you've ever used the Snipping Tool on Windows while you had a blue-light filter on, this is often the result).

3: With the line figure, I think I would have also added a few more years to the x axis labels (e.g. 1990, 1995, 2000). I would also have put the x axis labels at the bottom, as I feel that is where people will more naturally look for x axis labels (and this is usually what I see in scientific papers).
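As a rough sketch of the axis change in point 3, this is how I might set five-year breaks and keep the labels at the bottom in ggplot2. The data frame here is entirely made up, and this is not his plotting code.

```r
library(ggplot2)

# Hypothetical data: one made-up value per year, purely for illustration
trend <- data.frame(
  year  = 1990:2015,
  value = seq(0.4, 0.9, length.out = 26)
)

# Label every fifth year (1990, 1995, 2000, ...)
year_breaks <- seq(1990, 2015, by = 5)

p <- ggplot(trend, aes(x = year, y = value)) +
  geom_line() +
  scale_x_continuous(breaks = year_breaks, position = "bottom") +
  theme_minimal()

print(p)
```

Bottom is already the default position for the x axis, so the position argument is only spelled out here to make the choice explicit.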

Side note: I viewed many other submissions as well, and found a great resource. I saw that a few people talked about Cedric Scherer's visualization. I looked at it as well, and when I went to his Twitter page I found other visualizations he had done, as well as this thorough guide to using ggplot.

"A ggplot2 Tutorial for Beautiful Plotting in R" (https://cedricscherer.netlify.app/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/)

His visualizations tend to be more on the artsy side of things, but sometimes this is the most compelling way to display the data. For example, I was blown away by this figure that he made:

https://pbs.twimg.com/media/EhfhullWsAMoetO?format=jpg&name=4096x4096
