DS4PS / cpp-526-spr-2021

Course shell for Foundations of Data Science I
https://ds4ps.org/cpp-526-spr-2021/
MIT License

Lab 4 - Team Trend Line and Legend #7

Open kpalmer7113 opened 3 years ago

kpalmer7113 commented 3 years ago

1) Team Trend Lines

I added team trend lines to my graph, but the lines for some of the older teams do not include all the data points.

For example, the New York Yankees' trend line starts around 1912 when it should start at 1900. The Detroit Tigers' trend line runs from about 1902 to 1911, has a gap between 1911 and 1915, and then continues from 1916 on. I hope this makes sense. Other older teams, such as the Cincinnati Reds, are not affected: their complete trend line is shown.

In the Lab 04 instructions, the graphs in the second image that show multiple teams are also showing the same trend lines I'm getting for the Yankees and Tigers. I did notice when reviewing the Teams dataset that some ave.so values are NA. I'm wondering if this could be affecting the data?

2) Legend

I added the following code to my plot:

one.team <- filter(Teams, 
                   name == input$my.team) 

text ( x=1912, y=8.0, col="goldenrod1", labels=one.team, cex=0.8 )

When I pass one.team to the labels argument, the team name is repeated at y=8 across the whole x-axis. If I pass a single team name such as "San Diego Padres", the team name is displayed correctly.

lecy commented 3 years ago

Check what your code produces:

my.team <- "some team name here"
one.team <- filter( Teams, name == my.team )
one.team

You are seeing an example of recycling. When a function needs vectors of equal length and you give it vectors of unequal length, it will sometimes recycle the shorter ones to match the longest.

You probably want a single instance of the name for the label. Try adding the unique() function?
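A minimal sketch of the recycling and the unique() fix, using base R only. The teams.toy data frame is a made-up stand-in for the filtered Teams data (its values are invented), and data.frame() is used here only because it shows recycling without needing a plot; it is not what text() does internally.

```r
# Toy stand-in for the filtered Teams data (values invented for illustration)
teams.toy <- data.frame(
  yearID = c(1913, 1914, 1915),
  name   = c("New York Yankees", "New York Yankees", "New York Yankees"),
  stringsAsFactors = FALSE
)

# The name column holds one value per season, so it is a vector of length 3
n.names <- length(teams.toy$name)

# Recycling: the single x and y values are repeated to match the 3 labels
label.positions <- data.frame(x = 1912, y = 8.0, label = teams.toy$name)
nrow(label.positions)

# unique() collapses the repeated names into a single label
team.label <- unique(teams.toy$name)
length(team.label)
```

With a length-one label such as team.label there is nothing left to recycle, so only one label is drawn.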

lecy commented 3 years ago

Or you can just use the original label?

text ( x=1912, y=8.0, col="goldenrod1", labels=input$my.team, cex=0.8 )

I don't remember the exact structure of the data and didn't look back over the lab, so this example is not tested. But I think it addresses your problem.

kpalmer7113 commented 3 years ago

The labels=input$my.team did work!

lecy commented 3 years ago

Single value vs vector for the argument. I've made that mistake maybe 17,000 times before it clicked.

kpalmer7113 commented 3 years ago

Thank you! It makes sense, so I hope to remember this argument for future projects.

Any ideas on my issue with the team trend lines with some of the older teams? I can't seem to figure out why I am getting incomplete trends with teams such as the New York Yankees and Detroit Tigers.

lecy commented 3 years ago

Some teams only occur for a few years in the dataset because the name changed or the franchise moved or closed. The Yankees are weird though - I feel like they have been around forever.

Always check your data for issues like this. Break open the app by manually selecting the team with problems.

my.team <- "new york yankees"  # replace with correct name 
one.team <- filter( Teams, name == my.team )
one.team

See what you are getting. Not sure if using teamID instead of name would help?
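A rough sketch of that check, in base R so it runs without extra packages. The teams.toy data frame is a hypothetical stand-in for Teams: the column names follow the thread (name, teamID, ave.so) plus an assumed yearID column, and every value, including the name change, is invented to show the idea rather than taken from the real data.

```r
# Toy stand-in for Teams (all values invented; yearID is an assumed column name)
teams.toy <- data.frame(
  yearID = 1910:1915,
  teamID = "NYA",
  name   = c(rep("New York Highlanders", 3), rep("New York Yankees", 3)),
  ave.so = c(3.9, NA, 4.1, 4.0, NA, 3.8),
  stringsAsFactors = FALSE
)

# Select the same club two ways
by.name <- teams.toy[ teams.toy$name == "New York Yankees", ]
by.id   <- teams.toy[ teams.toy$teamID == "NYA", ]

# First and last season matched by each selection
range(by.name$yearID)
range(by.id$yearID)

# Count missing values in the plotted variable
n.missing <- sum(is.na(by.id$ave.so))
n.missing
```

If the selection by name starts later than the selection by ID, an earlier team name is the likely cause of the late start. Missing values matter too: lines() does not draw to or from a point whose value is NA, so NAs in ave.so would show up as gaps in the trend line.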

jamisoncrawford commented 3 years ago

@lecy thank you for jumping in to help, here!

@kpalmer7113 many learners have thought that their dashboards don't work because they deliberately chose fun but obscure team names. It turns out that many teams just never appear in the data, or they only appear before 1900, or they have several missing years, or their name changed at some point.

Even the visualization from the New York Times ends at 2012, but the Lahman package has data up to about 2021!

A wild example is the name Milwaukee Brewers: today's franchise started as the Seattle Pilots and then became the Brewers, while an earlier club called the Milwaukee Brewers became the St. Louis Browns and later the Baltimore Orioles. Wild, I say! And confusing!
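One way to spot these cases before picking a team is to summarize the seasons covered by each name. This is only a sketch in base R on a made-up teams.toy data frame (the team names and years are placeholders, and yearID is an assumed column name), not the lab's own code.

```r
# Toy stand-in for Teams (names and years are placeholders)
teams.toy <- data.frame(
  yearID = c(1901, 1902, 1904, 1969, 1970, 1971),
  name   = c("Team A", "Team B", "Team B", "Team C", "Team D", "Team D"),
  stringsAsFactors = FALSE
)

# First season, last season, and number of seasons for each team name
first.year <- tapply(teams.toy$yearID, teams.toy$name, min)
last.year  <- tapply(teams.toy$yearID, teams.toy$name, max)
n.seasons  <- tapply(teams.toy$yearID, teams.toy$name, length)

coverage <- data.frame(
  name       = names(first.year),
  first.year = as.vector(first.year),
  last.year  = as.vector(last.year),
  n.seasons  = as.vector(n.seasons)
)

# A name has a gap when it has fewer seasons than its year span allows
coverage$has.gap <- coverage$n.seasons < (coverage$last.year - coverage$first.year + 1)
coverage
```

Names with a late first.year, an early last.year, or has.gap set to TRUE are the ones whose trend lines will look incomplete.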

kpalmer7113 commented 3 years ago

Thank you for the clarification!