Milestone1 shot maps [first part]

salelkafrawy commented 2 years ago

The task has two parts, code and blog. For the code:

[x] Interactive plot that selects team, season and year.
[x] Works with the coordinates correctly (all shots/goals are one side of the rink).
[ ] Compute aggregate statistics of shot locations across the entire league to compute league averages.
[ ] Group shots by team, and use the league averages computed above to compute the excess shots per hour. You can choose to represent this as either a raw difference in goals between the teams, or a percentage.
[ ] Make appropriate choices to bin your data when displaying it. You could also consider using smoothing techniques to make your shot maps more readable. A common strategy is to use kernel density estimation with a Gaussian kernel.
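The three open items above (league averages, excess shots per hour, binning and smoothing) could be sketched together as follows. This is only a sketch under stated assumptions, not the project's implementation: the function names, the 5 ft bin size, the bandwidth `sigma`, and the rink extent are all illustrative, and it assumes coordinates have already been flipped to one side of the rink. It smooths the binned grid with a Gaussian kernel, which is a binned stand-in for full kernel density estimation.

```python
import numpy as np

# Assumed extent of the offensive half in feet (illustrative, not from the thread).
X_EDGES = np.linspace(0, 100, 21)        # 20 bins of 5 ft along the length
Y_EDGES = np.linspace(-42.5, 42.5, 18)   # 17 bins of 5 ft across the width

def shot_rate(x, y, hours):
    """Shots per hour in each (x, y) bin."""
    counts, _, _ = np.histogram2d(x, y, bins=[X_EDGES, Y_EDGES])
    return counts / hours

def gaussian_smooth(grid, sigma=1.0):
    """Smooth a 2D grid with a separable Gaussian kernel (sigma in bins)."""
    radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # Convolve along each axis in turn; mode="same" keeps the grid shape.
    smoothed = np.apply_along_axis(np.convolve, 0, grid, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 1, smoothed, kernel, mode="same")

def excess_rate(team_xy, team_hours, league_xy, league_team_hours, sigma=1.0):
    """Smoothed difference between a team's shot rate and the league average.

    league_team_hours is the total time summed over all teams, so the league
    rate is a per-team average (a 1-hour game contributes 2 team-hours).
    """
    team = shot_rate(team_xy[:, 0], team_xy[:, 1], team_hours)
    league = shot_rate(league_xy[:, 0], league_xy[:, 1], league_team_hours)
    return gaussian_smooth(team - league, sigma)

# Random placeholder shots, only to show the call shape.
rng = np.random.default_rng(0)
league_xy = rng.uniform([0, -42.5], [100, 42.5], size=(6000, 2))
team_xy = rng.uniform([50, -20], [100, 20], size=(300, 2))
excess = excess_rate(team_xy, 10.0, league_xy, 200.0)
print(excess.shape)
```

Positive cells mark locations where the team shoots more often per hour than the league average, negative cells where it shoots less. Dividing by the league rate instead of subtracting would give the percentage variant mentioned in the task.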

For the blog:

[ ] Export to html.
[ ] Colorado Avalanche over time commentary.
[ ] Buffalo Sabres vs. Tampa Bay Lightning commentary.

The aesthetics are only about 80% there so far.

TimkLee commented 2 years ago

Summary

Visualizations appear to be incomplete for at least some team + subseason combinations.

Confirmed that the cleaning scripts generated approximately the expected number of files (1230*5).

Incomplete visualizations

The visualizations appear to be incomplete for at least some team + subseason combinations.

For example, checking Tampa Bay Lightning for the 2018 post-season, I only see ~50 shots displayed. But the data for the 4 games the team played that post-season show >100 shots in total:
>>> import pandas as pd
>>> df = pd.read_csv("./data/games/2018-postseason.csv")
>>> g1 = df[df.game_id==2018030111]
>>> g1[g1.shooter_team_name.str.contains('Lightning')].shape
(29, 30)  # 29 shots
>>> g4 = df[df.game_id==2018030114]
>>> g4[g4.shooter_team_name.str.contains('Lightning')].shape
(33, 30)  # 33 shots
...
Is it possible that only the first game from each series is displayed? I found a point for every shot I checked in game 1, but couldn't find any points corresponding to game 4:
>>> g1.loc[g1.shooter_team_name.str.contains('Lightning'),['coordinate_x','coordinate_y','shooter_name']]
     coordinate_x  coordinate_y       shooter_name
212          80.0          -8.0       Alex Killorn
218          78.0          13.0      Brayden Point
250          77.0         -10.0     Steven Stamkos
...
>>> g4 = df[df.game_id==2018030114]
>>> g4.loc[g4.shooter_team_name.str.contains('Lightning'),['coordinate_x','coordinate_y','shooter_name']].sort_values('coordinate_x', ascending=False)
     coordinate_x  coordinate_y       shooter_name
624          84.0          -5.0    Anthony Cirelli
587          83.0          -6.0    Anthony Cirelli
622          83.0          -0.0      Brayden Point
606          83.0          -2.0    Cedric Paquette
...
References if useful: Lightning 2018 playoffs

Game 1 = 29 SOG

Game 2 = 24 SOG

Game 3 = 31 SOG

Game 4 = 33 SOG
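One way to pin down which games are missing would be to count the team's shots per game and compare them with the reference totals above. The following is a minimal sketch, assuming the column names from the session above. The DataFrame is placeholder data built only from the four SOG totals (a real check would load ./data/games/2018-postseason.csv instead), the helper `shots_per_game` is hypothetical, and the IDs for games 2 and 3 are assumed to follow the same numbering as games 1 and 4.

```python
import pandas as pd

# Reference shots on goal per game, taken from the list above.
# IDs for games 2 and 3 are assumed; only games 1 and 4 appear in the session.
expected_sog = {2018030111: 29, 2018030112: 24, 2018030113: 31, 2018030114: 33}

# Placeholder frame with one row per shot; swap in pd.read_csv(...) for real data.
df = pd.DataFrame(
    [{"game_id": gid, "shooter_team_name": "Tampa Bay Lightning"}
     for gid, n in expected_sog.items() for _ in range(n)]
)

def shots_per_game(frame, team_substring):
    """Count rows per game_id for shooters whose team name contains the substring."""
    mask = frame["shooter_team_name"].str.contains(team_substring, regex=False)
    return frame[mask].groupby("game_id").size()

counts = shots_per_game(df, "Lightning")
# Games whose row count differs from the reference total (empty if all match).
missing = {gid: n for gid, n in expected_sog.items() if counts.get(gid, 0) != n}
print(counts.sum(), missing)
```

Running the same count on the DataFrame that actually feeds the plot, rather than on the raw CSV, should show directly which game IDs drop out before plotting.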

I think the second filter in the following code shrinks team_df further: it keeps only the rows where the selected team is the home team, so shots from away games are dropped. Do we need this part? team_df = team_df[team_df['home_team'] == selected_team]

    team_df = game_df[game_df['shooter_team_name'] == selected_team]
    #print(team_df['shooter_team_name'])
    team_df = team_df[team_df['home_team'] == selected_team]
    #print(team_df['shooter_team_name'])

salelkafrawy commented 2 years ago

Hey Team,

Yes, that line was meant for debugging (when I was trying to see where each team started the game from):

team_df = team_df[team_df['home_team'] == selected_team]

I removed it, so the "incomplete visualization" issue should be solved now.
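To illustrate why that leftover line hid shots: the second filter keeps only rows where the selected team is the home team, so every away game disappears from the plot. Here is a small sketch with made-up rows. The column names come from the snippet quoted earlier in the thread; the helper name `team_shots`, its `home_only` flag, and the data are hypothetical.

```python
import pandas as pd

# Made-up rows for illustration: Team A takes one shot at home (game 1)
# and two shots away (game 2).
game_df = pd.DataFrame({
    "game_id": [1, 1, 2, 2, 2],
    "home_team": ["Team A", "Team A", "Team B", "Team B", "Team B"],
    "shooter_team_name": ["Team A", "Team B", "Team A", "Team A", "Team B"],
})

def team_shots(game_df, selected_team, home_only=False):
    """Return the selected team's shots; home_only=True reproduces the debug filter."""
    team_df = game_df[game_df["shooter_team_name"] == selected_team]
    if home_only:
        # The debugging line: drops every game where the team played away.
        team_df = team_df[team_df["home_team"] == selected_team]
    return team_df

all_shots = team_shots(game_df, "Team A")
home_shots = team_shots(game_df, "Team A", home_only=True)
print(len(all_shots), len(home_shots))
```

With the extra filter, only the home-game shots survive, which matches the symptom of whole games missing from the shot map.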

6758-Project / hockey

Milestone1 shot maps [first part] #13