6758-Project / hockey

Milestone1 shot maps [first part] #13

Closed salelkafrawy closed 2 years ago

salelkafrawy commented 2 years ago

The task has two parts (code and blog): for the code:

for the blog:

The aesthetics are still 80% good.

TimkLee commented 2 years ago

Summary

  • Visualizations appear to be incomplete for at least some team + subseason combinations.
  • Confirmed that the cleaning scripts generated approximately the expected number of files (1230*5).

Incomplete visualizations

The visualizations appear to be incomplete for at least some team + subseason combinations.

For example, for the Tampa Bay Lightning in the 2018 post-season, I only see ~50 shots displayed, but the data for the 4 games the team played that post-season show >100 shots in total:

>>> import pandas as pd
>>> df = pd.read_csv("./data/games/2018-postseason.csv")
>>> g1 = df[df.game_id==2018030111]
>>> g1[g1.shooter_team_name.str.contains('Lightning')].shape
(29, 30)  # 29 shots
>>> g4 = df[df.game_id==2018030114]
>>> g4[g4.shooter_team_name.str.contains('Lightning')].shape
(33, 30)  # 33 shots
...

Is it possible that only the first game from each series is displayed? I found a point for every shot I checked in game 1, but couldn't find any points corresponding to game 4:

>>> g1.loc[g1.shooter_team_name.str.contains('Lightning'),['coordinate_x','coordinate_y','shooter_name']]
     coordinate_x  coordinate_y       shooter_name
212          80.0          -8.0       Alex Killorn
218          78.0          13.0      Brayden Point
250          77.0         -10.0     Steven Stamkos
...
>>> g4 = df[df.game_id==2018030114]
>>> g4.loc[g4.shooter_team_name.str.contains('Lightning'),['coordinate_x','coordinate_y','shooter_name']].sort_values('coordinate_x', ascending=False)
     coordinate_x  coordinate_y       shooter_name
624          84.0          -5.0    Anthony Cirelli
587          83.0          -6.0    Anthony Cirelli
622          83.0          -0.0      Brayden Point
606          83.0          -2.0    Cedric Paquette
...
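To avoid checking games one at a time, the per-game counts can be computed in one pass. This is only a sketch: the column names `game_id` and `shooter_team_name` come from the snippets above, while the helper name `shots_per_game` and the toy rows are made up for illustration.

```python
import pandas as pd


def shots_per_game(df: pd.DataFrame, team_substring: str) -> pd.Series:
    """Count shot rows per game_id where the shooter's team name contains team_substring."""
    mask = df["shooter_team_name"].str.contains(team_substring, na=False)
    return df.loc[mask].groupby("game_id").size()


# Toy rows (invented values), standing in for the real postseason CSV.
toy = pd.DataFrame({
    "game_id": [2018030111, 2018030111, 2018030114, 2018030114, 2018030114],
    "shooter_team_name": [
        "Tampa Bay Lightning", "Opponent",
        "Tampa Bay Lightning", "Tampa Bay Lightning", "Opponent",
    ],
})

counts = shots_per_game(toy, "Lightning")
print(counts)        # one row per game_id
print(counts.sum())  # total shots for the team across all games
```

On the real data, `shots_per_game(df, "Lightning").sum()` would give the total to compare against the number of points drawn on the shot map.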

References if useful: Lightning 2018 playoffs

I think the following code makes team_df smaller than intended: the second filter keeps only the rows where the selected team is also the home team. Do we need this part? team_df = team_df[team_df['home_team'] == selected_team]

    team_df = game_df[game_df['shooter_team_name'] == selected_team]
    #print(team_df['shooter_team_name'])
    team_df = team_df[team_df['home_team'] == selected_team]
    #print(team_df['shooter_team_name'])

salelkafrawy commented 2 years ago

Hey Team,

Yes, that line was meant for debugging (I was trying to see where each team started the game from):

team_df = team_df[team_df['home_team'] == selected_team]

I removed it, so the "incomplete visualization" issue should be solved now.
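For the record, a minimal sketch of why that extra filter hid shots. It reuses the `game_df`, `team_df`, `selected_team`, `home_team` and `shooter_team_name` names from the snippet above. The team names and rows are invented, and `home_only` is a name introduced here for comparison.

```python
import pandas as pd

# Toy data (invented): "Team A" shoots once at home (game 1) and once away (game 2).
game_df = pd.DataFrame({
    "game_id": [1, 1, 2, 2],
    "home_team": ["Team A", "Team A", "Team B", "Team B"],
    "shooter_team_name": ["Team A", "Team B", "Team A", "Team B"],
})
selected_team = "Team A"

# First filter: every shot taken by the selected team.
team_df = game_df[game_df["shooter_team_name"] == selected_team]

# Debugging filter: additionally keeps only games where the team is at home.
home_only = team_df[team_df["home_team"] == selected_team]

print(len(team_df))    # shots by the selected team in all games
print(len(home_only))  # shots by the selected team in home games only
```

With the second filter in place, every away-game shot is dropped before plotting. That would explain a shot map showing roughly half of the expected points, assuming the missing games were away games, which the thread does not state.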