Closed by salelkafrawy 2 years ago
Summary
- visualizations appear to be incomplete for at least some team+subseason combinations
- confirmed approximately expected # of files (1230*5) generated by cleaning scripts
Incomplete visualizations
The visualizations appear to be incomplete for at least some team+subseason combinations. For example, checking Tampa Bay Lightning for the 2018 post-season, I only see ~50 shots displayed. But the data for the 4 games the team played that post season show >100 shots in total:
>>> import pandas as pd
>>> df = pd.read_csv("./data/games/2018-postseason.csv")
>>> g1 = df[df.game_id==2018030111]
>>> g1[g1.shooter_team_name.str.contains('Lightning')].shape
(29, 30)  # 29 shots
>>> g4 = df[df.game_id==2018030114]
>>> g4[g4.shooter_team_name.str.contains('Lightning')].shape
(33, 30)
...
Is it possible that only the first game from each series is displayed? I found a point for every shot I checked in game 1, but couldn't find any points corresponding to game 4:
>>> g1.loc[g1.shooter_team_name.str.contains('Lightning'),['coordinate_x','coordinate_y','shooter_name']]
     coordinate_x  coordinate_y    shooter_name
212          80.0          -8.0    Alex Killorn
218          78.0          13.0   Brayden Point
250          77.0         -10.0  Steven Stamkos
...
>>> g4 = df[df.game_id==2018030114]
>>> g4.loc[g4.shooter_team_name.str.contains('Lightning'),['coordinate_x','coordinate_y','shooter_name']].sort_values('coordinate_x', ascending=False)
     coordinate_x  coordinate_y     shooter_name
624          84.0          -5.0  Anthony Cirelli
587          83.0          -6.0  Anthony Cirelli
622          83.0          -0.0    Brayden Point
606          83.0          -2.0  Cedric Paquette
...
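Rather than checking games one at a time, a per-game count might make it easier to compare the data against what the plot shows. This is only a rough sketch: the DataFrame below is a made-up stand-in for the real CSV, and `shots_per_game` is a hypothetical helper, not something from the repo. Only the `game_id` and `shooter_team_name` column names are taken from the snippets above.

```python
import pandas as pd

# Toy stand-in for ./data/games/2018-postseason.csv (rows are made up).
df = pd.DataFrame({
    "game_id": [2018030111, 2018030111, 2018030112,
                2018030114, 2018030114, 2018030114],
    "shooter_team_name": [
        "Tampa Bay Lightning", "Opponent", "Tampa Bay Lightning",
        "Tampa Bay Lightning", "Tampa Bay Lightning", "Opponent",
    ],
})


def shots_per_game(frame, team_fragment):
    """Count shot rows per game_id for teams whose name contains team_fragment."""
    mask = frame["shooter_team_name"].str.contains(team_fragment, regex=False)
    return frame[mask].groupby("game_id").size()


counts = shots_per_game(df, "Lightning")
print(counts)                  # one row per game the team has shots in
print("total:", counts.sum())  # compare with the number of points plotted
```

If the total here is well above the number of points in the figure, the rows are being dropped somewhere between the cleaned CSV and the plot.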
References if useful: Lightning 2018 playoffs
I think the following code makes team_df smaller: after filtering on shooter_team_name, the second filter keeps only the rows where the selected team is the home team.
Do we need this part?
team_df = team_df[team_df['home_team'] == selected_team]
team_df = game_df[game_df['shooter_team_name'] == selected_team]
#print(team_df['shooter_team_name'])
team_df = team_df[team_df['home_team'] == selected_team]
#print(team_df['shooter_team_name'])
Hey Team,
Yes, that line was meant for debugging (when I was trying to see where each team started the game from):
team_df = team_df[team_df['home_team'] == selected_team]
so I removed it and the "incomplete visualization" issue should be solved now.
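For anyone reading later, here is a small sketch of why that extra filter hid shots. The rows are made up and the team names are placeholders; only the column names and the two filter lines come from the snippet quoted above.

```python
import pandas as pd

# Toy shot table (made-up rows); team "A" plays game 1 at home and game 2 away.
game_df = pd.DataFrame({
    "game_id": [1, 1, 2, 2, 2],
    "home_team": ["A", "A", "B", "B", "B"],
    "shooter_team_name": ["A", "B", "A", "A", "B"],
})
selected_team = "A"

# First filter: every shot taken by the selected team.
team_df = game_df[game_df["shooter_team_name"] == selected_team]

# Removed debug filter: keeps only shots from the team's home games.
home_only = team_df[team_df["home_team"] == selected_team]

print(len(team_df))    # shots by the team across all games
print(len(home_only))  # shots left after the home_team filter
```

With the second filter in place, every shot from an away game disappears, which would be consistent with game 1 showing up and game 4 not, assuming game 4 was an away game for the selected team.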
The task has two parts (code and blog):
for the code:
for the blog:
The aesthetics are still 80% good.