I found that game 632224 has two home plate umpires (Ron Kulpa and Ryan Blakney), which cannot be right.
The correct data is as follows:
Umpires: HP: Ron Kulpa. 1B: Brian O'Nora. 2B: Ryan Wills. 3B: Ryan Additon.
So the true home plate umpire is Ron Kulpa, not Ryan Blakney.
But why is Ryan Blakney here? I found that there were two games between BAL and SEA that day. Ryan Blakney was the home plate umpire of the other game.
So this is probably a matching issue. It looks like the umpire information was matched to game_pk using only game_date, home_team, and away_team. I strongly doubt the robustness of that approach. Even when I match the two datasets on game_date, home_team, away_team, home_score, and away_score, there are still a few wrong matches. Please be careful.
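To see which games are at risk, one option is to count how many games share the same matching key. This is only a minimal sketch: the helper name `ambiguous_keys` and the sample rows are made up for illustration, and I am assuming each game is a dict with game_pk, game_date, home_team, and away_team fields.

```python
from collections import Counter

def ambiguous_keys(games, key_fields=("game_date", "home_team", "away_team")):
    """Return {key: count} for every key shared by more than one game."""
    counts = Counter(tuple(game[f] for f in key_fields) for game in games)
    return {key: n for key, n in counts.items() if n > 1}

# Made-up rows (not real data): a doubleheader plus one ordinary game.
games = [
    {"game_pk": 1, "game_date": "2000-01-01", "home_team": "HOME", "away_team": "AWAY"},
    {"game_pk": 2, "game_date": "2000-01-01", "home_team": "HOME", "away_team": "AWAY"},
    {"game_pk": 3, "game_date": "2000-01-02", "home_team": "HOME", "away_team": "AWAY"},
]

# Any key listed here cannot identify a single game on its own.
doubleheaders = ambiguous_keys(games)
print(doubleheaders)
```

Any key that shows up here needs an extra field or a manual check before the umpire rows are attached. Adding the scores to `key_fields` reduces the collisions, but two games on the same day can still end with the same score.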
I haven't checked the whole sample, but there are 200 games with duplicated umpires.
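A check along these lines can list the affected games. It is a rough sketch, not the exact code I ran: the function name and the field names (game_pk, position, umpire) are assumptions about how the umpire table is laid out, and the third row is made up.

```python
from collections import defaultdict

def games_with_multiple_hp_umpires(rows):
    """Return {game_pk: sorted umpire names} for games with more than one HP umpire."""
    hp_umpires = defaultdict(set)
    for row in rows:
        if row["position"] == "HP":
            hp_umpires[row["game_pk"]].add(row["umpire"])
    return {pk: sorted(names) for pk, names in hp_umpires.items() if len(names) > 1}

rows = [
    {"game_pk": 632224, "position": "HP", "umpire": "Ron Kulpa"},
    {"game_pk": 632224, "position": "HP", "umpire": "Ryan Blakney"},
    # Hypothetical row for a game with a single, consistent HP umpire.
    {"game_pk": 1, "position": "HP", "umpire": "Example Umpire"},
]

suspect_games = games_with_multiple_hp_umpires(rows)
print(suspect_games)
```

A game can only have one home plate umpire, so every game_pk returned here is a likely mismatch and worth checking against the box score.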