BillPetti / baseballr

A package written for R focused on baseball analysis. Currently in development.
billpetti.github.io/baseballr

Wrong matching Umpire Data #351

Closed mIgLLL closed 1 month ago

mIgLLL commented 1 month ago

I found that game 632224 has two home plate umpires listed (Ron Kulpa and Ryan Blakney), which cannot be right.

The actual data for that game is as follows:

Umpires: HP: Ron Kulpa. 1B: Brian O'Nora. 2B: Ryan Wills. 3B: Ryan Additon.

So the true home plate umpire is Ron Kulpa, not Ryan Blakney.

But why is Ryan Blakney listed here? I found that there were two games between BAL and SEA on that day. Ryan Blakney was the umpire for the other game.

So this is probably a matching issue. It looks like the umpire information and game_pk were matched only on game_date, home_team, and away_team. I strongly doubt that this is robust. Even when I match the two datasets on game_date, home_team, away_team, home_score, and away_score, there are still a few wrong matches. Please be careful.
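To illustrate the suspected failure mode, here is a minimal sketch in plain Python. It is not baseballr's actual code: the date, the home/away assignment, and the second game_pk (999999) are placeholders, and `join_on_key` and `ambiguous_keys` are hypothetical helpers. It only shows that joining on (game_date, home_team, away_team) attaches both umpires to both games when the same teams play twice on one day.

```python
from collections import defaultdict

# Hypothetical rows: the date, home/away assignment, and game_pk 999999
# are placeholders, not real values from the data set.
games = [
    {"game_pk": 632224, "game_date": "2021-01-01", "home_team": "BAL", "away_team": "SEA"},
    {"game_pk": 999999, "game_date": "2021-01-01", "home_team": "BAL", "away_team": "SEA"},
]
umpires = [
    {"game_date": "2021-01-01", "home_team": "BAL", "away_team": "SEA", "hp_umpire": "Ron Kulpa"},
    {"game_date": "2021-01-01", "home_team": "BAL", "away_team": "SEA", "hp_umpire": "Ryan Blakney"},
]

KEY = ("game_date", "home_team", "away_team")


def key_of(row):
    """Build the join key from a row."""
    return tuple(row[k] for k in KEY)


def join_on_key(games, umpires):
    """Join every game to every umpire row that shares its key."""
    by_key = defaultdict(list)
    for u in umpires:
        by_key[key_of(u)].append(u["hp_umpire"])
    return [(g["game_pk"], name) for g in games for name in by_key.get(key_of(g), [])]


def ambiguous_keys(games):
    """Return the keys shared by more than one game_pk."""
    counts = defaultdict(int)
    for g in games:
        counts[key_of(g)] += 1
    return {k for k, n in counts.items() if n > 1}


matched = join_on_key(games, umpires)
for game_pk, name in matched:
    print(game_pk, name)
print(ambiguous_keys(games))
```

Two games and two umpire rows produce four matches, so each game_pk ends up with both names. Flagging the ambiguous keys before the join would at least show which dates need an extra field, such as a game number or start time if the umpire source provides one, to tell the games apart.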

I haven't checked the whole sample, but there are 200 games with duplicated umpires.
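A check along these lines can be sketched as follows, assuming the matched data can be reduced to (game_pk, home plate umpire) pairs. The function name and the third sample row are hypothetical, and this is only an illustration of the check, not the exact code behind the count above.

```python
from collections import defaultdict


def games_with_duplicate_hp(rows):
    """Return {game_pk: sorted umpire names} for games with more than one distinct HP umpire."""
    names = defaultdict(set)
    for game_pk, hp_umpire in rows:
        names[game_pk].add(hp_umpire)
    return {pk: sorted(n) for pk, n in names.items() if len(n) > 1}


# The first two rows mirror the case reported here; the third is a made-up control row.
rows = [
    (632224, "Ron Kulpa"),
    (632224, "Ryan Blakney"),
    (111111, "Example Umpire"),
]
flagged = games_with_duplicate_hp(rows)
print(flagged)
```

Any game_pk that appears in the result has conflicting home plate umpires and is worth checking by hand against the box score.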