Open segiddins opened 3 years ago
Hmm, I put in some protection against this problem here, but looks like it's not working: https://github.com/droher/boxball/blob/72c7bc05993968b0897c1bcf9f662ed1e82b2776/extract/parsers/retrosheet.py#L61
I'll try to patch. Adding a general source column across all of these tables would be a great idea. For now, I do have an extra retrosheet_deduced_game table that you can join on to find which games have deduced entries, though I know that doesn't help with disambiguation.
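A rough sketch of that join, run against an in-memory SQLite database so it is self-contained. The table layouts here are assumptions: I'm guessing retrosheet_deduced_game has one row per game keyed by a game_id column, and the game and player ids are placeholders, not real data.

```python
import sqlite3

# In-memory stand-in for the real database; schemas are assumed, not copied from boxball.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE retrosheet_daily (game_id TEXT, player_id TEXT);
CREATE TABLE retrosheet_deduced_game (game_id TEXT);
-- Placeholder rows: GAME_A has two daily lines for one player, GAME_B has one.
INSERT INTO retrosheet_daily VALUES
    ('GAME_A', 'player1'), ('GAME_A', 'player1'), ('GAME_B', 'player1');
INSERT INTO retrosheet_deduced_game VALUES ('GAME_A');
""")

# Flag every daily line whose game has a deduced entry.
flagged = conn.execute("""
    SELECT d.game_id, d.player_id, (g.game_id IS NOT NULL) AS has_deduced
    FROM retrosheet_daily AS d
    LEFT JOIN retrosheet_deduced_game AS g ON g.game_id = d.game_id
    ORDER BY d.game_id
""").fetchall()

for row in flagged:
    print(row)
```

Both GAME_A lines come back flagged, so the join tells you which games are affected but not which of the two lines is the deduced one.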
This hasn't been resolved in the code, but I've manually removed the duplicated games from my Retrosheet fork, so the newly published version should be free of this bug.
cwdaily outputs daily lines for each player, which include the source for the game information. For games with multiple sources, there will be multiple daily entries for a given (player_id, game_id) tuple, and right now there's no column that can be used to disambiguate. E.g.
select * from retrosheet_daily where game_dt = '1943-06-19' and player_id = 'mackr101';
yields 5 rows for 2 games (the two halves of a doubleheader), with each game having both a box score and a deduced game, according to https://raw.githubusercontent.com/chadwickbureau/retrosplits/master/daybyday/playing-1943.csv. I'm not sure why mack in particular has 2 deduced game entries for CHA194306191, but that's probably an issue in chadwick.
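To find every affected (player_id, game_id) pair rather than checking one player at a time, something like the following might work. It is a sketch against an in-memory SQLite table with toy rows that only mimic the shape described above: the game_id column name, the second game id CHA194306192, and the 3 + 2 split of the 5 rows are my assumptions, not values copied from the real table.

```python
import sqlite3

# Toy copy of retrosheet_daily holding only the columns needed for the check.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retrosheet_daily (game_id TEXT, game_dt TEXT, player_id TEXT)"
)
rows = [
    # Assumed layout: 3 lines for the first game, 2 for the second.
    ("CHA194306191", "1943-06-19", "mackr101"),
    ("CHA194306191", "1943-06-19", "mackr101"),
    ("CHA194306191", "1943-06-19", "mackr101"),
    ("CHA194306192", "1943-06-19", "mackr101"),
    ("CHA194306192", "1943-06-19", "mackr101"),
]
conn.executemany("INSERT INTO retrosheet_daily VALUES (?, ?, ?)", rows)

# Any (player_id, game_id) pair with more than one daily line is ambiguous.
duplicates = conn.execute("""
    SELECT player_id, game_id, COUNT(*) AS n_rows
    FROM retrosheet_daily
    GROUP BY player_id, game_id
    HAVING COUNT(*) > 1
    ORDER BY game_id
""").fetchall()

for row in duplicates:
    print(row)
```

The GROUP BY ... HAVING query itself should carry over to the real table unchanged, as long as the column names match.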