Closed RyanTorok closed 4 years ago
Already have a fix in place for the next commit.
I took a look at the code for calculating the chance of scorigami, and it works like this:
1) The file scorigami_chances.js has a long list of theoretical combinations of scores that a single team could make, along with probabilities that a team will make that combination of scores (e.g. two touchdowns is more likely for a team to score than, say, two safeties).
2) To calculate the chance of scorigami, the code adds up the probabilities of each of these theoretical point combinations added on to the team's current point total (trying all combinations of additional points for both teams). These probabilities are adjusted based on the time remaining (e.g. it's unlikely a team will score 28 points in addition to their current score if there's only 3 minutes left in the 4th quarter).
3) The final probability is the sum, over every pair of candidate final scores, of the product of the probabilities that each team reaches its score. We already have a special case for tie games: a theoretical score that ends up tied when the game is not in OT is scaled down by 1/75, because that is roughly the odds of a completely scoreless overtime (the only way such a score could go final if it were reached in regulation).
4) As I see it, there are two issues with the current probability algorithm:
a) In getProb(), the remaining time is calculated to be [ (4-quarter) * 15 + current_clock_in_minutes ] minutes. If the API used to retrieve the game data uses 5 to represent OT, we'll get a negative time remaining. This is likely the cause of the negative probability bug, as ALL the possible scores tested will end up with negative probability at this point.
b) It doesn't look like we're distinguishing between the final scores we could end up with in overtime versus regulation. If a game goes to overtime tied at N points each, the only possible final scores are:
N to N (Scoreless OT)
N+2 to N (Team A scores safety)
N+3 to N (Team A scores unanswered FG on first possession or any FG in sudden death)
N+6 to N (Team A scores a TD)
N+3 to N+2 (Team B scores FG, Team A scores safety; this one is only possible with a turnover on the final play)
N+3 to N+3 (Both teams score FG on their first possession, no other scoring)
N+3 to N+6 (Team B scores FG on first possession, Team A scores TD on following drive)
Obviously, a score like N+14 to N would be given a higher probability than, say, N+2 to N, but in the special case of an OT game, the first one is not possible at all. We could fix this by adding a special case for OT and enumerating which indices in the 'chances' table are actually possible in OT. Had the negative probability bug mentioned in part a) not been present, we would have flagged scores like 36-23 as possible scorigami for the Eagles-Bengals game I mentioned above. In OT, however, such a score is impossible, so we would have shown a nontrivial positive scorigami chance when the real chance was effectively zero. The only candidate was 26-25, but, as mentioned above, the only way that could happen is for the winning team to somehow surrender a safety after a turnover on the final play of the game, something that's probably less likely than a team ending a game with 1 point.
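A minimal sketch of that OT special case, assuming the list of reachable scores above is complete. The names `OT_OFFSETS`, `overtimeFinalScores`, and `possibleScorigami` are hypothetical, as is storing the scorigami chart as a Set of "high-low" strings; none of this is taken from the project's code.

```javascript
// Hypothetical table: point offsets [leader, trailer] that can be added to a
// tied score of N in overtime, following the list in the comment above.
const OT_OFFSETS = [
  [0, 0], // scoreless OT, game ends tied
  [2, 0], // safety
  [3, 0], // unanswered field goal
  [6, 0], // touchdown
  [3, 2], // field goal, then a safety on the final play
  [3, 3], // both teams kick a field goal, no other scoring
  [6, 3], // field goal answered by a touchdown
];

// Returns every final score [high, low] reachable from an N-N tie in OT.
function overtimeFinalScores(n) {
  return OT_OFFSETS.map(([a, b]) => [n + a, n + b]);
}

// Assumed chart format: a Set of "high-low" strings for scores already seen.
// Returns the reachable OT scores that would still be scorigami.
function possibleScorigami(n, seenScores) {
  return overtimeFinalScores(n).filter(
    ([high, low]) => !seenScores.has(`${high}-${low}`)
  );
}
```

Only the scores returned by `possibleScorigami` would need a probability assigned; anything else in the 'chances' table could be skipped while the game is in OT. For a game tied at 23, `overtimeFinalScores(23)` includes 26-25 but not 36-23.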
Bottom line: if we just want to fix the negative probability bug, we could get away with replacing the (4-quarter) in getProb() with max(4-quarter, 0), so the term is never negative in OT. If we want more accurate scorigami chance calculations for OT games, though, we'll need to stop checking scoring combinations that can never be reached because the game would end first.
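For illustration, the clamped time calculation could look roughly like this. `remainingMinutes` is a hypothetical stand-in for the relevant lines of getProb(), and it assumes the API reports quarters as 1 through 4 with OT as 5.

```javascript
// Hypothetical stand-in for the remaining-time calculation in getProb().
// quarter: 1-4 in regulation, 5 in overtime (assumed API convention).
// clockMinutes: minutes left in the current period.
function remainingMinutes(quarter, clockMinutes) {
  // Clamp at zero so overtime (quarter 5) no longer contributes -15 minutes.
  const fullQuartersLeft = Math.max(4 - quarter, 0);
  return fullQuartersLeft * 15 + clockMinutes;
}
```

With this clamp, an OT game with 10 minutes on the clock is treated as having 10 minutes left, where the unclamped formula gives (4-5) * 15 + 10 = -5.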
Problem A is already fixed (not yet pushed). It was caused by me being forced to switch APIs this year to one that handles things differently, and I'm still working out the kinks. In this case, the old API had OT as a game state (along with pregame, halftime, over, etc.) and this one has it as period 5.
Problem B is something I have been aware of but have determined is not worth fixing, in part because the probability is already a mediocre estimate at best, but mostly because I don't have the data.
As shown in the attached image, the game between the Eagles and Bengals is late in overtime, tied at 23. Because the game is in overtime, the only possible final scores for this game are 23-23 (tie), 25-23 (either team scores a safety), 26-23 (either team scores a field goal), or 29-23 (either team scores a touchdown). The scorigami chart shows all of these possible final scores have happened before, so the chance of scorigami should be 0.0%. However, the "chance of scorigami" algorithm shows -16.56%, and the value went even more negative as the game inched closer to the end of overtime (with 0:49 left, the scorigami probability showed about -29%). It looks like the "chance of scorigami" calculation incorrectly accounts for the possible scores remaining when a game goes to overtime.