Alex-At-Home / cbb-explorer

WIP parser and model for aggregating and playing with CBB stats
Apache License 2.0

[Lineups] Possession count looks wrong (and in fact so does scoring) #16

Closed Alex-At-Home closed 5 years ago

Alex-At-Home commented 5 years ago

The total numbers for 2018/9 (missing 9 lineup events/712) came out as:

But KenPom raw ratings has us at 107.1 and 98.8

Sports ref has us as

(Which leads to O/100 = 108 and D/100 = 99 so basically in agreement with KP)

So I somehow have too many possessions but not enough points :/

Alex-At-Home commented 5 years ago

Well it's easy to find a possession discrepancy, eg take the first game Delaware (https://kenpom.com/box.php?g=58): 73-67 in 72 possessions

My aggregating lineup stats instead have 73-67 in 87 possessions, so there must be some situation I'm consistently getting wrong
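
To make the effect of the extra possessions concrete, here is a small arithmetic sketch using only the Delaware numbers above. `EfficiencyCheck` and `per100` are illustrative names, not part of the cbb-explorer code.

```scala
object EfficiencyCheck {
  /** Points per 100 possessions (the O/100 or D/100 style of rating). */
  def per100(points: Int, possessions: Int): Double =
    100.0 * points / possessions

  def main(args: Array[String]): Unit = {
    // Delaware game: 73 points scored in both counts, only the possessions differ
    println(f"72 possessions (KenPom box score): ${per100(73, 72)}%.1f")
    println(f"87 possessions (aggregated lineups): ${per100(73, 87)}%.1f")
  }
}
```

With the same 73 points, 15 extra possessions pull the rating from roughly 101.4 down to roughly 83.9, so over-counting possessions alone is enough to deflate the per-100 numbers.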

Find that and then go through some more games I guess :(

Alex-At-Home commented 5 years ago

Notes


{
  "opponent": "18:04:00,2-5,Ryan Johnson, steal",
  "opponent_possession": 2
},
{
  "team": "18:04:00,2-5,Bruno Fernando, turnover badpass",
  "team_possession": 3
},

Steal gets tagged onto the wrong possession here, which is currently harmless


Similarly:

{
  "opponent": "14:23:00,12-14,Ryan Johnson, foul personal shooting;2freethrow",
  "opponent_possession": 9
},
{
  "team": "14:23:00,12-14,Bruno Fernando, foulon",
  "team_possession": 10
},

Ah OK I see one problem ... if a lineup change occurs before the possession is complete it will get double counted

Example... end of 3rd lineup:

{
  "team": "11:22:00,16-17,Bruno Fernando, foulon",
  "team_possession": 1
},
{
  "opponent": "11:22:00,16-17,Matt Veretto, foul personal",
  "team_possession": 1
}

start of next lineup:

{
  "team": "11:19:00,16-17,Bruno Fernando, rebound offensive",
  "team_possession": 1
},
{
  "team": "11:18:00,16-19,Bruno Fernando, 2pt dunk 2ndchance;pointsinthepaint made",
  "team_possession": 1
},

OK so let's fix that and see where we end up...
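
A minimal sketch of the double-counting problem and one possible correction, assuming each lineup stint counts every possession it touches. `Stint`, `teamPossessions` and `endsMidTeamPossession` are hypothetical names for illustration, not the actual model in the repo.

```scala
object LineupBoundaries {
  /** Hypothetical per-stint summary.
    * teamPossessions counts every team possession the stint touches;
    * endsMidTeamPossession is true when the lineup changed before the
    * team's last possession in the stint was complete. */
  final case class Stint(teamPossessions: Int, endsMidTeamPossession: Boolean)

  /** Sum possessions over stints, counting a possession that straddles
    * a lineup change only once. */
  def totalTeamPossessions(stints: Seq[Stint]): Int = {
    val naive = stints.map(_.teamPossessions).sum
    // The final stint has no following stint, so it cannot be double counted
    val carriedOver = stints.dropRight(1).count(_.endsMidTeamPossession)
    naive - carriedOver
  }
}
```

For the example above (foul drawn at 11:22, then the offensive rebound and dunk in the next lineup), both stints report `team_possession` 1, so the naive sum is 2 while the corrected total is 1.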

Alex-At-Home commented 5 years ago

Here's another one:

      RawGameEvent(Some("08:48:00,20-23,Serrel Smith Jr., rebound defensive"), None, Some(5), None),
      RawGameEvent(None, Some("08:44:00,20-23,Jacob Cushing, steal"), None, Some(5)),
      RawGameEvent(Some("08:44:00,20-23,Serrel Smith Jr., turnover lostball"), None, Some(6), None),

A steal, like a block, needs to be ignored; we'll flip possession on the actual offensive action...

Alex-At-Home commented 5 years ago

Timeout!

      RawGameEvent(Some("04:04:00,26-33,Aaron Wiggins, assist"), None, Some(2), None),
      RawGameEvent(None, Some("04:04:00,26-33,Team, timeout short"), None, Some(3)),
      RawGameEvent(Some("04:04:00,26-33,Anthony Cowan, 2pt layup fromturnover;pointsinthepaint;fastbreak made"), None, Some(3), None),

Alex-At-Home commented 5 years ago

duh offensive foul

      RawGameEvent(None, Some("02:33:00,27-38,Eric Carter, foulon"), None, Some(1)),
      RawGameEvent(None, Some("02:28:00,27-38,Ryan Johnson, foul offensive"), None, Some(1)),
      RawGameEvent(Some("02:28:00,27-38,Jalen Smith, foulon"), None, Some(1), None),
      RawGameEvent(None, Some("02:28:00,27-38,Ryan Johnson, turnover offensive"), None, Some(2)),
      RawGameEvent(Some("02:15:00,27-39,Anthony Cowan, freethrow 1of2 fromturnover made"), None, Some(2), None),

Actually those foulon are the real ones to ignore I think?
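
The cases found so far (steal, block, timeout, foulon) could be handled by a single filter applied before deciding whether possession has flipped. This is only a sketch: `PossessionFilter` and `canFlipPossession` are made-up names, the raw string layout `clock,score,player, action` is taken from the events above, and the exact text of a block event is an assumption.

```scala
object PossessionFilter {
  // Actions that should never, by themselves, signal a change of possession.
  // "block" is assumed to appear with that prefix; the others are from the thread.
  private val ignoredPrefixes = Seq("steal", "block", "timeout", "foulon")

  /** Extract the action part of a raw "clock,score,player, action" string. */
  def action(raw: String): String =
    raw.split(",", 4) match {
      case Array(_, _, _, act) => act.trim
      case _                   => "" // malformed line: no action
    }

  /** True if this event is allowed to flip possession. */
  def canFlipPossession(raw: String): Boolean = {
    val a = action(raw)
    a.nonEmpty && !ignoredPrefixes.exists(p => a.startsWith(p))
  }
}
```

With this, `"08:44:00,20-23,Jacob Cushing, steal"` is skipped and the flip happens on `"08:44:00,20-23,Serrel Smith Jr., turnover lostball"`; note that `foul offensive` is still allowed through, since only `foulon` is in the ignore list.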

Alex-At-Home commented 5 years ago

I have no idea what this means:

//prev event finishes with:
//RawGameEvent(Some("07:49:00,51-61,Darryl Morsell, freethrow 2of2 fastbreak;fromturnover made"), None, Some(5), None)

      RawGameEvent(None, Some("07:49:00,51-60,Jacob Cushing, foul personal shooting;2freethrow"), None, None),
      RawGameEvent(Some("07:45:00,51-61,Team, rebound offensivedeadball"), None, Some(1), None),
      RawGameEvent(None, Some("07:43:00,51-61,Kevin Anderson, 3pt jumpshot missed"), None, Some(1)),

so that offensivedeadball pulls the entire event into the next lineup event (incorrectly but that's a separate problem covered by another issue)

So actually what that means is that the problem must have occurred in the other event

(but this will be a problem in general when it does occur?)

Oh I understand what happened at least ... Morsell takes his second free throw and then gets subbed out. So this should be fine in practice

AlexP-Elastic commented 5 years ago

Analysis of 20 teams following first wave of fixes:

(me vs KP)
Delaware - correct
Navy - 70 vs 71 (I think that's just team vs opp possession)
NCAT - 71 vs 70
Hofstra - missing!
MSM - 73 vs 74
Virginia - missing!
* PSU - 67 vs 64
@Purdue (missing records)
Loyola - 63 vs 61
Loyola MD - missing!
Seton Hall - missing!
* Radford - 68 vs 65
Neb - 66 vs 65
@ Rut - 74 vs 72
** Minn - 70 vs 66
Indiana - 66 vs 65
*** Wis - 66 vs 61
@ Ohio St - correct
Mich St - 63 vs 62
v Illinois - correct
NW - correct
@Wisc - 62 vs 61
** @Neb - 69 vs 65
Purdue - 67 vs 65
@ Mich - correct
@Iowa - missing!
Oh St - 66 vs 64
@PSU (missing records)?
Mich - correct
Minn - 66 vs 65
v Neb - 65 vs 64
Belmont - missing!
LSU - 71 vs 72

Alex-At-Home commented 5 years ago

Looking at the Wisconsin-UMD game (discrepancy of 5):

      RawGameEvent(None, Some("12:32:00,10-17,Team, rebound offensive team"), None, Some(3)),
      RawGameEvent(None, Some("12:32:00,10-17,Brevin Pritzl, substitution in"), None, Some(3)),
      RawGameEvent(None, Some("12:32:00,10-17,D'Mitrik Trice, substitution out"), None, Some(3)),
      RawGameEvent(Some("12:26:00,10-17,Team, rebound defensive team"), None, Some(3), None),
      RawGameEvent(None, Some("12:26:00,10-17,Brad Davison, 2pt jumpshot 2ndchance missed"), None, Some(4)),
      RawGameEvent(None, Some("12:25:00,10-17,Aleem Ford, foul personal"), None, Some(4)),

so MD's defensive rebound (12:26) gets registered before the opponent's missed shot at the same timestamp, and so gets logged as a change in possession

Options:

I think the second one here might be nicest?
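
Purely as an illustration of the ordering problem (an assumption, not necessarily the option picked here), one way to neutralise it is to swap a rebound with a missed shot that immediately follows it at the same timestamp, before any possession logic runs. `Event`, `SameTimestampOrder` and `reorder` are hypothetical names.

```scala
object SameTimestampOrder {
  /** Simplified event: game clock as logged, plus the action text. */
  final case class Event(clock: String, action: String)

  private def isRebound(e: Event): Boolean = e.action.startsWith("rebound")
  private def isMissedShot(e: Event): Boolean = e.action.endsWith("missed")

  /** If a rebound is logged just before a missed shot with the same clock,
    * move the shot in front of it. Plain recursion is fine here because a
    * game only has a few hundred events. */
  def reorder(events: List[Event]): List[Event] = events match {
    case reb :: shot :: rest
        if reb.clock == shot.clock && isRebound(reb) && isMissedShot(shot) =>
      shot :: reorder(reb :: rest)
    case head :: rest => head :: reorder(rest)
    case Nil          => Nil
  }
}
```

Applied to the 12:26 events above, the missed jumpshot would come first and the defensive team rebound second, so the rebound ends the opponent's possession instead of appearing to start a new one early.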

Alex-At-Home commented 5 years ago

Added that logic, but it's still totally broken, here's the diff:

Map(
  (TeamSeasonId(TeamId("Hofstra"), Year(2018)), Home) -> (5, 5),
  (TeamSeasonId(TeamId("Wisconsin"), Year(2018)), Home) -> (4, 4),
  (TeamSeasonId(TeamId("Iowa"), Year(2018)), Away) -> (2, 2),
  (TeamSeasonId(TeamId("Navy"), Year(2018)), Away) -> (0, 0),
  (TeamSeasonId(TeamId("Michigan"), Year(2018)), Away) -> (1, 1),
  (TeamSeasonId(TeamId("Penn St."), Year(2018)), Home) -> (3, 3),
  (TeamSeasonId(TeamId("Illinois"), Year(2018)), Neutral) -> (2, 2),
  (TeamSeasonId(TeamId("Minnesota"), Year(2018)), Home) -> (2, 3),
  (TeamSeasonId(TeamId("Northwestern"), Year(2018)), Home) -> (1, 1),
  (TeamSeasonId(TeamId("Ohio St."), Year(2018)), Home) -> (1, 2),
  (TeamSeasonId(TeamId("Loyola Chicago"), Year(2018)), Neutral) -> (1, 1),
  (TeamSeasonId(TeamId("Delaware"), Year(2018)), Home) -> (3, 3),
  (TeamSeasonId(TeamId("Belmont"), Year(2018)), Neutral) -> (2, 2),
  (TeamSeasonId(TeamId("Virginia"), Year(2018)), Home) -> (1, 1),
  (TeamSeasonId(TeamId("Nebraska"), Year(2018)), Home) -> (2, 2),
  (TeamSeasonId(TeamId("Loyola Maryland"), Year(2018)), Home) -> (2, 2),
  (TeamSeasonId(TeamId("Purdue"), Year(2018)), Away) -> (0, 0),
  (TeamSeasonId(TeamId("Minnesota"), Year(2018)), Away) -> (4, 4),
  (TeamSeasonId(TeamId("N.C. A&T"), Year(2018)), Home) -> (2, 2),
  (TeamSeasonId(TeamId("Wisconsin"), Year(2018)), Away) -> (1, 1),
  (TeamSeasonId(TeamId("Michigan St."), Year(2018)), Away) -> (0, 1),
  (TeamSeasonId(TeamId("Radford"), Year(2018)), Home) -> (0, 0),
  (TeamSeasonId(TeamId("Seton Hall"), Year(2018)), Home) -> (5, 5),
  (TeamSeasonId(TeamId("LSU"), Year(2018)), Neutral) -> (1, 1),
  (TeamSeasonId(TeamId("Nebraska"), Year(2018)), Neutral) -> (2, 2),
  (TeamSeasonId(TeamId("Ohio St."), Year(2018)), Away) -> (1, 1),
  (TeamSeasonId(TeamId("Mount St. Mary's"), Year(2018)), Home) -> (4, 4),
  (TeamSeasonId(TeamId("Marshall"), Year(2018)), Home) -> (7, 6),
  (TeamSeasonId(TeamId("Indiana"), Year(2018)), Home) -> (1, 1),
  (TeamSeasonId(TeamId("Michigan"), Year(2018)), Home) -> (0, 0),
  (TeamSeasonId(TeamId("Nebraska"), Year(2018)), Away) -> (2, 2),
  (TeamSeasonId(TeamId("Penn St."), Year(2018)), Away) -> (0, 1),
  (TeamSeasonId(TeamId("Rutgers"), Year(2018)), Away) -> (3, 3),
  (TeamSeasonId(TeamId("Purdue"), Year(2018)), Home) -> (2, 2)
)

(TeamSeasonId(TeamId("Wisconsin"), Year(2018)), Home) -> (4, 4) so that did address it (still off by 1) there, but eg Delaware was right and now is wrong :(

I think the new plan has got to be to change completely how I calc possessions, by looking for end of possessions
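
A rough sketch of what counting by end of possession could look like, under the usual simplified definition (a possession ends on a made field goal, a turnover, the other side's defensive rebound, or a made final free throw of a multi-shot trip). These rules are an assumption for illustration, not the rules used in cbb-explorer, and and-ones, technical fouls and end-of-period are deliberately not handled. `EndOfPossession`, `Ev` and `possessions` are made-up names.

```scala
object EndOfPossession {
  sealed trait Side
  case object Team extends Side
  case object Opponent extends Side

  /** Simplified event: who it is credited to, plus the action text. */
  final case class Ev(side: Side, action: String)

  // e.g. "freethrow 2of2 fastbreak;fromturnover made"
  private val MadeFreeThrow = """freethrow (\d)of(\d).* made""".r

  /** Does this action, credited to the offense, end its possession? */
  private def endsOwnPossession(action: String): Boolean = action match {
    case a if a.startsWith("turnover") => true
    case a if (a.startsWith("2pt") || a.startsWith("3pt")) && a.endsWith("made") => true
    // Only the last shot of a 2- or 3-shot trip; 1of1 (and-one) is skipped
    // because the made basket has already been counted.
    case MadeFreeThrow(n, total) => n == total && total != "1"
    case _ => false
  }

  /** Count completed possessions for the given offense. */
  def possessions(events: Seq[Ev], offense: Side): Int = events.count { e =>
    if (e.side == offense) endsOwnPossession(e.action)
    else e.action.startsWith("rebound defensive") // other side secures the miss
  }
}
```

The appeal of this style is that steals, blocks, timeouts and substitutions never need special-casing: they simply do not match any end-of-possession rule.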

Alex-At-Home commented 5 years ago

The approach I think I like is:

I think the rules are something like: