atlhawksfanatic / L2M

Last two minute report data from the NBA
MIT License

Secondary variable for 'disadvantaged team' #2

Open jrstromme opened 2 years ago

jrstromme commented 2 years ago

It would be nice to add a variable that reclassifies the 'disadvantaged team' variable, because its use in the L2M is so counterintuitive. For example, on incorrect calls (IC), the 'disadvantaged team' is actually based on the player who benefits from getting the call when really no call should have been made.

e.g. the most recent Wolves/76ers game from 11/27/21. At Q6 01:38.9, there is an IC where the ball is off Embiid, but the 76ers receive possession. The 76ers are listed as the 'disadvantaged team'.

A lot of people get this wrong; even 538 made that mistake (see the correction footnote here: https://fivethirtyeight.com/features/which-nba-team-is-wronged-by-the-refs-the-most/).

Anyway, I am new to this data and may be missing something, but the labeling by the NBA seems really counterintuitive.
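One way to see the labeling pattern is to cross-tabulate `decision` against whether the listed disadvantaged team is also the committing team. The following is only a minimal sketch: the rows and the team codes `AAA`/`BBB` are invented for illustration, and only the column names `decision`, `committing_team`, and `disadvantaged_team` come from the data.

```r
library(dplyr)

# Toy rows, invented for illustration only; column names follow the L2M data.
toy <- data.frame(
  decision           = c("IC", "INC", "CC", "IC"),
  committing_team    = c("AAA", "AAA", "BBB", "BBB"),
  disadvantaged_team = c("BBB", "BBB", "AAA", "AAA"),
  stringsAsFactors   = FALSE
)

# Count plays by decision and by whether the listed 'disadvantaged' team
# is the same as the committing team.
label_check <- toy %>%
  mutate(same_team = disadvantaged_team == committing_team) %>%
  count(decision, same_team)

label_check
```

Running the same `count()` on the real data should show how often each decision type lists the committing team as the 'disadvantaged' one.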

jrstromme commented 2 years ago

For example, here is how I tried to reclassify it. I may be missing more special cases, however.


library(dplyr)

df %>% 
  # First of all, we can ignore correct calls. Nobody is 'disadvantaged' if a call is correct
  mutate(disadvantaged_team2 = if_else(decision %in% c('CC','CNC'), NA_character_, disadvantaged_team),
         disadvantaged_side2 = if_else(decision %in% c('CC','CNC'), NA_character_, disadvantaged_side)) %>% 
  # Second of all, if there is an incorrect call (IC), the 'disadvantaged' player listed
  #    is actually who *benefitted* from the call being made. Disadvantaged means they were fouled.
  # Therefore, for all ICs, we need to change the 'disadvantaged' side to the other team
  mutate(nondis_team = if_else(disadvantaged_team2==home_team, away_team, home_team)) %>% 
  mutate(disadvantaged_team2 = if_else(decision == 'IC', nondis_team, disadvantaged_team2)) %>% 
  #now check for any weird cases, we should expect to only get INC_0 and IC_1
  mutate(disadvantaged_type = if_else(disadvantaged_team2 == committing_team,1,0)) %>% 
  mutate(decision_type = if_else(!is.na(disadvantaged_type),paste0(decision, '_', disadvantaged_type),NA_character_)) %>% 
  # there are a few unexpected results, reclassify them
  mutate(nondis_team = if_else(disadvantaged_team2==home_team, away_team, home_team)) %>% 
  # %in% treats NA as FALSE, so rows with a missing decision_type keep their current value
  mutate(disadvantaged_team2 = if_else(decision_type %in% 'INC_1', nondis_team, disadvantaged_team2)) %>% 
  mutate(disadvantaged_team2 = if_else(decision_type %in% 'IC_0', nondis_team, disadvantaged_team2)) %>% 
  #also fix the 'side' disadvantaged
  mutate(nondis_side = if_else(disadvantaged_side2=='home','away','home')) %>% 
  mutate(disadvantaged_side2 = if_else(disadvantaged_team2 != disadvantaged_team,nondis_side, disadvantaged_side2)) %>% 
  select(-nondis_side, -nondis_team, -decision_type)

atlhawksfanatic commented 10 months ago

I agree that the terms "committing" and "disadvantaged" can be ambiguous, but these are the terms the NBA chose for their variables. I don't plan on doing post-processing aside from game level additions (the player box score statistics, referees, and TV designation).

I'm open to changing the data description in the README (https://github.com/atlhawksfanatic/L2M/blob/master/README.md), but creating a new variable is something that should be done outside of this project.