Closed by saberpowers 6 years ago
I would love to see this too.
Thanks! Will prioritize.
Here's the plan. From the round advancement probabilities, get each team's probability of winning in each round, conditional on reaching that round. Define the home-team bias as adding +.75 to that conditional log-odds (chosen to match the sparse-detail results published by Brad Null), then recombine the conditional probabilities across rounds to get the biased probability of reaching each round. This procedure was chosen in an attempt to very roughly mimic Null's results:

[figure: Null's published results]

Applying this bias to the population picks from the 2017 men's tournament, we roughly match the results Null presented:

[figure: average bias ratio by round for favorites, middle teams, and long-shots]

Note that the long-shots have a somewhat smaller bias ratio with our math, but this is partly explained by the fact that 0.1% was the lowest frequency reported by ESPN of a team being picked to win the national championship, while Null defined long-shots as having less than 0.1% pick frequency. Hence our long-shots are not as long as Null's long-shots.
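To make the three steps concrete, here is the same procedure written out as formulas. The symbols are my own shorthand rather than notation from the package: $P_r$ is the population's probability of picking a team to win round $r$ (with $P_0 = 1$), $c_r$ is the matching conditional probability, and $k$ is the log-odds shift.

$$
c_r = \frac{P_r}{P_{r-1}}, \qquad
\tilde{c}_r = \frac{1}{1 + \exp\!\left(-\left[\log\frac{c_r}{1 - c_r} + k\right]\right)}, \qquad
\tilde{P}_r = \prod_{s=1}^{r} \tilde{c}_s
$$

As a rough worked example, a team picked at a coin flip in every round ($c_r = 0.5$) with $k = 0.75$ gets $\tilde{c}_r \approx 0.679$, so its championship pick probability goes from $0.5^6 \approx 0.016$ to about $0.679^6 \approx 0.098$, a bias ratio of roughly 6.3.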
Formulating the homer bias this way has an appealing interpretation: in the biased fan's mental model, their team is better by +.75 in terms of the estimated team strength coefficient in the Bradley-Terry model.
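A short sketch of why the log-odds shift reads as a strength shift, using the standard Bradley-Terry form (the symbols $\beta_i$, $\beta_j$, and $k$ are notation I'm assuming here):

$$
\Pr(i \text{ beats } j) = \frac{e^{\beta_i}}{e^{\beta_i} + e^{\beta_j}}
\quad\Longrightarrow\quad
\log\frac{\Pr(i \text{ beats } j)}{1 - \Pr(i \text{ beats } j)} = \beta_i - \beta_j
$$

Replacing $\beta_i$ with $\beta_i + k$ raises the log-odds against every opponent $j$ by exactly $k$. The conditional probability of winning a round is really an average over possible opponents, so for that average the match with a strength shift is approximate rather than exact.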
Here's the hastily written code used to produce the second figure above:
`%>%` = dplyr::`%>%`
cumulative = mRchmadness::pred.pop.men.2017 %>% tibble::as_tibble()
cumulative$type = ifelse(cumulative$round6 > .05, 'favorite',
  ifelse(cumulative$round6 <= .001, 'longshot', 'middle'))
conditional = cumulative
conditional$round2 = cumulative$round2 / cumulative$round1
conditional$round3 = cumulative$round3 / cumulative$round2
conditional$round4 = cumulative$round4 / cumulative$round3
conditional$round5 = cumulative$round5 / cumulative$round4
conditional$round6 = cumulative$round6 / cumulative$round5
conditional$round6[conditional$round6 == 1] = .999
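# Shift the log-odds of p by k and cap the result at .999
# (default k = .7 here; the plan above quotes +.75)
bias = function(p, k = .7) {
  pmin(.999, exp(log(p / (1 - p)) + k) /
    (1 + exp(log(p / (1 - p)) + k)))
}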
conditional_bias = conditional %>% dplyr::mutate(
  round1 = bias(round1),
  round2 = bias(round2),
  round3 = bias(round3),
  round4 = bias(round4),
  round5 = bias(round5),
  round6 = bias(round6))
cumulative_bias = conditional_bias
cumulative_bias$round2 = cumulative_bias$round1 * conditional_bias$round2
cumulative_bias$round3 = cumulative_bias$round2 * conditional_bias$round3
cumulative_bias$round4 = cumulative_bias$round3 * conditional_bias$round4
cumulative_bias$round5 = cumulative_bias$round4 * conditional_bias$round5
cumulative_bias$round6 = cumulative_bias$round5 * conditional_bias$round6
prob = cumulative %>%
  dplyr::select(-name) %>%
  dplyr::group_by(type) %>%
  dplyr::summarize_all(mean) %>%
  dplyr::select(-type) %>%
  as.matrix
prob_bias = cumulative_bias %>%
  dplyr::select(-name) %>%
  dplyr::group_by(type) %>%
  dplyr::summarize_all(mean) %>%
  dplyr::select(-type) %>%
  as.matrix
matplot(t(prob_bias / prob), type = 'l', ylab = 'Avg. Bias Ratio',
  ylim = c(0, 15), axes = FALSE, lwd = 2, lty = 1,
  col = c('forestgreen', 'dodgerblue', 'darkorange'))
axis(2, at = c(0, 5, 10, 15), labels = c('0%', '500%', '1000%', '1500%'))
axis(1, at = 1:6, labels = c('R32', 'S16', 'E8', 'F4', 'Final', 'Champ'))
legend('topleft', c('longshots', 'middle', 'favorites'),
  col = c('dodgerblue', 'darkorange', 'forestgreen'), lwd = 2, bty = 'n')
So how would you incorporate this into the mRchmadness workflow? I don't see how to supply this information to `pool.source` (which takes a string) in the `find.bracket()` function. I want to use the biased probabilities to simulate my pool, right?
I'm almost done with an `add.home.bias` function that will take a character vector of home teams (frequently a length-1 vector) and return a modified `pred.pop.[league].[year]` with the probabilities increased for the home teams according to the formulation above. Once that's done, I'll add a `home.teams` argument to `find.bracket` and `test.bracket`, and each will call `add.home.bias` when the `home.teams` argument is not NULL. You can expect this functionality to be in the 1.0.3 release at the end of today.
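For anyone who wants to experiment before the release, here is a rough, self-contained sketch of what such a helper could look like. Everything in it is an assumption for illustration: `add_home_bias_sketch` and `shift_log_odds` are hypothetical names, not the package's `add.home.bias`, and the input is assumed to be a data frame with a `name` column and cumulative columns `round1` through `round6`, as in the plotting code earlier in this thread.

```r
# Hypothetical helper: shift probabilities by k on the log-odds scale.
shift_log_odds <- function(p, k) {
  p <- pmin(pmax(p, 1e-6), 1 - 1e-6)  # clamp so the logit stays finite
  plogis(qlogis(p) + k)
}

# Hypothetical sketch (not the package's add.home.bias): bias the cumulative
# pick probabilities of the teams listed in home_teams.
add_home_bias_sketch <- function(pred_pop, home_teams, k = 0.75) {
  rounds <- paste0("round", 1:6)
  is_home <- pred_pop$name %in% home_teams
  reach <- rep(1, sum(is_home))     # biased cumulative probability so far
  prev_cum <- rep(1, sum(is_home))  # unbiased cumulative from previous round
  for (r in rounds) {
    cum <- pred_pop[[r]][is_home]
    # probability of winning this round, conditional on reaching it
    conditional <- ifelse(prev_cum > 0, cum / prev_cum, 0)
    # recombine: multiply the biased conditional probabilities together
    reach <- reach * shift_log_odds(conditional, k)
    prev_cum <- cum
    pred_pop[[r]][is_home] <- reach
  }
  pred_pop
}

# Toy input: team A is a coin flip in every round, team B is left untouched.
toy <- data.frame(
  name = c("A", "B"),
  round1 = c(0.5, 0.8), round2 = c(0.25, 0.4), round3 = c(0.125, 0.2),
  round4 = c(0.0625, 0.1), round5 = c(0.03125, 0.05),
  round6 = c(0.015625, 0.025)
)
out <- add_home_bias_sketch(toy, home_teams = "A")
print(out)
```

With the toy data, team A's first-round probability moves from 0.5 to about 0.679, and team B's row is unchanged. The sketch only rescales the home teams; it does not renormalize the other teams' probabilities, which a real implementation might want to consider.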
You're the real MVP this year!
:heart:
This is more of a hand-wavy, rule-of-thumb adjustment based on the research presented here: https://www.cbssports.com/college-basketball/news/homer-bias-is-real-and-it-will-derail-your-march-madness-bracket/
Users should be allowed to select "hometown" teams, leading to changes to the population pick distribution.