Cammac7 / bracket-filler

Using ML to win March Madness Bracket

Incorporating tournament structure #2

Open Cammac7 opened 7 years ago

Cammac7 commented 7 years ago

So, the first machine learning file I did (the logistic regression one) focused, I think, on learning from tournament structure (e.g. the likelihood that a 3 seed beats a 1 seed). I think the downside of Elo is that it takes a purely team-based approach, i.e. "Team A beat Team B the majority of the time before, so Team A will beat Team B." That doesn't really hold up, because NCAA teams aren't like individual chess players. Their strength rises and falls over time: 20 years ago a team could be the worst in the league, 10 years later be the best, and then be awful again. How do we incorporate this? I THINK our current Elo system ranks teams based on all of history (i.e. greatest team of all time?), so how do we make it rank just the "greatest of this year"?

jasontrigg0 commented 7 years ago

This is a good point, and I think it's what the Elo K factor is for. Over time Elo starts to forget your old results and adjusts to your more recent ones. With a high K factor, ratings mostly reflect your last few games, while a low K factor gives more of a 'greatest team of all time' result. I think tweaking K to find the best value is a useful exercise. Another option to address this problem is to reset the Elo ratings at the start of each year?
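
To make the K factor and the yearly reset concrete, here's a rough sketch using the standard Elo formulas. The function names, the 1500 baseline, and the `carry_over` parameter are all made up for illustration and aren't taken from the repo:

```python
def expected_score(rating_a, rating_b):
    """Standard Elo win probability for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, a_won, k):
    """Return both teams' new ratings after one game.

    A larger k moves ratings further per game, so recent results dominate.
    """
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def season_reset(ratings, mean=1500.0, carry_over=0.0):
    """Pull every rating back toward the mean at the start of a season.

    carry_over=0.0 is a full reset; carry_over=1.0 keeps last year's ratings.
    (Hypothetical helper; the partial carry-over is an assumption.)
    """
    return {team: mean + carry_over * (r - mean) for team, r in ratings.items()}


# Two evenly rated teams: the winner gains k/2 points, the loser drops k/2.
high_k = update_elo(1500, 1500, a_won=True, k=50)  # (1525.0, 1475.0)
low_k = update_elo(1500, 1500, a_won=True, k=10)   # (1505.0, 1495.0)
```

A partial `carry_over` would sit somewhere between "all of history" and "this year only", which might be worth trying alongside different K values.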

Cammac7 commented 7 years ago

Yeah, I messed with Elo a bit and you actually got pretty damn close to a maximum (or at least a local max) with 50. I bumped it up a bit, which increased our Elo score by like 2. I'm gonna write a script to test like 300 different K factors to see which is best, though.
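
A sweep like that could look roughly like the sketch below. The `(team_a, team_b, a_won)` game format, the toy data, and mean log loss as the metric (lower is better) are all assumptions here; swap in whatever score the repo actually reports:

```python
import math


def expected_score(rating_a, rating_b):
    """Standard Elo win probability for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def log_loss_for_k(games, k, start=1500.0):
    """Replay games in order, predicting each one before updating ratings."""
    ratings = {}
    total = 0.0
    for team_a, team_b, a_won in games:
        ra = ratings.get(team_a, start)
        rb = ratings.get(team_b, start)
        p = expected_score(ra, rb)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        total += -math.log(p if a_won else 1.0 - p)
        delta = k * ((1.0 if a_won else 0.0) - p)
        ratings[team_a] = ra + delta
        ratings[team_b] = rb - delta
    return total / len(games)


def best_k(games, k_values):
    """Return the (k, loss) pair with the lowest mean log loss."""
    return min(((k, log_loss_for_k(games, k)) for k in k_values),
               key=lambda pair: pair[1])


# Toy, made-up results purely to exercise the sweep.
games = [("A", "B", True), ("A", "C", True), ("B", "C", True),
         ("A", "B", True), ("C", "B", False)]
k, loss = best_k(games, range(1, 301))
```

On real data it would probably be safer to pick K on some seasons and check it on held-out ones, otherwise the "best" K may just be fit to the games it was tuned on.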

I think the crux of the issue is that Elo ranks teams, whereas a LogReg or other ML algo looks for patterns rather than focusing on ranks. Like, if St. Mary's has lost to SMU every year for all of history, our Elo isn't going to pick that up, but ML will. I'm gonna try to implement the LogReg, and I think we should try to combine them.
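
One possible way to combine them (just a sketch, not what the repo does) would be to feed the Elo difference into the LogReg as a feature next to the seed difference. The feature layout, the toy rows, and the hand-rolled gradient descent below are all assumptions, used to keep the example dependency-free:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(rows, labels, lr=0.1, epochs=2000):
    """Logistic regression via plain batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Probability that team A wins, given its feature row."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical features per game: (elo_a - elo_b) / 400 and seed_b - seed_a.
# Label is 1 if team A won. Values are invented for illustration.
rows = [(0.5, 3), (0.3, 1), (-0.4, -2), (-0.2, -4), (0.1, 2), (-0.6, -1)]
labels = [1, 1, 0, 0, 1, 0]
weights, bias = fit_logreg(rows, labels)
p_favorite = predict(weights, bias, (0.25, 2))
p_underdog = predict(weights, bias, (-0.25, -2))
```

That way Elo supplies the "how good is this team right now" signal and the regression can still learn structure like seed matchups on top of it.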