[x] KNN Impute NA Values or use Team Level Stats in Place
[x] Add McDonald's All-Americans
[x] Clean kenpom_ratings0321.csv data
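For reference, the KNN imputation item above could look roughly like the sketch below. It is a hand-rolled, standard-library illustration, not the code used in this project: the `knn_impute` helper, the column layout (minutes, points, rebounds), and the sample rows are all hypothetical. In practice scikit-learn's `KNNImputer` offers a library implementation, and features would normally be standardized first so that no single column dominates the distance.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows' values.

    Distance is Euclidean over the columns both rows have observed,
    averaged over the number of shared columns.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]  # copy so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows that actually observed column j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] != math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed

# Hypothetical player rows: [minutes, points, rebounds]
players = [
    [30.0, 15.0, 5.0],
    [28.0, 14.0, None],   # rebounds missing
    [10.0, 3.0, 1.0],
    [31.0, 16.0, 6.0],
]
filled = knn_impute(players, k=2)
```

If no neighbor has the missing column, the value stays `None`, which is where falling back to team-level stats would come in.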
Merge the data so that each row represents one team in a given season
Join data from step 2 into NCAA Tournament Data -- hold 2018-19 as Test Data
Calculate Prior Upset Probability based on seed and round
Make Models
Let's try both round-by-round models and a single model covering all rounds. Logistic and probit models seem to perform well with decent feature selection, but we could also try keras or torch.
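One possible reading of the "prior upset probability based on seed and round" step is an empirical upset rate per (round, favorite seed, underdog seed), sketched below. The definition of an upset (the team with the larger seed number wins), the additive smoothing, the `upset_priors` name, and the sample games are assumptions for illustration, not something these notes specify.

```python
from collections import defaultdict

def upset_priors(games, alpha=1.0):
    """Estimate P(upset) for each (round, favorite_seed, underdog_seed).

    `games` is an iterable of (round, winner_seed, loser_seed) tuples.
    An upset is a game won by the team with the larger seed number.
    `alpha` is an additive (Laplace) smoothing count, so sparse matchups
    are pulled toward 0.5 instead of being reported as exactly 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [upsets, games]
    for rnd, winner_seed, loser_seed in games:
        if winner_seed == loser_seed:
            continue  # equal seeds: no favorite, so no upset is defined
        favorite, underdog = sorted((winner_seed, loser_seed))
        tally = counts[(rnd, favorite, underdog)]
        tally[0] += int(winner_seed == underdog)
        tally[1] += 1
    return {
        key: (upsets + alpha) / (total + 2 * alpha)
        for key, (upsets, total) in counts.items()
    }

# Hypothetical training games: (round, winner_seed, loser_seed)
train_games = [
    (1, 5, 12), (1, 12, 5), (1, 5, 12), (1, 5, 12),
    (1, 1, 16), (1, 1, 16),
    (2, 1, 8),
]
priors = upset_priors(train_games)
```

To keep the held-out data clean, the priors should be computed from training seasons only.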
Data Wrangling
team_level_player_stats
further cleaned kenpom_ratings0321.csv data
Validate on 2018-19
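A minimal sketch of the hold-out validation, assuming each row is one tournament game with a `season` field and a 0/1 `upset` label. The field names, the sample rows, and reading "2018-19" as the 2018 and 2019 tournaments are all assumptions; if it means the single 2018-19 season, pass one season instead. Log loss is used here as one reasonable metric for probability forecasts, and the constant training base rate is only a baseline that any fitted model should beat.

```python
import math

def split_by_season(rows, test_seasons):
    """Split rows (dicts with a 'season' key) into train and test lists."""
    train = [r for r in rows if r["season"] not in test_seasons]
    test = [r for r in rows if r["season"] in test_seasons]
    return train, test

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes under y_prob."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical rows: one per tournament game, 'upset' is the 0/1 label.
games = [
    {"season": 2016, "upset": 0},
    {"season": 2017, "upset": 1},
    {"season": 2017, "upset": 0},
    {"season": 2018, "upset": 0},
    {"season": 2019, "upset": 1},
]
train, test = split_by_season(games, test_seasons={2018, 2019})

# Baseline: predict the training upset rate for every held-out game.
base_rate = sum(r["upset"] for r in train) / len(train)
baseline = log_loss([r["upset"] for r in test], [base_rate] * len(test))
```

The same `log_loss` call can score the round-by-round and single-model predictions on the held-out games, so all candidates are compared on identical data.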