andrewargeros / minnemudac-2021

MinneMUDAC 2021: March Madness
MIT License

To Do: 2021-03-08 #12

Closed andrewargeros closed 3 years ago

andrewargeros commented 3 years ago

Data Wrangling

  1. team_level_player_stats further cleaned

    • [x] KNN-impute NA values, or use team-level stats in their place
    • [x] Add McDonald's All-Americans
    • [x] Clean kenpom_ratings0321.csv data
  2. Merge the data so that each row represents one team in a given season

  3. Join the data from step 2 to the NCAA Tournament data, holding out 2018-19 as test data

  4. Calculate Prior Upset Probability based on seed and round
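
As a rough sketch of steps 3 and 4, the hold-out split and the prior upset probability could look something like the following in pandas. The column names (`season`, `round`, `fav_seed`, `dog_seed`, `upset`) and the toy rows are invented for illustration, and reading "2018-19" as the 2018 and 2019 tournaments is an assumption. The prior here is just the historical share of upsets for each round and seed pairing, estimated on the training seasons only.

```python
import pandas as pd

# Hypothetical tournament results: one row per game. "upset" is 1 when the
# worse (higher-numbered) seed won. All values are made up for illustration.
games = pd.DataFrame({
    "season":   [2016, 2016, 2017, 2017, 2018, 2018],
    "round":    [1, 1, 1, 2, 1, 1],
    "fav_seed": [5, 1, 5, 1, 5, 1],
    "dog_seed": [12, 16, 12, 8, 12, 16],
    "upset":    [1, 0, 0, 0, 1, 1],
})

# Assumption: "2018-19" means the 2018 and 2019 tournaments are held out.
TEST_SEASONS = [2018, 2019]

is_test = games["season"].isin(TEST_SEASONS)
train = games[~is_test]
test = games[is_test]

# Prior upset probability per (round, favorite seed, underdog seed),
# computed from training seasons only so the test seasons do not leak in.
upset_prior = (
    train.groupby(["round", "fav_seed", "dog_seed"])["upset"]
    .agg(n_games="size", upset_prob="mean")
    .reset_index()
)

print(upset_prior)
```

Keeping `n_games` next to `upset_prob` makes it easy to spot pairings with too few games to trust the raw rate.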

Make Models

Let's try both round-by-round models and a single model for the whole tournament. Logistic/probit models seem to perform well with decent feature selection, but we could also try keras/torch for this.
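
For reference, the only difference between a logistic and a probit model is the link that turns the linear score into a win probability. Here is a minimal standard-library sketch of the two links; the features and weights are placeholders chosen for illustration, not fitted values from this project.

```python
import math


def linear_score(features, weights, intercept=0.0):
    """Weighted sum of features, the shared input to both links."""
    return intercept + sum(w * x for w, x in zip(weights, features))


def logistic_prob(score):
    """Logistic (logit) link: sigmoid of the linear score."""
    return 1.0 / (1.0 + math.exp(-score))


def probit_prob(score):
    """Probit link: standard normal CDF of the linear score."""
    return 0.5 * (1.0 + math.erf(score / math.sqrt(2.0)))


# Hypothetical features: [seed difference, efficiency-rating difference].
features = [-7.0, 12.5]
weights = [-0.10, 0.08]  # illustrative only, not estimated from data

score = linear_score(features, weights)
print("logistic:", logistic_prob(score))
print("probit:  ", probit_prob(score))
```

For the same score the probit link gives more extreme probabilities, because the normal CDF has thinner tails than the logistic curve; fitted coefficients usually rescale to compensate, so the two models tend to predict similarly.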

Validate on 2018-19
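
One possible way to score the held-out games is log loss alongside plain accuracy. The choice of metrics is an assumption here, not something settled in this issue, and the labels and probabilities below are made up.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0 or 1
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of games where the thresholded prediction matches the outcome."""
    hits = sum(int((p >= threshold) == bool(y)) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Hypothetical held-out games: 1 = upset happened, with model probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.7, 0.2, 0.4, 0.1]

print("log loss:", log_loss(y_true, y_prob))
print("accuracy:", accuracy(y_true, y_prob))
```

Log loss rewards well-calibrated probabilities and punishes confident misses, which accuracy alone does not capture.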

maxbolger commented 3 years ago

KenPom:SportsRef naming conventions can be found here (h/t Philip Martinkus).

andrewargeros commented 3 years ago

Names are joined using a key dataframe.
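
A minimal pandas sketch of that kind of join, for anyone following along. The team names, column names, and stats are invented placeholders, not the actual key or data in this repo.

```python
import pandas as pd

# Hypothetical KenPom-style and SportsRef-style tables with mismatched names.
kenpom = pd.DataFrame({
    "kenpom_name": ["Alpha St.", "Beta"],
    "season": [2021, 2021],
    "adj_em": [20.1, 15.3],
})
sportsref = pd.DataFrame({
    "sportsref_name": ["Alpha State", "Beta"],
    "season": [2021, 2021],
    "ppg": [75.2, 70.8],
})

# Key dataframe: one row per team, mapping one naming convention to the other.
name_key = pd.DataFrame({
    "kenpom_name": ["Alpha St.", "Beta"],
    "sportsref_name": ["Alpha State", "Beta"],
})

merged = (
    kenpom
    # validate= raises if the key has duplicate kenpom_name rows
    .merge(name_key, on="kenpom_name", how="left", validate="many_to_one")
    # indicator=True adds a "_merge" column showing where each row matched
    .merge(sportsref, on=["sportsref_name", "season"], how="left", indicator=True)
)

# Rows that failed to find a partner, useful for catching names missing from the key.
unmatched = merged[merged["_merge"] != "both"]
print(merged)
print("unmatched rows:", len(unmatched))
```

The result has one row per team and season, which is the shape step 2 asks for.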

andrewargeros commented 3 years ago

Closing the issue, as data wrangling is complete!