adeshpande3 / March-Madness-ML

Machine learned bracketology
2022 Revamp & 2020 issues #18

Open lightningcookies opened 2 years ago

lightningcookies commented 2 years ago

Hey, I'm new to ML and trying to get my feet wet here. Can we get this updated to at least skip 2020 (no tournament)? The program won't run without 2020 data. I have formatted the College Basketball Reference data for 2021 and 2022 and would love to share it if we can get this working again :)

This project is awesome! I'd love to help make it better.

adeshpande3 commented 2 years ago

Thanks for calling this out! Totally forgot that 2020 would mess things up (realizing the code could be written a bit better 😅). From what I remember of how this program works, I think we'd need to do the following:

  1. Upload the CSV data from the new Kaggle year https://www.kaggle.com/c/mens-march-mania-2022/data
  2. Upload the formatted College Basketball Reference data like you mentioned. 🙏 ty for doing that
  3. Make sure that 2020 doesn't impact our ability to run the program. What's the error that you're getting when you run it? Thinking of 2 options:
    • Just remove 2020 from the processing altogether, with some sort of if statement plus removing the 2020 rows from the Kaggle CSVs. Downside here is that we'd lose the regular season game info, which would still be valuable.
    • Add some sort of dummy data to replace the values for tournamentSeed, checkConferenceChamp, and checkConferenceTourneyChamp. Maybe just make it so that everyone missed the tourney? Since it would apply to every team that year, I think that should be fine.
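
A rough sketch of what the two options could look like, just to make the idea concrete. None of this is taken from the repo: `seasons_to_process`, `tourney_features`, the `lookup` callable, and the `MISSED_TOURNEY_SEED` value are all hypothetical stand-ins, and the real code may organize the seed and conference checks differently.

```python
# Hypothetical sketch: none of these names or values come from the repo.
NO_TOURNEY_SEASONS = {2020}  # seasons with regular-season games but no NCAA tournament
MISSED_TOURNEY_SEED = 25     # placeholder; use whatever the existing code treats as "missed the tourney"


def seasons_to_process(first, last, skip=frozenset(NO_TOURNEY_SEASONS)):
    """Option 1: drop the affected seasons entirely.

    Simple, but the regular-season games from those years are lost too.
    """
    return [year for year in range(first, last + 1) if year not in skip]


def tourney_features(team_id, season, lookup):
    """Option 2: keep the season, but give every team the same dummy values.

    `lookup` stands in for the real per-team logic (seed, conference champ,
    conference tourney champ). It is only called for seasons that actually
    had a tournament.
    """
    if season in NO_TOURNEY_SEASONS:
        return {
            "seed": MISSED_TOURNEY_SEED,
            "conf_champ": 0,
            "conf_tourney_champ": 0,
        }
    return lookup(team_id, season)
```

With this, `seasons_to_process(2018, 2022)` gives `[2018, 2019, 2021, 2022]`, while option 2 keeps 2020 in the loop and only swaps out the tournament-related values.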

I'm a bit pressed for time today, but if you submit a PR for any/all of the above (and I definitely might be missing something, so lmk), I'd be more than happy to review it and get it over the line.

Happy Selection Sunday!