cshaley / bracketeer

Generate predicted bracket from a kaggle march madness submission
MIT License
33 stars 13 forks

New submission format :( #6

Open KyleKaminky opened 3 months ago

KyleKaminky commented 3 months ago

The Kaggle competition has changed its submission format. I'll try to start working on a PR for this.

cshaley commented 3 months ago

Wow, just got around to reviewing the change. Complete change of submission format.

The file should contain a header and have the following format:

RowId,Tournament,Bracket,Slot,Team
1,M,1,R1W1,W01
2,M,1,R1W8,W08
3,M,1,R1W5,W05
...

Here, the RowId column is a dummy index required by the metric; it should be a simple enumeration of the rows. The Tournament column indicates either the Men's (M) tournament or the Women's (W) tournament. The Bracket column enumerates the brackets in each tournament, starting from 1; you should use a unique enumeration for each tournament. The Team column should contain the team you predict to win in that respective Slot.
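For reference, a minimal sketch of reading this format with the standard library. The names `read_submission` and `check_row_ids` are hypothetical, not part of bracketeer, and the sample rows are just the ones quoted above. The check assumes that "simple enumeration" means RowId starts at 1, as in the sample.

```python
import csv
import io

# Sample rows copied from the format description above (not a full submission).
SAMPLE = """RowId,Tournament,Bracket,Slot,Team
1,M,1,R1W1,W01
2,M,1,R1W8,W08
3,M,1,R1W5,W05
"""


def read_submission(text):
    """Parse submission CSV text into a list of dict rows (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        # DictReader yields strings; convert the two integer columns.
        row["RowId"] = int(row["RowId"])
        row["Bracket"] = int(row["Bracket"])
    return rows


def check_row_ids(rows):
    """Return True if RowId enumerates the rows 1, 2, 3, ... (assumed 1-based)."""
    return [r["RowId"] for r in rows] == list(range(1, len(rows) + 1))


rows = read_submission(SAMPLE)
```

With pandas, `read_csv` would do the same job. The stdlib version is shown only to keep the example dependency-free.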

So the updated design should:

  1. Allow user to specify for which bracket prediction (e.g. Men's, #1) to generate a bracket.
  2. Allow user to generate multiple brackets - or all in a submission.
  3. It'd be nice to implement the Brier probability scoring function to generate a bracket based on mean prediction of a submission.
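Points 1 and 2 could be covered by one filter over the submission rows. This is a rough sketch only: `select_brackets` and its parameters are hypothetical names, and the rows are made up for illustration. Leaving both filters as `None` returns every bracket in the submission.

```python
from collections import defaultdict

# Made-up rows in the new format (not real Kaggle data).
ROWS = [
    {"Tournament": "M", "Bracket": "1", "Slot": "R1W1", "Team": "W01"},
    {"Tournament": "M", "Bracket": "2", "Slot": "R1W1", "Team": "W16"},
    {"Tournament": "W", "Bracket": "1", "Slot": "R1W1", "Team": "W01"},
]


def select_brackets(rows, tournament=None, bracket=None):
    """Return {(tournament, bracket): {slot: team}} for the requested brackets.

    A filter left as None matches everything.
    """
    picks = defaultdict(dict)
    for row in rows:
        key = (row["Tournament"], int(row["Bracket"]))
        if tournament is not None and key[0] != tournament:
            continue
        if bracket is not None and key[1] != bracket:
            continue
        picks[key][row["Slot"]] = row["Team"]
    return dict(picks)


mens_first = select_brackets(ROWS, tournament="M", bracket=1)  # one bracket
everything = select_brackets(ROWS)                             # all brackets
```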

New unit testing may require some new fake data (Kaggle owns this dataset).
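One possible way to produce fake data for tests, as a sketch only. `make_fake_submission` and `FAKE_SLOTS` are hypothetical, and the slot-to-candidate mapping is invented for the example. Picks are drawn independently per slot, so this only makes sense for first-round slots unless later-round consistency is added.

```python
import csv
import io
import random

# Invented first-round slots and the seeds that could win them (illustration only).
FAKE_SLOTS = {"R1W1": ["W01", "W16"], "R1W8": ["W08", "W09"]}


def make_fake_submission(slot_candidates, tournaments=("M", "W"), n_brackets=2, seed=0):
    """Return CSV text in the new format with randomly chosen winners."""
    rng = random.Random(seed)  # seeded so tests are reproducible
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["RowId", "Tournament", "Bracket", "Slot", "Team"])
    row_id = 1
    for tournament in tournaments:
        for bracket in range(1, n_brackets + 1):
            for slot, candidates in slot_candidates.items():
                writer.writerow([row_id, tournament, bracket, slot, rng.choice(candidates)])
                row_id += 1
    return out.getvalue()


fake_csv = make_fake_submission(FAKE_SLOTS)
```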

Need to update dependencies and build process. Way out of date.

cshaley commented 3 months ago

Pseudo code

Two externally accessible functions:

  1. Build bracket
  2. Build brackets

Build brackets just calls build bracket multiple times depending on args, passing only relevant data to build bracket.

  1. Load submission file
  2. Filter to only requested rows, iterate if multiple submissions
  3. Output dir param

Build bracket

  1. Output file param
  2. Bracket choices come from submission df (param)
  3. Map from slot data to team name for image (test with slot data in image first?)
  4. Remove probability comparison
  5. Determine winner based on slot data instead
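A rough Python skeleton of the pseudo code above, under several assumptions: the function signatures, the `seed_to_name` lookup, and the output file naming are all hypothetical. Image rendering is left out, so `build_bracket` writes a plain text listing as a stand-in. The winner of each slot is taken straight from the Team column, with no probability comparison.

```python
import csv
import os
import tempfile
from collections import defaultdict


def build_bracket(picks, output_file, seed_to_name=None):
    """Write one bracket. `picks` maps slot -> predicted winner (a seed label)."""
    seed_to_name = seed_to_name or {}
    with open(output_file, "w", newline="") as f:
        for slot in sorted(picks):
            seed = picks[slot]
            # Fall back to the raw seed label when no team name is known.
            f.write(f"{slot}: {seed_to_name.get(seed, seed)}\n")
    return output_file


def build_brackets(submission_path, output_dir, tournament=None, brackets=None):
    """Load a submission, keep the requested rows, call build_bracket per bracket."""
    grouped = defaultdict(dict)
    with open(submission_path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["Tournament"], int(row["Bracket"]))
            if tournament is not None and key[0] != tournament:
                continue
            if brackets is not None and key[1] not in brackets:
                continue
            grouped[key][row["Slot"]] = row["Team"]

    os.makedirs(output_dir, exist_ok=True)
    outputs = []
    for (tourney, number), picks in sorted(grouped.items()):
        path = os.path.join(output_dir, f"bracket_{tourney}_{number}.txt")
        outputs.append(build_bracket(picks, path))
    return outputs


# Demo with made-up rows (not real Kaggle data).
with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "submission.csv")
    with open(sub, "w", newline="") as f:
        f.write("RowId,Tournament,Bracket,Slot,Team\n1,M,1,R1W1,W01\n2,W,1,R1W1,W01\n")
    written = build_brackets(sub, os.path.join(tmp, "out"), tournament="M")
    demo_names = [os.path.basename(p) for p in written]
```

Passing only the `picks` dict into `build_bracket` keeps it independent of how the submission was loaded, which matches the "passing only relevant data" note.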