elishayer / mRchmadness

NCAA men's basketball data scraping and bracketology R package

Using 538 or other predictions #29

Closed: gheemony closed this issue 3 years ago

gheemony commented 3 years ago

I apologize if this is addressed in the documentation, but I have not been able to find an answer to this in the Vignette or elsewhere: How do you simulate a tournament and find and test brackets using 538 or other predictions?

The vignette uses the Bradley-Terry example, which generates a prob.matrix with one row and one column for each team (64x64).

However, the past 538 predictions provided in the package are 64x7: one row for each team and a column for win probability in each round. Is there a function to convert this to a prob.matrix that can be used with sim.bracket, draw.bracket, and test.bracket?
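One quick sanity check on a 64-row round-probability table: every game has exactly one winner, so summing P(win round r) over all teams should give the number of games in round r, i.e. 32, 16, 8, 4, 2, 1. The sketch below assumes the table has a name column plus six hypothetical columns round1 to round6; the package's actual column names may differ, and rounding in published probabilities means the sums only hold approximately.

```r
# Sketch: sanity-check a 64-team table of round win probabilities.
# ASSUMPTION: columns are `name` plus hypothetical `round1`..`round6`,
# where roundR holds P(team wins its round-R game).
check_round_probs <- function(pred, tol = 0.01) {
  round_cols <- paste0("round", 1:6)
  stopifnot(nrow(pred) == 64, all(round_cols %in% names(pred)))
  probs <- as.matrix(pred[, round_cols])
  # Round r has 64 / 2^r games, each with exactly one winner
  expected <- 64 / 2^(1:6)
  observed <- colSums(probs)
  # A team cannot win round r + 1 without winning round r
  monotone <- all(apply(probs, 1, function(p) all(diff(p) <= tol)))
  list(sums_ok = all(abs(observed - expected) <= tol * expected),
       observed = observed, expected = expected, monotone = monotone)
}

# Toy usage: every game a coin flip, so P(win round r) = 0.5^r for each team
toy <- data.frame(name = paste("Team", 1:64))
for (r in 1:6) toy[[paste0("round", r)]] <- rep(0.5^r, 64)
check_round_probs(toy)$sums_ok  # TRUE
```

The check does not convert the table into a 64x64 prob.matrix; it only catches malformed inputs (missing rows, shifted columns, combined or duplicated teams) before they reach the simulation.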

elishayer commented 3 years ago

My response to the main question is below. Feel free to follow up with anything or close the issue; otherwise I will close this tomorrow.

Note that the 538 and kenpom distributions have not been imported into the package for 2021 yet. I'd be open to a PR that grabs the 538 data (https://projects.fivethirtyeight.com/2021-march-madness-predictions/), formats it into the 64x7 layout, and maps the teams to teams.men, but I don't currently plan to do this myself. I remember that transformation being somewhat of a hassle last time I did it. I don't see kenpom probs out yet. The population picks are in the package for 2021 now, though, with the completion of #28.

gheemony commented 3 years ago

I've taken the current 538 data and put it into the format for previous prediction files. See attached.

The file/object is in my Global Environment, but when I run find.bracket, I get an error: "Error: 'pred.538.men.2021' is not an exported object from 'namespace:mRchmadness'". I can hunt around Stack Exchange and other sources for workarounds, but I thought you might know an easier way to fix this. One option might be to alter the code so that it also checks the Global Environment for data, since it appears that updated data won't be provided every year. Thanks! pred.538.men.2021.zip
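For context, pkg::name only searches the package's exports and lazy-loaded datasets, so an object sitting in the Global Environment is invisible to it. A minimal sketch of the fallback suggested above, using base R's getExportedValue() (which raises the same "not an exported object" error) before checking the Global Environment; get_pred_object() is a hypothetical helper, not part of mRchmadness.

```r
# Sketch: find a prediction object in a package, else in the global environment.
# HYPOTHETICAL helper; this is not how mRchmadness itself looks data up.
get_pred_object <- function(name, pkg = "mRchmadness") {
  if (requireNamespace(pkg, quietly = TRUE)) {
    # getExportedValue() covers exported objects and lazy-loaded datasets;
    # it errors if `name` is neither, so convert that error to NULL
    value <- tryCatch(getExportedValue(pkg, name), error = function(e) NULL)
    if (!is.null(value)) return(value)
  }
  # Fall back to an object the user created or loaded in the session
  if (exists(name, envir = globalenv(), inherits = FALSE)) {
    return(get(name, envir = globalenv(), inherits = FALSE))
  }
  stop(sprintf("'%s' not found in package '%s' or the global environment",
               name, pkg))
}

# Usage (assuming the user has loaded their own 2021 table):
# pred <- get_pred_object("pred.538.men.2021")
```

Package data still wins when both exist, so a user-supplied table is only picked up for years the package does not ship.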

elishayer commented 3 years ago

Good work! I will re-open this, convert it to a data update issue, and try to tackle it by the end of the day. The package is looking for pred.538.men.2021 within the package's namespace. You could have it look in the global environment by removing the mRchmadness:: prefix here:

https://github.com/elishayer/mRchmadness/blob/bc54ebfd344df160200601c76ec46c160112e09c/R/sim.bracket.source.R#L31

That said, this will only work if the teams are mapped in teams.men$name.538 to associate the team names 538 uses with the ESPN IDs that drive the rest of the analysis, which takes a bit of effort. I will do this by the end of the day if you have not.
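The mapping step amounts to a lookup from 538's team names to ESPN IDs through teams.men$name.538. A minimal sketch with base R's match(), flagging any 538 name that has no counterpart; map_538_to_ids() is hypothetical, and the toy data frames below only imitate the id and name.538 columns of teams.men and a name column in the 538 table.

```r
# Sketch: attach ESPN IDs to 538 predictions via teams.men$name.538.
# HYPOTHETICAL helper; only the columns id, name.538 (teams) and name (pred) are used.
map_538_to_ids <- function(pred, teams) {
  idx <- match(as.character(pred$name), as.character(teams$name.538))
  unmatched <- as.character(pred$name)[is.na(idx)]
  if (length(unmatched) > 0) {
    warning("No teams.men match for: ", paste(unmatched, collapse = ", "))
  }
  pred$id <- as.character(teams$id)[idx]
  pred
}

# Toy stand-ins for teams.men and the 538 table
teams <- data.frame(id = c("2724", "2181", "127"),
                    name.538 = c("Wichita State", "Drake", "Michigan State"),
                    stringsAsFactors = FALSE)
pred <- data.frame(name = c("Wichita State", "Michigan State", "Misspelled State"),
                   stringsAsFactors = FALSE)
mapped <- suppressWarnings(map_538_to_ids(pred, teams))
mapped$id  # "2724" "127" NA
```

Warning rather than stopping lets all unmatched names be fixed in one pass.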

gheemony commented 3 years ago

Updated teams.men file with 538 team names appended. teams.men.update.zip

gheemony commented 3 years ago

I'm unable to fix the issue by making the one change to sim.bracket.source.R. I can't find the file in the namespace or in the package directory. I downloaded the raw file from GitHub, made the change, and ran it so that the modified version is in the Global Environment, but that doesn't prevent the error. I'm hoping you can make the fix relatively soon so that brackets can be prepared tonight. Thanks.
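A likely reason the patch had no effect: functions inside a package resolve names in the package namespace before the Global Environment, so sourcing a modified copy only creates a second, unused definition. The toy below imitates this with an ordinary environment standing in for the namespace (a simplification; real namespaces also have an imports environment). Reinstalling the package, or loading a patched local clone with something like devtools::load_all(), is the usual way to make a change visible.

```r
# Toy model of why sourcing a patched function does not change package behavior.
# `pkg_ns` is an ordinary environment standing in for a package namespace.
pkg_ns <- new.env(parent = globalenv())
assign("lookup_source", function() "namespace version", envir = pkg_ns)

# A "package" function whose enclosing environment is the namespace
sim_caller <- function() lookup_source()
environment(sim_caller) <- pkg_ns

# Defining a patched copy in the global environment (like sourcing a file)
lookup_source <- function() "patched global version"

sim_caller()  # "namespace version": lookup starts in pkg_ns, not globalenv
```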

gheemony commented 3 years ago

I assume this won't get done tonight?

elishayer commented 3 years ago

It should be done within 20 minutes or so.

elishayer commented 3 years ago

Mention it here if there are any issues after re-installing the package. I didn't test it much, but I did make sure the team names are all mapped, so I think it should work.

gheemony commented 3 years ago

 [1] "2378" "2306" "12" "2084" "96" "2166" "2132" "2247" "2440" "2750" "2390" "2752" "2640" "2571" "288" "245" "2507" "149"
[19] "2515" "276" "93" "2239" "2086" "2617" "219" "2550" "152" "2" "232" "166" "150" "227" "2083" "2628"
Error in sim.bracket.source(prob.source = prob.source, league = league, :
  No predictions from source for above teams. Is year correct?

gheemony commented 3 years ago

Checking now to see why teams are missing.

gheemony commented 3 years ago

The teams.men object is incorrect: it is missing IDs that are in bracket.men.2021. I matched the 538 names to teams.men, which is why my update is still incorrect. I will look into matching to bracket.men.2021.

elishayer commented 3 years ago

I used the 538 CSV you sent over but ended up doing the mappings myself in the way I normally do because it was easier to apply for me.

I don't follow the second part, though. All the teams in bracket.men.2021 map to teams in teams.men; the only FALSE entries are the First Four slots, and their individual IDs are all present:

> mRchmadness::bracket.men.2021 %in% mRchmadness::teams.men$id
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[17]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[33]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
[49]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE
> mRchmadness::bracket.men.2021[!mRchmadness::bracket.men.2021 %in% mRchmadness::teams.men$id]
[1] "2724/2181" "127/26"    "116/2640"  "2450/2026"
> c(2724, 2181, 127, 26, 116, 2640, 2450, 2026) %in% mRchmadness::teams.men$id
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
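The FALSE entries above are First Four slots that store two ESPN IDs joined by a slash. A small sketch for expanding such slots into individual IDs before a membership check, using the four slots printed above; expand_slots() is a hypothetical helper.

```r
# Sketch: expand "idA/idB" First Four slots into individual ESPN IDs.
# HYPOTHETICAL helper for illustration.
expand_slots <- function(slots) {
  unique(unlist(strsplit(as.character(slots), "/", fixed = TRUE)))
}

first_four <- c("2724/2181", "127/26", "116/2640", "2450/2026")
expand_slots(first_four)
# "2724" "2181" "127" "26" "116" "2640" "2450" "2026"

# With the package installed, the full bracket could be checked the same way:
# all(expand_slots(mRchmadness::bracket.men.2021) %in% mRchmadness::teams.men$id)
```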
gheemony commented 3 years ago

OK, then why do I get the error that there are no predictions from the source?

elishayer commented 3 years ago

Looking through the IDs printed in the error, some belong to teams that are not in the 2021 bracket. The first ID, for example, is UMBC's, which suggests that somewhere along the way you are using 2018 data (perhaps from a default argument), as the error message hints. Did you specify year = 2021 in the call to sim.bracket?
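A guard like the following could surface this kind of year mismatch before any simulation runs: compare the bracket's team IDs with those covered by the prediction table, and require the year explicitly rather than relying on a default. This is a hypothetical sketch (check_coverage() is not a package function); the toy IDs come from the error output above.

```r
# Sketch: stop early if a prediction table does not cover every bracket team.
# HYPOTHETICAL helper; `year` deliberately has no default.
check_coverage <- function(bracket_ids, pred_ids, year) {
  if (missing(year)) stop("pass `year` explicitly, e.g. year = 2021")
  missing_ids <- setdiff(as.character(bracket_ids), as.character(pred_ids))
  if (length(missing_ids) > 0) {
    stop(sprintf("No %s predictions for: %s", year,
                 paste(missing_ids, collapse = ", ")))
  }
  invisible(TRUE)
}

# Toy usage: a bracket ID that the prediction table does not cover
# check_coverage(c("2378", "2724"), c("2378", "12"), year = 2021)
# stops with a message listing 2724
```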

elishayer commented 3 years ago

Also worth mentioning: there's a working version for 2021 at https://saberpowers.shinyapps.io/mRchmadness/ (it doesn't support the 538 probs; it uses the population pick probs for the distribution of pool picks and a Bradley-Terry model for the game probs).

gheemony commented 3 years ago

I had an incorrect reference to 2018. But now I get an error for the First Four games:

[1] "2450/2026" "2724/2181" "116/2640" "127/26"
Error in sim.bracket.source(prob.source = prob.source, league = league, :
  No predictions from source for above teams. Is year correct?

I appreciate you hanging in there with me. I think this is the last thing to sort out.

elishayer commented 3 years ago

I did notice the CSV you sent for the 538 probs only had 64 rows instead of the 68 I expected. Does each First Four entry in there represent the combined prob for both teams in that game? Assuming so, I think I know what to do... give me a few minutes.

gheemony commented 3 years ago

Yes, I combined their probabilities because I thought it was necessary. Sorry. Now the options are to take the latest 538 probs, which include just the 4 First Four winners, or to take the old probs with all 68 teams.
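Combining the two teams of a First Four game is sound for every round after the play-in game because at most one of them can advance, so the slot's probability of winning any such round is the sum of the two teams' probabilities. A minimal sketch of collapsing a 68-row table to 64 rows, assuming a hypothetical slot column that gives both play-in teams the same label and round columns named round1 onward; the numbers are toy values.

```r
# Sketch: collapse a 68-row round-probability table to 64 rows.
# ASSUMPTION: hypothetical `slot` column gives both play-in teams the same
# label, e.g. "Team B/Team C"; round columns are named round1, round2, ...
collapse_first_four <- function(pred68, round_cols) {
  # At most one play-in team can advance, so the events are mutually
  # exclusive and the slot's probability is the sum of the two teams'.
  out <- aggregate(pred68[, round_cols, drop = FALSE],
                   by = list(slot = pred68$slot), FUN = sum)
  # aggregate() sorts by slot; restore the original bracket order
  out <- out[match(unique(pred68$slot), out$slot), ]
  rownames(out) <- NULL
  out
}

toy <- data.frame(slot = c("Team A", "Team B/Team C", "Team B/Team C"),
                  round1 = c(0.25, 0.25, 0.5),
                  round2 = c(0.125, 0.125, 0.125),
                  stringsAsFactors = FALSE)
collapse_first_four(toy, c("round1", "round2"))
```

Keeping both rows and relabeling them as a combined slot, as done later in this thread, is the alternative when the bracket itself stores First Four games as "idA/idB".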

elishayer commented 3 years ago

There's a chance it works with the commit above, but no promises. For reference, this is what I did to generate the data change:

# Set the 538 names for three First Four teams in teams.men.
# Convert the factor to character so new values can be assigned, then back.
teams.men$name.538 = as.character(teams.men$name.538)
teams.men[teams.men$id == 2026, 'name.538'] = 'Appalachian St'
teams.men[teams.men$id == 2181, 'name.538'] = 'Drake'
teams.men[teams.men$id == 2450, 'name.538'] = 'Norfolk State'
teams.men$name.538 = as.factor(teams.men$name.538)
save(teams.men, file='.../mRchmadness/data/teams.men.RData')

# Relabel each combined First Four row as "TeamA/TeamB",
# using the 538 names of both teams in the play-in game
pred.538.men.2021$name = as.character(pred.538.men.2021$name)
pred.538.men.2021[pred.538.men.2021$name == 'Wichita State', 'name'] = 'Wichita State/Drake'
pred.538.men.2021[pred.538.men.2021$name == 'Michigan State', 'name'] = 'Michigan State/UCLA'
pred.538.men.2021[pred.538.men.2021$name == "Mount St. Mary's", 'name'] = "Mt St Mary's/Texas Southern"
pred.538.men.2021[pred.538.men.2021$name == 'Appalachian State', 'name'] = 'Norfolk State/Appalachian St'
pred.538.men.2021$name = as.factor(pred.538.men.2021$name)
save(pred.538.men.2021, file='.../mRchmadness/data/pred.538.men.2021.RData')

gheemony commented 3 years ago

All but one worked: "116/2640". Could it be that you're just missing the removal for Michigan State?

elishayer commented 3 years ago

I pulled the wrong name for Mt. Saint Mary's in the snippet above. I have a good feeling about it this time.

# Correct the slot name from the previous snippet: 538 spells it "Mount St. Mary's"
pred.538.men.2021$name = as.character(pred.538.men.2021$name)
pred.538.men.2021[pred.538.men.2021$name == "Mt St Mary's/Texas Southern", 'name'] = "Mount St. Mary's/Texas Southern"
pred.538.men.2021$name = as.factor(pred.538.men.2021$name)
save(pred.538.men.2021, file='.../mRchmadness/data/pred.538.men.2021.RData')

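Near-miss spellings such as "Mt St Mary's" versus "Mount St. Mary's" can be surfaced automatically. A sketch using base R's utils::adist(), which computes a generalized Levenshtein (edit) distance, to suggest the closest known name for each unmatched one; suggest_names() and the candidate list are illustrative only.

```r
# Sketch: suggest the closest known name for each name that failed to match.
# HYPOTHETICAL helper; ties go to the first candidate in `known`.
suggest_names <- function(unmatched, known) {
  d <- utils::adist(unmatched, known)  # rows: unmatched, cols: known
  data.frame(unmatched = unmatched,
             suggestion = known[apply(d, 1, which.min)],
             distance = apply(d, 1, min),
             stringsAsFactors = FALSE)
}

known <- c("Mount St. Mary's", "Texas Southern", "Drake", "Wichita State")
suggest_names("Mt St Mary's", known)
# suggestion: "Mount St. Mary's", distance 4
```

A suggestion is only a hint; a small distance can still point at the wrong team, so each match deserves a manual look.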
gheemony commented 3 years ago

Thank you so much. To show my appreciation, I'd love to help with the project going forward, which we can discuss later. I understand if your answer is "thanks, but no thanks." Good luck with your picks.