beanumber / openWAR

An R package enabling the computation of openWAR using MLBAM data

Enhancement suggestion - add an mlbam ID column to all by-player results #76

Closed znmeb closed 9 years ago

znmeb commented 9 years ago

I'm now at the point where I want to join openWAR results with other data. I found the 'idTT' table in openWARData, but rather than mess around with 'fuzzy joins' I'd like to just add player name columns to that table for various other datasets and index everything by mlbam ID.

I can probably figure out from the code how to do this if you can't get to it in this round of changes. Right now the tables I want most are the "shakeWAR" results and the summary that has WAR and the RAA values.
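For illustration, here is a minimal sketch of the exact join described above, using base R `merge()`. Both data frames and all column names (`mlbam_id`, `WAR`, `salary`) are hypothetical stand-ins, not the actual openWAR or openWARData schema.

```r
# Hypothetical by-player WAR summary; column names are illustrative assumptions.
war <- data.frame(
  mlbam_id = c(101L, 102L, 103L),
  name     = c("A. Able", "B. Baker", "C. Chase"),
  WAR      = c(2.1, 0.4, -0.3),
  stringsAsFactors = FALSE
)

# Hypothetical external dataset that spells player names differently,
# which is what would otherwise force a fuzzy join on names.
other <- data.frame(
  mlbam_id = c(103L, 101L, 104L),
  player   = c("Chase, C.", "Able, A.", "Dunn, D."),
  salary   = c(500, 1200, 900),
  stringsAsFactors = FALSE
)

# Exact join on the shared ID; all.x = TRUE keeps every row of the WAR table
# and fills unmatched players with NA.
joined <- merge(war, other, by = "mlbam_id", all.x = TRUE)
print(joined)
```

With an ID column present in both tables, name spelling differences no longer matter, and unmatched players show up as `NA` rather than as silent mismatches.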

beanumber commented 9 years ago

Great idea. We will add this!

beanumber commented 9 years ago

Just added this to

summary.openWARPlayers()
summary.do.openWARPlayers()

in 7e53e60.

Does that do the trick?

znmeb commented 9 years ago

Looks good - I can't test it at the moment. I ran a 5000-sample shakeWAR over 2015 to date last night and it's still saving the output file as .rdata ;-)

znmeb commented 9 years ago

OK - it's working! Thanks!

One more enhancement: let shakeWAR work from the output of makeWAR when you only want to resample the plays. On my workstation, for 2015 to date, makeWAR takes about 1.3 minutes and each resample takes about 0.03 minutes.

When you wrote the paper, did you resample only the plays, or the models too? If I have a spare overnight, I might try eight hours' worth of the 'both' option. ;-)
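As a rough sketch of what "resampling the plays only" means, the following base R code bootstraps rows of a play-level table and re-aggregates a per-player total each time. This is not the shakeWAR implementation; the `plays` table, its columns (`player_id`, `raa`), and the helper `resample_once()` are all hypothetical.

```r
set.seed(42)

# Hypothetical play-level table: one row per play, with a runs-above-average value.
plays <- data.frame(
  player_id = sample(c(101L, 102L, 103L), 300, replace = TRUE),
  raa       = rnorm(300, mean = 0, sd = 0.1)
)
players <- sort(unique(plays$player_id))

# Draw one bootstrap sample of plays (rows, with replacement) and
# return the summed RAA for each player.
resample_once <- function(plays, players) {
  idx  <- sample.int(nrow(plays), replace = TRUE)
  boot <- plays[idx, ]
  # Fixed factor levels so every player appears in every draw.
  totals <- tapply(boot$raa, factor(boot$player_id, levels = players), sum)
  totals[is.na(totals)] <- 0
  as.numeric(totals)
}

n_resamples <- 200
sims <- vapply(
  seq_len(n_resamples),
  function(i) resample_once(plays, players),
  numeric(length(players))
)
rownames(sims) <- players

# Percentile intervals per player: rows are quantiles, columns are players.
intervals <- apply(sims, 1, quantile, probs = c(0.025, 0.5, 0.975))
print(intervals)
```

In this sketch the models are never refit, so each resample only costs a row draw and an aggregation. Resampling the models as well would mean refitting them inside `resample_once()`.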

gjm112 commented 9 years ago

We re-sampled both plays and models.

Gregory J. Matthews, Ph.D. Assistant Professor Department of Mathematics and Statistics Loyola University Chicago E-mail: gjm112 -at- gmail.com Blog: StatsInTheWild.com Art: etsy.me/1JAsYz9 Twitter: @StatsInTheWild http://www.twitter.com/StatsInTheWild Twitter: @StatsClass http://www.twitter.com/statsclass

znmeb commented 9 years ago

@gjm112 Well then, I'll try that tonight. I just paid the power bill. ;-)

gjm112 commented 9 years ago

It is really slow. And the variance in the models is dwarfed by the variance from resampling the plays.

znmeb commented 9 years ago

I ran 4096 resamples of plays only for 2015 to date the other night. It ran about three hours (4 GHz CPU, 32 GB of RAM, probably single-threaded). This is an 8-core box and Monte Carlo is supposed to be embarrassingly parallel.
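To illustrate the "embarrassingly parallel" point, here is a minimal sketch that spreads independent resamples across cores with `parallel::mclapply()`. It is not how openWAR runs its resamples; `plays_raa` and `one_resample()` are hypothetical, and the core count is an arbitrary assumption (forking is unavailable on Windows, so the sketch falls back to one core there).

```r
library(parallel)

# L'Ecuyer-CMRG gives each forked worker its own reproducible random stream.
RNGkind("L'Ecuyer-CMRG")
set.seed(2015)

# Hypothetical per-play values standing in for real play-level results.
plays_raa <- rnorm(1000, mean = 0, sd = 0.1)

# One independent bootstrap draw: resample plays and total them.
one_resample <- function(i) {
  sum(sample(plays_raa, replace = TRUE))
}

# mclapply forks worker processes; use a single core on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

totals <- unlist(mclapply(seq_len(64), one_resample, mc.cores = n_cores))
summary(totals)
```

Because no draw depends on any other, the resamples can be split across workers without coordination, and the wall-clock time should shrink roughly with the number of cores until memory or process overhead dominates.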

beanumber commented 9 years ago

@znmeb, I've added the following methods in c096eda6:

shakeWAR.list()
shakeWAR.openWARPlays()

I think this achieves the functionality that you're looking for. Good suggestion! We'll have to defer the speed issue until the next release. Note this is a duplicate of #42.
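For readers unfamiliar with how two methods like these coexist, here is a small sketch of S3 dispatch in R. The generic `shake_demo()` and its methods are hypothetical toys that only report which method was chosen; they say nothing about what the real `shakeWAR` methods do internally.

```r
# Hypothetical generic: dispatches on the class of its first argument.
shake_demo <- function(data, ...) UseMethod("shake_demo")

# Chosen when the input is a plain list.
shake_demo.list <- function(data, ...) {
  "dispatched to list method"
}

# Chosen when the input carries the class "openWARPlays".
shake_demo.openWARPlays <- function(data, ...) {
  "dispatched to openWARPlays method"
}

# Toy inputs: a plain list, and a data frame tagged with the plays class.
list_input  <- list(plays = data.frame(raa = c(0.1, -0.2)))
plays_input <- structure(
  data.frame(raa = c(0.1, -0.2)),
  class = c("openWARPlays", "data.frame")
)

print(shake_demo(list_input))
print(shake_demo(plays_input))
```

The caller uses one function name either way, and R picks the method from the class of the object passed in.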