TaddyLab / hockey

Chicago Hockey Analytics
basic analysis is done, what next? #2

Closed by mataddy 9 years ago

mataddy commented 9 years ago

hi @rbgramacy and @sentian,

I've updated this repo (note that it has moved from mataddy to taddylab, which makes it easier for me to manage collaboration).

The reorganized repo has the old material in code/blog. In the new code, design.R builds the regression inputs for either goals or shots, while performance.R fits the regression and compares to salary. The relevant output tables are the full table of betas, PPM, and PM in https://github.com/TaddyLab/hockey/blob/master/results/performance.csv and the correlations with salary in https://github.com/TaddyLab/hockey/blob/master/results/salarycorr.csv. I think that performance.R is pretty easy to parse, but please hit me up with any questions here.

The currently checked-in results study the goals data, using the data through 2013-14 that I grabbed from Bobby's directory.

The analysis all uses standard lasso (L1) penalization with AICc selection. @rbgramacy: I think that perhaps we can stick with this to keep the current paper simple? We then differ from the older analysis in a few ways:

  • We control for team-season specific effects as well as special teams effects. These effects are not penalized.
  • I include player*season interactions, so that there is potentially a different beta for each player for each season.
  • We do the translation from betas to partial plus-minus (PPM).

Finally, we compare to salary and find that the correlation with PPM increases after the 2005 lockout.

Purpose of this issue is to decide what else we need, if anything. Adding in post-season? Putting salary straight into the original regression? Considering other terms like Corsi in the regression?

rbgramacy commented 9 years ago

I guess I should try owning this collaborative environment.

Responding to Matt's query for me: yes, I think we should keep this paper simple, although I do think we should talk about fully Bayesian calculation in the introduction/review. I'll be working on that over the next several days and will try to have something by the end of next week.

I thought we were already including the post-season. At least, the data and scripts we used for the blog used all goals data except penalty shots. We also included an indicator for post-season in the model.

Here is a question. The editors provided us with a simple LaTeX template from CRC. I'm working on getting that going today. Should we create a new tex directory on the repo? I'll leave that up to you, Matt, because it's your space.

-B
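
For reference, the AICc selection mentioned in the opening comment can be sketched as follows. This is a minimal illustration in Python, not the code in performance.R: the path values are made up, `aicc` and `select_by_aicc` are hypothetical helper names, and the degrees of freedom are assumed to be the number of nonzero coefficients. It uses the standard small-sample correction, AICc = deviance + 2*df*n/(n - df - 1).

```python
import math

def aicc(deviance, df, n):
    """Corrected AIC: deviance + 2*df*n / (n - df - 1).

    deviance is -2 * log-likelihood, df is the number of nonzero
    coefficients (an assumption of this sketch), n is the sample size.
    """
    if n - df - 1 <= 0:
        return math.inf  # correction is undefined; never select this fit
    return deviance + 2.0 * df * n / (n - df - 1)

def select_by_aicc(path, n):
    """Return the (lambda, deviance, df) entry with the smallest AICc."""
    return min(path, key=lambda fit: aicc(fit[1], fit[2], n))

# Hypothetical lasso path: (penalty lambda, deviance, nonzero coefficients).
path = [
    (0.100, 1400.0, 0),
    (0.050, 1350.0, 5),
    (0.010, 1335.0, 20),
    (0.001, 1330.0, 120),
]
best = select_by_aicc(path, n=1000)
print(best)
```

The correction term grows quickly as df approaches n, which is what keeps the selection from drifting toward the densest end of the path when there are many candidate player effects.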

mataddy commented 9 years ago

I think you are correct on the post season, but will double check. I could also interact post with season to get playoff and regular season betas for each year.

New tex directory sounds like a great idea. I usually do something like crcpaper/hockey.tex and put pics in crcpaper/graphs, but am not at all picky about how you want to set it up.

rbgramacy commented 9 years ago

crcpaper sounds good. Would that be at the top level, or inside a tex directory which is at the top level? I don't have figs just yet, that's all you.

-B

Robert B. Gramacy bobby.gramacy.com Tel: +1 773 702 0739 Booth School of Business The University of Chicago 5807 S. Woodlawn Ave. Chicago, IL 60637, USA

mataddy commented 9 years ago

Ooh, good point. Maybe something like docs/crcpaper

rbgramacy commented 9 years ago

Hey,

I was able to clone the hockey repo, but it seems I don't have push permissions. Once you give me those I'll have the new docs directory up there.

I've got the style files all worked out and roughed in an abstract, etc. That might be all I have time for today because I'm home with Natalia. I should be able to get a lot done tomorrow. I'm happy to have you scrap everything for new text, but hold off until I've blurted it all out.

-B

mataddy commented 9 years ago

I'll make sure you have push permission. And no worries, I'm in no rush to write anything. I'm doing another analysis push today and then will need to be on other projects for a few days.

mataddy commented 9 years ago

should be good now; you had permissions but I had forgotten to add the hockey repo to your list

mataddy commented 9 years ago

OK, so it is confirmed that the data include postseason.

When I allowed for different player effects in post and regular season, I noticed that only a half dozen goalies popped up with significant betas. This reminded me of results before we controlled for team and season, so I re-ran with team-season-postseason interactions (i.e. different team-season effects in regular season and playoffs). Then all of the post-season beta effects disappeared, so the conclusion is that we see no measurable evidence that players perform differently in regular season and playoffs after controlling for overall change in the quality of play.
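
The two levels of control described here can be pictured as indicator columns keyed by team and season, optionally split by a playoff flag. A hedged Python sketch follows; the field names (`team`, `season`, `playoff`), the sample rows, and the helper `control_key` are illustrative assumptions, not what design.R does.

```python
from collections import Counter

def control_key(row, split_playoffs):
    """Label the unpenalized control column that a row of data falls into."""
    key = f"{row['team']}.{row['season']}"
    if split_playoffs:
        # Separate effect for regular season and playoffs within a team-season.
        key += ".post" if row["playoff"] else ".reg"
    return key

# Made-up rows, one per goal event.
rows = [
    {"team": "CHI", "season": "20122013", "playoff": False},
    {"team": "CHI", "season": "20122013", "playoff": True},
    {"team": "CHI", "season": "20132014", "playoff": False},
    {"team": "PIT", "season": "20122013", "playoff": True},
]

# Count events per control column under each coding.
team_season = Counter(control_key(r, split_playoffs=False) for r in rows)
team_season_post = Counter(control_key(r, split_playoffs=True) for r in rows)
print(len(team_season), len(team_season_post))
```

With the finer split, an overall shift in play between the regular season and the playoffs is absorbed by the team-season-postseason columns rather than loading onto individual players' playoff effects.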

The new performance tables include both regular and playoff betas (always the same, as described above), PM, and PPM.

I've also broken the salary comparison into a separate file, salary.R. One thing that we need to resolve: there are a small number of player-seasons with NA salaries... about 700/10k. I can only fix a few with name correction (this is using the already name-corrected file from Sen). There is a larger number of players with zero salaries... about 1500. This is more worrisome to me, since I expect it could mean that some of their salary was accounted for in another year. I'll create an issue for @sentian to try and explain what is going on.
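
A quick audit of the two problem cases might look like the sketch below. It is written in Python with made-up records; the field names (`player`, `season`, `salary`), the use of `None` for NA, and the helper `audit_salaries` are assumptions for illustration, not the layout of the actual salary file.

```python
def audit_salaries(records):
    """Count player-seasons with missing (None) and zero salaries."""
    missing = [r for r in records if r["salary"] is None]
    zero = [r for r in records if r["salary"] == 0]
    return {
        "total": len(records),
        "missing": len(missing),
        "zero": len(zero),
        # Keep the keys of zero-salary rows so they can be checked by hand.
        "zero_keys": [(r["player"], r["season"]) for r in zero],
    }

# Made-up player-seasons; None stands in for an NA salary.
records = [
    {"player": "PLAYER_A", "season": "20122013", "salary": 2500000},
    {"player": "PLAYER_A", "season": "20132014", "salary": 0},
    {"player": "PLAYER_B", "season": "20132014", "salary": None},
    {"player": "PLAYER_C", "season": "20132014", "salary": 900000},
]
report = audit_salaries(records)
print(report)
```

Listing the zero-salary keys next to the same player's adjacent seasons is one way to see whether the pay shows up in a neighboring year.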

mataddy commented 9 years ago

Results are in. I've also added the shots analysis. Some interesting points of comparison between shot-based and goal-based metrics:

  • The goal betas and PPMs are much more correlated with salary than those based on shots. For standard PM, the results are similar whether based on shots or goals.
  • For partial PM: Ovechkin is king if you measure shots. Crosby is king if you measure goals.
  • With the additional data for shots (now n>700k) we get a bunch of nonzero post-season beta deltas (measurable change between regular season and playoffs). I've checked in the relevant table.

rbgramacy commented 9 years ago

Awesome stuff. It sounds like the second half of the chapter is going to write itself.

-B
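
The salary comparisons discussed in this thread come down to correlating a performance metric with salary, before and after the lockout. A minimal Python sketch under stated assumptions: the rows are made up, the field names (`ppm`, `salary`, `season_start`) and helpers are hypothetical, "post-lockout" is taken to mean seasons starting in 2005 or later, and rows with missing or zero salary are simply dropped. None of this is confirmed to match salary.R.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def salary_corr(rows, metric, post_lockout):
    """Correlate a metric with salary on one side of the lockout."""
    keep = [
        r for r in rows
        # Truthiness drops both None and zero salaries (assumed handling).
        if r["salary"] and (r["season_start"] >= 2005) == post_lockout
    ]
    return pearson([r[metric] for r in keep], [r["salary"] for r in keep])

# Made-up player-seasons; salary in millions.
rows = [
    {"season_start": 2003, "ppm": 1.0, "salary": 3.0},
    {"season_start": 2003, "ppm": 2.0, "salary": 1.0},
    {"season_start": 2003, "ppm": 3.0, "salary": 2.0},
    {"season_start": 2006, "ppm": 1.0, "salary": 1.0},
    {"season_start": 2006, "ppm": 2.0, "salary": 2.0},
    {"season_start": 2006, "ppm": 3.0, "salary": 3.0},
    {"season_start": 2006, "ppm": 9.0, "salary": 0.0},   # dropped: zero salary
    {"season_start": 2006, "ppm": 9.0, "salary": None},  # dropped: missing
]
pre = salary_corr(rows, "ppm", post_lockout=False)
post = salary_corr(rows, "ppm", post_lockout=True)
print(pre, post)
```

Swapping `"ppm"` for a beta or PM column, or pointing the same function at shot-based metrics, would give the other comparisons mentioned above.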

mataddy commented 9 years ago

cool. I'm going to close this issue and we can re-open for new analysis once you have set the stage or if we think of anything new to consider.