TaddyLab / hockey

Chicago Hockey Analytics

add results writeup to tex #5

Closed mataddy closed 9 years ago

mataddy commented 9 years ago

@sentian and I will fill in the application and results section using the goals, corsi, and fenwick analysis executed up through issue #4. A candidate layout would be

  • performance for goals, controlling for team-season-playoff effects in addition to the special teams stuff we controlled for in the previous paper, and translating to partial plus-minus. This might need to be split into multiple subsections.
  • extending these same ideas to corsi and fenwick 'response' variables.
  • a study of correlation between the performance metrics and salary.

I won't be able to devote any time until mid-next week on this, so if @sentian can get a rough draft in there then I'll finish up and bring the paper on home.

rbgramacy commented 9 years ago

Thanks guys. I have a couple days of writing left but feel free to dump things into the second half of the hockey.tex doc.

B

sentian commented 9 years ago

That sounds like a good plan. I will have time later this week for the writing.

rbgramacy commented 9 years ago

Both,

I'm done with the initial text for now. I'm not married to any of it, but there are some things I've tried to keep in mind when writing that I hope you agree represent a good approach. There is very little math, even less than the original JQAS paper. I want to clarify that we are proposing a framework, and that the JQAS work was a simple first example. In fact, what I present early in the chapter is even simpler because I drop the "teams" aspect. @taddy will have to fill in some details about gamlr.

As I got to the end of my writing I started to become less certain as to how the transition would go because I wasn't sure exactly what results you would insert in the end. My vision is that we'd have a section which describes enhancements to the simple player-only version described at the beginning. And then we'd have a section describing the results under that new formulation. We also need a description of how we convert betas into partial plus-minuses.

I need to take a little break from this to work on other things. I trust it is in good hands.

-B

mataddy commented 9 years ago

awesome, thanks @rbgramacy!

sentian commented 9 years ago

@mataddy Matt, I think I can finish writing all I can think of by the end of tomorrow. I will then pass it on to you.

mataddy commented 9 years ago

hi @sentian any updates? I'll have time later this week to do my writing.

sentian commented 9 years ago

Sorry, @mataddy . I've been so slow at writing. But I should be able to finish by today or tomorrow.

Several things to clarify:

  1. Bobby mentioned (and the hockey paper describes) using a combination of L2 and L1 penalization. In our analysis we only use L1 on the players and leave all other effects unpenalized.
  2. In the hockey paper, you guys used a hyperprior distribution on the Laplace parameters $\lambda_j$. In our analysis, we just set all $\lambda_j$ to a common $\lambda$, tried many different values of $\lambda$, and used AICc to select the 'best' one.

So how do we make the story unified? Also, without penalizing the teams, the team-player model will not give as much sparsity as you guys discussed in the hockey paper.

In my writing, I start by comparing the team-player model with the player-only model. Then I add the special-teams effects and do another comparison between the team-special teams model and the team-player model. I illustrate several players' performance by comparing PPM with PM. I further add the season and playoff interactions to check the seasonal/playoff performance of players. The second part uses Corsi/Fenwick, and the last part is the salary analysis.

Any comments or advice?

mataddy commented 9 years ago

@sentian no problem at all.

1: we only use L1, there is no L2. if you write it as such i can go back and change whatever bobby has.

2: yes, ignore the previous stuff and you can just talk about it in this way: we fit the model over a grid of lambda values and choose the one that we expect to give the best out-of-sample prediction (AICc approximates cross validation). i can also fill this in.

Your layout sounds great! I'm looking forward to seeing how corsi and fenwick relate to the goals analysis and salary.
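
For readers following along, the grid-and-AICc selection described above could be sketched roughly as follows. This is only an illustration, not the code used for the analysis: it assumes each grid point of the lasso path reports a deviance and a degrees-of-freedom count (taken here to be the number of nonzero coefficients), and the helper names `aicc` and `select_lambda` as well as all of the numbers are hypothetical.

```python
import math


def aicc(deviance, df, n):
    """Corrected AIC: deviance + 2*df*n / (n - df - 1).

    Algebraically the same as AIC + 2*df*(df + 1) / (n - df - 1).
    """
    if n - df - 1 <= 0:
        return math.inf  # correction is undefined here; treat as the worst score
    return deviance + 2.0 * df * n / (n - df - 1)


def select_lambda(path, n):
    """Pick the grid point with the smallest AICc.

    path: list of (lam, deviance, df) tuples, one per lambda on the grid.
    n:    number of observations (goals, in the hockey setting).
    """
    return min(path, key=lambda seg: aicc(seg[1], seg[2], n))


# Hypothetical path values, made up purely for illustration.
path = [
    (1.00, 1400.0, 0),
    (0.50, 1370.0, 5),
    (0.25, 1350.0, 20),
    (0.10, 1345.0, 60),
]
best = select_lambda(path, n=1000)
print(best)  # the (lam, deviance, df) tuple with the lowest AICc
```

In this made-up path the smallest lambda has the lowest deviance, but the penalty on its 60 nonzero coefficients outweighs the gain in fit, so a larger lambda is selected.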

mataddy commented 9 years ago

@sentian any updates? don't worry about making it too polished.

sentian commented 9 years ago

I decided to upload what I've got so far. I've spent too much time on the goals results and was not able to dive deep into the following parts. So I'd rather leave the CORSI/FENWICK and salary parts blank for now. I would like to fill in more details this Friday/weekend if our time is limited.

mataddy commented 9 years ago

thanks @sentian! sure: please add whatever you can over the weekend. I'll start editing the first bit but we can do this in parallel.

rbgramacy commented 9 years ago

The deadline is Wednesday so we do need to sprint to the finish on this.

B

mataddy commented 9 years ago

@sentian I just took a look through; nice work so far! If you can get the rest of your stuff in over the weekend then I'll block off monday (and can spill over into tuesday) to finish off the rest and tie it all together. Given the time crunch, perhaps just concentrate on getting the results (and pictures) in there and I'll finish the rest.

On the pictures:

Also, for salary comparison (if you have time) it would be fantastic to see somehow the correlation between salary and PPM or partial corsi/fenwick FP as line plots with the labor disputes marked with vertical lines. I can fill in the rest of the story.

sentian commented 9 years ago

Great! I will fill in the rest over this weekend. Meanwhile, @mataddy I will leave the 'gamlr' and the establishment parts to you.

sentian commented 9 years ago

@mataddy I've emailed you a plot for PPM vs salary. Is it close to your expectation?

sentian commented 9 years ago

I uploaded what I've got so far. I only wrote some guidelines for Corsi/Fenwick and salary parts. I will leave the story to @mataddy . But let me know if you want me to do some other things.

You can find code for the plots in R files with '_sen'. Some of the figures need some slight changes, like the vertical lines block the legend box. I can do that once you decide the figures to keep.

rbgramacy commented 9 years ago

Just fyi guys: I'm at UF all day on Wednesday the 30th, which is the day we are due. So if you need anything from me, do let me know soon. Otherwise, I should be able to do a once-over in the evening that day before submitting.

-B

mataddy commented 9 years ago

cool, thanks guys. I'm going to hit this today and hopefully send to bobby tonight.

is there a page number target? For speed reasons I'm going to cut/add/alter pretty ruthlessly today, but we can re-add material if desired.

rbgramacy commented 9 years ago

I don't think there is a page quota/limit. But I do think we benefit from being a little on the verbose side. I think many of our readers from the Sports Analytics community think we're doing voodoo, and we need some layman's explanation to soften things a bit.

-B

mataddy commented 9 years ago

cool, thanks. My program right now is to de-statsify a lot of what is in there so I think we're on the same page. My goal is to make it readable for my MBAs.

mataddy commented 9 years ago

as was probably predictable, the stuff in the doc all looks good but it's going to take me longer than a day to reconcile the setup with the actual analysis. I'll have it to bobby for your wednesday evening once-over at latest, but hopefully tomorrow.

rbgramacy commented 9 years ago

No problem at all. Sounds good.

Cheers, Bobby

mataddy commented 9 years ago

Hey @rbgramacy I've now done everything up through section 1.3 ; this is all of the pre-analysis stuff, so that it now matches up with the (non-bayes, L1 only) analysis sen and I have done. I tried to make it as close to MBA level as possible. If you want to go through it I won't be touching it again through 1.3. I'm out much of tomorrow but will try to put some polish on the results; they won't be what we want to present in the final but can serve as a good starting point for our next revision.

rbgramacy commented 9 years ago

p.s., I like how you always switch my author orders around. Do you prefer alphabetical? You guys (esp Sen) did more than me on this one for sure.

-B

Robert B. Gramacy bobby.gramacy.com Tel: +1 773 702 0739 Booth School of Business The University of Chicago 5807 S. Woodlawn Ave. Chicago, IL 60637, USA

rbgramacy commented 9 years ago

I'll have a look at all through 1.3 this morning, and then wait to hear from you.

-B

mataddy commented 9 years ago

Ha! I did it without even thinking this time. But in any case I feel alphabetical is good here; you and I are a team on hockey so it doesn't make sense to flip author orders from paper to paper. besides, division of labor has been fairly even.

mataddy commented 9 years ago

I'm going to try and crank out the results this morning; I'll avoid running any new code or creating new plots, and can sign-post what I'd change if @sentian has time to update plots tomorrow.

rbgramacy commented 9 years ago

I can see that you've overhauled the first part of the paper. It looks really nice. I'm going to presume my initial text inspired you! I'm almost done with my (super minor) changes up thru 1.3.

-B

rbgramacy commented 9 years ago

You might not be surprised to find that I'd like some of the text on the fully Bayesian analysis, with references to the reglogit papers/package, to be put back in. I'm fine if it's in the conclusion section. Of course it's good that gamlr is quick, but the full posterior is still tractable with a decent machine.

-B

mataddy commented 9 years ago

oh yes: I was actually going to ask you to put this in the conclusion along with whatever other extensions you had in mind (in your draft you mentioned trees and ml stuff; I don't know what you're thinking of here but it sounds cool so go to town). I think it was distracting to the non-stats reader when coming before our results, but it's natural at the end.

rbgramacy commented 9 years ago

Will do later today. For the ML stuff I was mostly thinking Random Forests, since I think we both have heard separately from hockey stats gurus that folks in the area have been tinkering with that in a similar context. I can wrap the two things together and perhaps suggest that other ML approaches might be viable with heavy computation.

-B

mataddy commented 9 years ago

perfect!

You could even say that a random forest is approximating a posterior over trees (http://arxiv.org/pdf/1502.02312v2), if you want the conclusion to be along the line of 'fully bayesian extensions'. then RFs and reglogit fit together under the same theme.

ps i minimized bayes in the setup because i don't want the reader to think they need to understand bayesian stats to read the results; there's nothing really bayesian about gamlr.

sentian commented 9 years ago

@mataddy Let me know how you want those figures to look. I will do the adjustments later tonight or tomorrow. If you think new graphs are necessary, let me know as well.

Btw, I'm absolutely fine with the alphabetical order.

rbgramacy commented 9 years ago

Hey, I just roughed in a conclusion section. Will look back at it later this afternoon.

-B

mataddy commented 9 years ago

Hi all,

I've completed almost everything except for a list of plotting and table to-dos for @sentian; everything is in red in the final two sections. If you add the requested salary plots and tables then i can figure out the story to go with it. Please let me know an ETA so I can plan (or if you think there is too much to do, let me know and I can pick off some of the tasks).

@rbgramacy I've updated a bunch of text and equations throughout as I went through the results. I also touched the conclusion a bit.

I'm off for rest of today but will look at this tonight.

sentian commented 9 years ago

@mataddy I've briefly read through the to-do list. I will try to send you everything by tonight. You can then fill in the story tomorrow.

Btw, I read a lot of hockey news these days, but still don't know whether Canadians prefer Ovechkin or Crosby!

mataddy commented 9 years ago

awesome, thanks!

they like Crosby better and think ovechkin takes lots of shots but doesn't win games, so they'll be into our goals based analysis;-)

rbgramacy commented 9 years ago

Um, he also usually leads the league in goals. His problem is at the other end of the ice.

mataddy commented 9 years ago

good point! we should add this to the discussion

rbgramacy commented 9 years ago

I'm going over @mataddy's recent changes now. -B

rbgramacy commented 9 years ago

BTW, I've seen both p and n_p being used for the number of players. Any preference (I like the latter)?

rbgramacy commented 9 years ago

@mataddy, I know you said you were done for the day so feel free to leave this 'till tomorrow.

I've added a couple of red notes for you in 1.4 and 1.5 about the PPM. I think it's not quite right to say that PM and PPM account for relative time-on-ice. They only do that by loosening the maximum positive or negative value when players are involved in more goals. One could even conclude that your PPM post-processing imposes a sort of double penalty (which was not statistically inferred) on players involved in few goals. Chances are their coefficients are nearly zero anyway, and now you're making them map to smaller values on the PM scale.

I think we should add some qualifying statements, especially when we interpret results where the ordering of beta values is different from PPMs, like for Marian Hossa. I take those with a grain of salt. But I'm not sure what to put yet. We need something that will help contrast with the comments that I put in the discussion section that highlight the time-on-ice drawback of such approaches.

-B

mataddy commented 9 years ago

cool, I think I follow the first bit. both account for number of goals on ice for, which is only a surrogate to time on ice. is that all you're saying? I don't get the double penalty bit, except to say that we shrink down to zero for those on ice for few goals (and these guys would have a lower pm anyways).

in any case I wasn't planning to touch anything but the results tomorrow to incorporate sen stuff. if you can just add whatever qualifications you want then we can discuss on revision.

rbgramacy commented 9 years ago

Saying that someone has a PM of +5 tells you nothing about their ice time, except that they were involved in at least five goals. Conversely, if I tell you that a player was on the ice for 15 goals, you know their PM is bounded within [-15, +15], but nothing more. I just think that's a weak surrogate for ice time. Therefore I think it's a stretch to say that PM and PPM account for ice time.
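
The bounds argument above can be made concrete with a tiny sketch. It assumes the usual definition of plus-minus (goals for minus goals against while the player is on the ice); the function names and the two example players are hypothetical.

```python
def plus_minus(goals_for, goals_against):
    """Classic plus-minus: on-ice goals for minus on-ice goals against."""
    return goals_for - goals_against


def pm_bounds(n_goals):
    """Range of possible PM for a player on the ice for n_goals goals in total."""
    return (-n_goals, n_goals)


def min_goals_on_ice(pm):
    """Fewest on-ice goals that are consistent with a given PM."""
    return abs(pm)


# Two made-up players with identical PM but very different involvement.
light_use = plus_minus(goals_for=5, goals_against=0)    # on ice for 5 goals
heavy_use = plus_minus(goals_for=40, goals_against=35)  # on ice for 75 goals
print(light_use, heavy_use, pm_bounds(15), min_goals_on_ice(5))
```

Both players come out at +5, which is the sense in which PM alone says little about how much a player was on the ice.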

mataddy commented 9 years ago

yes, agree. but a ppm of 40 says that you are worth many goals while a beta of 2 is great but doesn't mean as much if you don't play that much. I see your point about how it'd be hard to get a high beta if you are not on the ice for many goals. but also line mates etc affect this, and I think the issues are slightly separable. whatever u say is fine by me.

rbgramacy commented 9 years ago

Ok I'll think of something. It just needs to be worded more carefully, something you usually are more persnickety about than I am. Plus, it's your baby so I thought I'd give you first crack.

mataddy commented 9 years ago

cool thx. no persnickety on this I promise;-). my brain is too tapped to say it better than I already did.

sentian commented 9 years ago

I was kind of having the same confusion as Bobby when I did the analysis. That's why I put the results in beta before moving on to PPM in the first place. And I explained that beta is more useful for spotting rising stars who have not played too many games. But I think Matt's explanation of PPM also makes a lot of sense now.

mataddy commented 9 years ago

I agree; these are all good points. perhaps to upweight back the importance of the beta statistics we can provide another smaller table that just shows the top players by their beta? Or even better, two tables side by side: one with top-all-time by beta, and the other with top by beta from 2013-2014. Then we should have a chance to tell the interesting tyler toffoli story. @sentian if I cut something like this out please feel free to add it back in, or to add the tables anew if you have time.

rbgramacy commented 9 years ago

Hey,

I took a stab at my PPM issues. I find the related passages less objectionable now. In particular, it bothered me that we were saying that, betas being equal, low ice-time players are less valuable than high ice-time ones. All else being constant that is not true, and is in fact the opposite of the "moneyball goal" which is why people want analytics in sports. Since the salaries of the low ice-time ones are probably much lower, one could easily do the opposite and value them more not less -- they are clearly better value for money.

But thinking hard about the text made me realize what is really going on here. The problem is that we don't know the (posterior) variance of beta (we only have beta-hat), which is one of the things that has to be held constant for the above criticism to make sense. And the size of that variance is going to be directly related to "n_g_i", which in this case is the number of goals scored while player i is on the ice. The formula for PPM is making a crude adjustment that accounts for that variance, via n_g_i on a player-by-player basis. We have no reason to suppose that PPM is the optimal way to take that information (the unknown variance or its n_g_i surrogate) into account, and we don't know what criteria would be good for optimizing anyways. It is, however, attractive in that PPM is on the same scale as PM, which makes it much more interpretable than beta-hat. Whether ranking by PPM or by beta is better is anyone's guess.

(Of course, if ranking players is the goal then the fully Bayesian version offers the posterior distribution over ranks, and a matrix for comparing all pairs of players. You could specify an optimization criterion for settling on a single rank, and crank out the calculations if that's what you really want.)

I didn't put any of that into my changes in the text. Food for thought though.

-B
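
To make the n_g_i scaling discussed above tangible, here is a rough sketch. The thread never writes the PPM formula down, so this assumes one plausible form: PPM_i = n_g_i * (2 * p_i - 1), where p_i = 1 / (1 + exp(-beta_i)) is the "for" probability implied by player i's coefficient with everything else held at zero. That formula, the function names, and the example numbers are all assumptions for illustration and should be checked against the tex before being relied on.

```python
import math


def for_probability(beta):
    """Logistic transform of a player coefficient (assumed interpretation)."""
    return 1.0 / (1.0 + math.exp(-beta))


def partial_plus_minus(beta, n_goals):
    """Assumed PPM form: expected goals for minus expected goals against.

    n_goals * p - n_goals * (1 - p) simplifies to n_goals * (2 * p - 1).
    """
    p = for_probability(beta)
    return n_goals * (2.0 * p - 1.0)


# Same made-up beta, very different numbers of on-ice goals.
ppm_few = partial_plus_minus(beta=0.2, n_goals=50)
ppm_many = partial_plus_minus(beta=0.2, n_goals=500)
print(round(ppm_few, 2), round(ppm_many, 2))
```

Under this assumed form, two players with the same beta-hat get PPMs that differ only through n_g_i, and |PPM| can never exceed n_g_i, so PPM lives on the same scale as PM. That is the "crude adjustment" in question; nothing here says it is the best way to account for the uncertainty in beta.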
