Closed mataddy closed 9 years ago
@sentian: since you've done much more with shots... if you have a chance, can you confirm that the 'shot' event that we are using in design.R conforms with the fenwick definition listed above? ie does it include both on-goal and missed shots? And also what is the code for a blocked shot?
Will do tomorrow. I've been quite busy today.
No rush at all! Also, feel free to add whatever you want to the analysis. Your and my job is to get creative with analysis while bobby writes. For example, I think you did a bunch of salary comparisons that I haven't replicated.
@mataddy, @rbgramacy: do we have a document describing the event types ('etype') in the 'gamerec' files? I've found that etype takes the values 'MISS' and 'BLOCK' besides 'SHOT' and 'GOAL'. To check, I compared the record against a game on YouTube, which makes me believe 'BLOCK' is the blocked shots in Corsi and 'SHOT' is the missed shots. However, I'm not sure what 'MISS' is.
An example is as follows.
The game record shows that in period 1, there's a 'SHOT' by Malkin at 19:18, 'MISS' by Crosby at 17:12, 'BLOCK' (shot by Letang) at 14:55.
The video I've found is here. https://www.youtube.com/watch?v=tQ32LL7UT7Y
If you guys agree with this, I'll add 'blocked shots' to design.R and maybe do some analysis using Corsi?
thanks, nice work. From the video it appears that SHOT is a 'shot on goal' and MISS is a 'missed shot'. Notice that on Crosby's 'MISS' he is knocked off balance and hits the boards far from the goal.
So then, instead of my 'SHOTS' flag we should have something like RESPONSE equal to either 'goal' for just etype=="GOAL"; or 'fenwick' for etype %in% c("GOAL","SHOT","MISS"); or 'corsi' for etype %in% c("GOAL","SHOT","MISS","BLOCK"). Then we can replace the current *-shot.csv results with both *-fenwick and *-corsi.
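A minimal sketch of that RESPONSE switch, in case it helps. This is not the code in design.R: `response_events` and `select_events` are hypothetical names, and "OTHER" is a made-up placeholder for any etype that is not a shot attempt.

```r
# Hypothetical lookup: which etype values count toward each response.
response_events <- list(
  goal    = "GOAL",
  fenwick = c("GOAL", "SHOT", "MISS"),
  corsi   = c("GOAL", "SHOT", "MISS", "BLOCK")
)

# Return a logical vector flagging the events that count for the chosen response.
select_events <- function(etype, response = c("goal", "fenwick", "corsi")) {
  response <- match.arg(response)
  etype %in% response_events[[response]]
}

# Toy event vector; "OTHER" is a placeholder, not a real gamerec code.
etype <- c("SHOT", "MISS", "BLOCK", "GOAL", "OTHER")
fenwick_rows <- select_events(etype, "fenwick")
corsi_rows   <- select_events(etype, "corsi")
```

On the toy vector, the Fenwick flag drops the 'BLOCK' row while the Corsi flag keeps it, which is the only difference between the two responses.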
Note that in the performance-*.csv output file (regardless of metric) I've added 'fp', which is the 'for percentage': points.for/(points.for + points.away). This is the way that Corsi and Fenwick tend to be reported, as opposed to the usual plus-minus (PM), points.for - points.away. As a probabilistic version of this I added 'prob' based on our betas, which is just prob = 1/(1+exp(-beta)).
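For concreteness, the two quantities can be sketched as below. The function names `for_percentage` and `beta_to_prob` are hypothetical (not taken from performance.R), and the input numbers are made up.

```r
# 'fp': share of events that went in the player's favour.
for_percentage <- function(points.for, points.away) {
  points.for / (points.for + points.away)
}

# 'prob': inverse-logit of the fitted player effect; equivalent to stats::plogis(beta).
beta_to_prob <- function(beta) {
  1 / (1 + exp(-beta))
}

# Made-up example values.
fp   <- for_percentage(points.for = 60, points.away = 40)
prob <- beta_to_prob(c(-0.2, 0, 0.2))
```

A beta of zero maps to prob = 0.5, so 'prob' sits on the same scale as 'fp', where 0.5 means events are split evenly for and against.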
oops, didn't mean to close the issue.
Also, Sen FYI both a 'shot on goal' and a 'missed shot' are 'misses' in the sense that no-one scores, but the 'shot on goal' required a stop from the goalie while the 'missed shot' had no chance.
I forgot I should write it here instead of sending email. @mataddy Can you please send me the 1314 game record data '20132014-*-gamerec.txt'? I just realized I don't have them. I think it should be around 16 megabytes. Probably a dropbox link?
cool, will do. I have the full directory with all games back to 02/03 tar'd in dropbox and can share that; ~1GB so not too big, then we all have the same data.
That's perfect, thx~
just sent; let me know if you don't see it.
hmm. I think I did not receive it. Did you send to my gmail?
I got the game records. Can you send me the roster data as well? I forgot to mention it. Apologies...
I've just finished running the design code for CORSI and FENWICK. My laptop has 8 cores and it took about 8 hours to run each. They are both very large. The nhldesign-*.rda files are around 450 MB each.
Several things I want to check with @mataddy :
entry <- goal$season[XP@i[tail(XP@p,-1)+1]+1]
firstyear <- as.numeric(substr(entry,1,4))
Should it be 'head' instead of 'tail'? But they shouldn't cause trouble since I have not seen them in the performance/salary codes.
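The head-versus-tail question can be checked on a toy example. The sketch below assumes XP is stored in column-compressed form (as in the Matrix package's dgCMatrix class), where the i slot holds 0-based row indices and the p slot holds 0-based column pointers of length ncol + 1. Plain vectors `i` and `p` stand in for XP@i and XP@p, and the 4 x 3 matrix is made up.

```r
# Toy column-compressed matrix with 4 rows and 3 columns:
#   column 1 has nonzeros in rows 1 and 3
#   column 2 has a nonzero in row 2
#   column 3 has nonzeros in rows 2 and 4
i <- c(0L, 2L, 1L, 1L, 3L)   # 0-based row index of each stored entry
p <- c(0L, 2L, 3L, 5L)       # 0-based offset where each column starts, plus the total

# head(p, -1) drops the final total, leaving the start offset of every column,
# so this picks the first stored row of each column.
first_row_head <- i[head(p, -1) + 1] + 1

# tail(p, -1) drops the first offset instead, so each lookup lands on the start
# of the NEXT column and the last one runs past the end of i.
first_row_tail <- i[tail(p, -1) + 1] + 1
```

Under these assumptions the head() version returns rows 1, 2, 2 (the true first entry in each column), while the tail() version returns 2, 2, NA.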
Hi sen, I think you're correct on both of those; these are legacy bugs from the old buildgoals. But since we use none of them downstream, let's just delete them (I recalculate plus-minus correctly in performance.R).
can you also update performance.R and run for these new responses? Then we'll have results/performance_corsi.csv, etc
Sure. That's what I'm doing right now.
I've uploaded the results. Maybe we should delete the SHOTS results? @mataddy I've also shared the nhldesign files for CORSI and FENWICK in case you want to do anything else with them.
CORSI data: n > 1.3 million; FENWICK data: n > 1 million. A few findings:
super nice work sen; thanks. I'll delete the 'shots' results (and we've recorded here that results are close to those from fenwick) and close this issue.
I'll also create a new issue for an initial writeup of these models and results. I'm going to assign it to you but it is my responsibility too. I won't be able to devote time to this until mid-next week, so anything you can get in there yourself will give me a good head start on the final writeup.
@rbgramacy and @sentian, some food for thought on how we sell this.
from the article at http://www.secondcityhockey.com/2013/12/4/5167404/nhl-stats-made-simple-part-1-corsi-fenwick these are just based on shots. the difference is that corsi includes blocked shots.
so when we are doing our regression with shots, is the result like a "regression adjusted fenwick"? And could we do the same thing for corsi by adding blocked shots?