TaddyLab / hockey

Chicago Hockey Analytics

missing salary #3

Closed: mataddy closed this issue 9 years ago

mataddy commented 9 years ago

@sentian there are a small number of player-seasons with NA salaries, about 700 out of 10k. I can only fix a few with name correction (this is using the already name-corrected file nhlsalaries_bhz_nc.txt from your repo). There is a larger number of players with zero salaries, about 1500. This is more worrisome to me, since I expect it could mean that some of their salary was accounted for in another year.
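A minimal sketch of how the NA and zero counts could be tallied, assuming each player-season is a record with a `salary` field that is `None` or NaN when the value is NA. The field names and the toy rows are hypothetical and are not taken from nhlsalaries_bhz_nc.txt.

```python
import math


def classify_salary(value):
    """Label one salary entry as 'missing', 'zero', or 'positive'."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "missing"
    if value == 0:
        return "zero"
    return "positive"


def salary_summary(rows):
    """Count player-seasons in each salary category."""
    counts = {"missing": 0, "zero": 0, "positive": 0}
    for row in rows:
        counts[classify_salary(row["salary"])] += 1
    return counts


# Toy player-seasons (placeholder names and values, for illustration only).
rows = [
    {"player": "PLAYER_A", "season": "1112", "salary": None},
    {"player": "PLAYER_B", "season": "1112", "salary": float("nan")},
    {"player": "PLAYER_C", "season": "1112", "salary": 0},
    {"player": "PLAYER_C", "season": "1213", "salary": 0.0},
    {"player": "PLAYER_D", "season": "1112", "salary": 1.5},
]

summary = salary_summary(rows)
print(summary)
```

Keeping NA and zero as separate categories makes it easier to see whether the two groups behave differently before deciding how to treat them.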

Can you try to help us understand where the zero salaries are coming from?

sentian commented 9 years ago

Yes. When I cleaned the salary dataset, I found that 1960 of the 2475 players in the roster dataset have salary info in the bhz_nc salary dataset. I guess there are still some name issues, although I've already manually corrected around 85 names.
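A rough sketch of the kind of coverage check described above, assuming both datasets can be reduced to lists of already name-corrected player identifiers. The function name and the toy names are hypothetical.

```python
def match_coverage(roster_names, salary_names):
    """Return (matched count, roster size, sorted unmatched roster names)."""
    roster = set(roster_names)
    matched = roster & set(salary_names)
    unmatched = sorted(roster - matched)
    return len(matched), len(roster), unmatched


# Toy identifiers (placeholders, for illustration only).
roster_names = ["PLAYER_A", "PLAYER_B", "PLAYER_C", "PLAYER_D"]
salary_names = ["PLAYER_A", "PLAYER_C", "PLAYER_E"]

n_matched, n_roster, unmatched = match_coverage(roster_names, salary_names)
print(f"{n_matched}/{n_roster} roster players have salary info")
print("still unmatched:", unmatched)
```

The unmatched list is the natural place to look for remaining name issues. With the counts quoted above, 1960/2475 is roughly 79% coverage.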

Also, I think the zero salaries may not be unreasonable, although there must be some missing values on BlackhawkZone (BHZ). Correct me if I'm wrong! I've double-checked several players. For example, take 'ANTTI_MIETTINEN'. He started playing in season 0506 and has zero salary for seasons 1112 and 1314. For season 1112, Wikipedia shows he was playing for the Jets, but BHZ does not have the salary data. For season 1314, however, he was in the NLA rather than the NHL.

I cross-checked against the following websites. BHZ: http://blackhawkzone.com/salaries/career.php?PlayerID=5804 Wiki: https://en.wikipedia.org/wiki/Antti_Miettinen

mataddy commented 9 years ago

I find that some are simply missing from BHZ. For example, Mats Zuccarello made 1.15 mil in 2013-2014: http://sports.yahoo.com/news/rangers-resign-lw-zuccarello-174017102--nhl.html http://blackhawkzone.com/salaries/career.php?PlayerID=11779

This is fine; we can't expect to have everything. From what you're saying and from my own investigation, it looks like the zeros should also be treated as missing.

For example, Matt Moulson made 2.45 million in 2010-2011 according to Wikipedia, but we have him as a zero. Ryan Nugent-Hopkins made .925 mil in 2011-2012, but we also have him as a zero. Crucially, for both of these players the other numbers that we have are correct, so this is not an issue of salary being moved onto other years.

So I'm going to close this issue, and from now on we will treat zero salaries as missing.
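One way this recoding could look, as a minimal sketch: zero salaries are replaced with `None` so they are handled the same way as NA values. The record layout, the name and season formats, and the `PLAYER_X` row are assumptions for illustration. The two zero rows mirror the Moulson and Nugent-Hopkins cases mentioned above.

```python
def zeros_to_missing(rows):
    """Return copies of the rows with zero salaries recoded as None."""
    cleaned = []
    for row in rows:
        salary = row["salary"]
        if salary is not None and salary == 0:
            salary = None  # treat a recorded zero as missing
        cleaned.append({**row, "salary": salary})
    return cleaned


rows = [
    {"player": "MATT_MOULSON", "season": "20102011", "salary": 0},
    {"player": "RYAN_NUGENT-HOPKINS", "season": "20112012", "salary": 0},
    {"player": "PLAYER_X", "season": "20112012", "salary": 1.0},  # placeholder
]

cleaned = zeros_to_missing(rows)
for row in cleaned:
    print(row["player"], row["season"], row["salary"])
```

The function returns new records instead of modifying the input, so the original zeros remain available if the decision is revisited.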