dsgkirkby / CanucksArmy

Tools used at CanucksArmy
http://canucksarmy.com

Request to Change goalie scraper to HockeyDB #17

Closed VivalaSedinery closed 6 years ago

VivalaSedinery commented 7 years ago

I'd like to change the goalie scraper from Elite Prospects to HockeyDB. This probably won't be easy or convenient, but I believe it will be necessary to the success of the project.

EP stats on goalies lack depth. HockeyDB is much better.

A major issue with EP, and the real dealbreaker: there doesn't appear to be any list of all-time stats. For skaters, I no longer use the all-time scraper; I scrape the individual NHL seasons and add them together, which increases versatility. Unfortunately this doesn't appear to be an option with goalies. You obviously can't add save percentage or GAA together (the only stats other than GP that they provide), and you can't reliably average them without knowing the shots against. This means that the only thing that can be added is Games Played. So we'd be able to tell whether they've played a certain threshold of games, but couldn't measure any degree of success beyond that. EP should be tossed out for goalies.

On to HockeyDB: there doesn't appear to be a league-wide list, which is problematic. The best method that I can think of is going through each league-season, then into each team page and finding the goalie table at the bottom.

League-season for NHL in 2015-16: http://www.hockeydb.com/ihdb/stats/leagues/seasons/nhl19272016.html

The season appears to be selected by the trailing 2016 in the URL, so it should be easy to change.
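A minimal sketch of how those league-season URLs might be generated, assuming the pattern holds for other seasons. The function name and the idea that `nhl1927` is a fixed league key are assumptions drawn from the single example URL above, not anything HockeyDB documents.

```python
# Base path taken from the example URL in this issue.
BASE = "http://www.hockeydb.com/ihdb/stats/leagues/seasons/"


def league_season_url(league_key: str, end_year: int) -> str:
    """Build a league-season URL.

    Assumption: the page name is a league key (e.g. "nhl1927") followed by
    the calendar year in which the season ends (2016 for 2015-16).
    """
    return f"{BASE}{league_key}{end_year}.html"


if __name__ == "__main__":
    # Print candidate URLs for three seasons; they still need to be checked by hand.
    for year in (2014, 2015, 2016):
        print(league_season_url("nhl1927", year))
```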

Vancouver stats from 2015-16, goalie table at the bottom http://www.hockeydb.com/ihdb/stats/leagues/seasons/teams/0000392016.html

image

A tougher process would be pulling age and size data in batches from here, but it's probably necessary since names from EP are often different.

Bio data is provided on individual pages. That's definitely less convenient than EP, but I don't know another way around this at this point.

image

Major benefit of HockeyDB when it does work: we can add the individual stats together to get accurate career save percentages, etc. We can even weight them based on workload. That way we can actually make use of the data, unlike EP's limited opportunities.
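To illustrate what adding the individual stats together could look like, here is a rough sketch. The `GoalieSeason` fields are hypothetical names for the per-season counts (shots against, goals against, minutes); the actual columns on the HockeyDB team pages may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GoalieSeason:
    # Hypothetical field names for one goalie-season row.
    games_played: int
    shots_against: int
    goals_against: int
    minutes: float


def career_sv_pct(seasons: List[GoalieSeason]) -> Optional[float]:
    """Career save percentage from summed shots and goals (not averaged rates)."""
    shots = sum(s.shots_against for s in seasons)
    goals = sum(s.goals_against for s in seasons)
    if shots == 0:
        return None
    return (shots - goals) / shots


def career_gaa(seasons: List[GoalieSeason]) -> Optional[float]:
    """Career goals-against average per 60 minutes from summed totals."""
    minutes = sum(s.minutes for s in seasons)
    goals = sum(s.goals_against for s in seasons)
    if minutes == 0:
        return None
    return goals * 60 / minutes
```

Because the rates are computed from summed counts, heavier-workload seasons automatically carry more weight.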

Officially open for discussion.

dsgkirkby commented 7 years ago

Hmm, it's unfortunate that EP doesn't give us more for goalies.

With GP, we could estimate the overall SV% and GAA. Over time, the number of shots faced and the number of minutes played per game should be roughly the same. So estimates could be achieved like so:

SV%est = (GP1 * SV%1 + GP2 * SV%2 + ...) / (GP1 + GP2 + ...)

I think as a first attempt it's worth running with that. We can see how inaccurate things are and assess whether getting more accuracy is worth it.
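A quick sketch of that GP-weighted estimate, under the stated assumption that shots faced per game are roughly constant. The function name and the `(games_played, sv_pct)` tuple layout are illustrative only.

```python
from typing import List, Optional, Tuple


def estimate_sv_pct(seasons: List[Tuple[int, float]]) -> Optional[float]:
    """GP-weighted average of per-season save percentages.

    seasons: list of (games_played, sv_pct) pairs.
    This is only an estimate; it is exact only if every game
    had the same number of shots against.
    """
    total_gp = sum(gp for gp, _ in seasons)
    if total_gp == 0:
        return None
    return sum(gp * sv for gp, sv in seasons) / total_gp


if __name__ == "__main__":
    # A 10-game season at .900 and a 50-game season at .920.
    print(estimate_sv_pct([(10, 0.900), (50, 0.920)]))
```

The same weighting would apply to GAA, with the caveat that minutes per game vary more for backups and goalies who get pulled.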

VivalaSedinery commented 7 years ago

I've been pulling a couple of league-seasons manually (just 2 WHL seasons so far, because, as you'd guess, it's time-consuming) just to get an idea of how I'm going to structure things.

image

Under the blue header is all the data that I've gotten out of the HockeyDB team stat pages. A major benefit of having the TOI and shots is that I'm able to create Point Shares, which is a big advancement. Totaling the shot numbers per hour for each team, I can then use them in the player pGPS to improve defensive point shares for skaters as well. This is pretty pivotal stuff.

I know you weren't exactly stoked on the idea of creating a new scraper for HockeyDB. In the original issue post above, I noted how complicated getting the bio data is, which I'm sure isn't helping your optimism. As a compromise, I think I can just use the Elite Prospects bio data. As mentioned, there are differences in names once in a while (which will cause the two chunks of data to not match), but I can just manually change them for now to match up.

At this point, it's only been about 4 or 5 names per league-season, which is pretty manageable. It'll get a bit more complicated when I move on to European teams (e.g., HockeyDB will use the North American 'chev' while Elite Prospects will use the European 'chyov' in names like Sergachev, Shipachev), but I'll see how that goes when I get to it.
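If the manual fixes get tedious, one possible approach is to match on a normalized key instead of the raw name. This is only a sketch: `name_key` is a hypothetical helper, and the single 'chyov' to 'chev' rule covers just the transliteration case mentioned above, so other mismatches would still need manual handling.

```python
import unicodedata


def name_key(name: str) -> str:
    """Build a comparison key so EP and HockeyDB spellings can be matched.

    Steps: strip accents, lowercase, collapse whitespace, then map the
    European 'chyov' ending to the North American 'chev'.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = " ".join(no_accents.lower().split())
    return key.replace("chyov", "chev")


if __name__ == "__main__":
    # Both spellings of the same player should produce the same key.
    print(name_key("Mikhail Sergachyov") == name_key("Mikhail Sergachev"))
```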

Anyway, I hope just doing the stats makes it a little less foreboding.

dsgkirkby commented 7 years ago

Right now this is looking rough. HockeyDB gives me 403 errors (permission denied) when I try to scrape.

VivalaSedinery commented 6 years ago

Going to close this because it looks pretty dead for the time being.