dsgkirkby / CanucksArmy

Tools used at CanucksArmy
http://canucksarmy.com

AHL Gamesheet scraping #6

Closed: dsgkirkby closed this issue 7 years ago.

VivalaSedinery commented 7 years ago

It would be ideal to have this up and running fairly soon, as the AHL season begins this weekend. It shouldn't be overly difficult. The schedule is available here.

This is an example of last year's game report. I believe the ID at the end of the URL simply increases by one with each game, starting from that first game. The idea is to pull information the same way the OHL game scraper does, i.e., replace the numbers in the scoring section with name/number combos. The OHL one formatted these as Firstname Lastname-##. The on-ice portion was simply separated by commas.
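If the ID really does increase by one per game, the game-report URLs can be generated instead of scraped from the schedule. A minimal sketch, assuming a hypothetical URL template (the actual AHL game-report link isn't preserved in this thread) and a known first game ID:

```python
from typing import Iterator

# Hypothetical template: the real AHL game-report URL isn't reproduced here,
# so "{game_id}" just marks where the trailing ID would go.
GAME_REPORT_TEMPLATE = "https://example.com/ahl/game-report/{game_id}"


def game_report_urls(first_game_id: int, num_games: int,
                     template: str = GAME_REPORT_TEMPLATE) -> Iterator[str]:
    """Yield one URL per game, assuming IDs increase by 1 from the first game."""
    for game_id in range(first_game_id, first_game_id + num_games):
        yield template.format(game_id=game_id)


if __name__ == "__main__":
    # Placeholder numbers, for illustration only.
    for url in game_report_urls(first_game_id=1000, num_games=3):
        print(url)
```

Since the increment is only a belief at this point, a real scraper would probably want to log and skip any ID that returns an error page rather than assume every ID is a valid game.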

Other columns for each goal included Game ID, date, visiting team, home team, GF team, GA team, period, time, and manpower. Manpower would be best denoted in #v# format, with the scoring team's skater count first and the conceding team's second, i.e., regular play is 5v5, a power-play goal is 5v4 or 5v3, a shorthanded goal is 4v5, an empty-net goal is 5v6, etc.
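A sketch of the per-goal row and the #v# manpower string described above. It assumes the scraper already knows how many skaters (goalies excluded) each team had on the ice at the time of the goal; the field names are illustrative, not taken from the existing OHL scraper.

```python
from dataclasses import astuple, dataclass


def format_manpower(scoring_skaters: int, conceding_skaters: int) -> str:
    """Return strength in #v# format, scoring team first.

    Counts are skaters on the ice, goalies excluded, so a team that has
    pulled its goalie for an extra attacker counts six.
    """
    for count in (scoring_skaters, conceding_skaters):
        if not 3 <= count <= 6:
            raise ValueError(f"unexpected skater count: {count}")
    return f"{scoring_skaters}v{conceding_skaters}"


@dataclass
class GoalRow:
    """One row per goal, in the column order listed above."""
    game_id: str
    date: str
    visiting_team: str
    home_team: str
    gf_team: str      # team that scored
    ga_team: str      # team that was scored on
    period: str       # kept as text so "OT" fits alongside "1"-"3"
    time: str         # as printed on the game sheet
    manpower: str     # output of format_manpower()


if __name__ == "__main__":
    print(format_manpower(5, 5))  # regular play     -> 5v5
    print(format_manpower(5, 4))  # power-play goal  -> 5v4
    print(format_manpower(4, 5))  # shorthanded goal -> 4v5
    print(format_manpower(5, 6))  # empty-net goal   -> 5v6
```

Working from skater counts rather than a "PP"/"SH" label keeps 5v4 and 5v3 distinct, which a single power-play flag would lose. `astuple(row)` gives the row in column order for writing out.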

VivalaSedinery commented 7 years ago

The OHL scraper no longer seems to work. Passing season 58 (2015-16) still works, but passing 56 (2016-17) results in a large series of error messages. [image]

VivalaSedinery commented 7 years ago

Preferable order of completion for scrapers would be as follows: AHL, NCAA, OHL, QMJHL, WHL, USHL, ECHL.

I would imagine that almost all of them should involve very similar processes. The game sheets themselves, once found, look pretty much identical for all leagues except the QMJHL. That may be a trickier job.

In the October 11th comment, I mentioned that the previous version denoted names as Firstname Lastname-##. I'm no longer sure that's the best way of doing things. Sometimes players change numbers when they change teams, or even while staying on the same team (see Michael Carcone on the Comets). Perhaps just Firstname.Lastname, with a comma as the separator, would be best.
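A small comparison of the two key formats, showing why dropping the number helps: the old key changes whenever the jersey number does, the new one doesn't. The player and numbers in the usage lines are placeholders. How to handle spaces inside a name isn't settled in the thread; turning them into dots is an assumption.

```python
def old_key(first: str, last: str, number: int) -> str:
    """Previous format, 'Firstname Lastname-##': changes if the number changes."""
    return f"{first} {last}-{number}"


def new_key(first: str, last: str) -> str:
    """Proposed format, 'Firstname.Lastname': independent of jersey number.

    Assumption: spaces inside a name part are also turned into dots so each
    key stays a single token; the thread doesn't say how to handle them.
    """
    return ".".join(part.strip().replace(" ", ".") for part in (first, last))


if __name__ == "__main__":
    # Placeholder player and numbers, for illustration only.
    print(old_key("Jane", "Doe", 12), old_key("Jane", "Doe", 21))  # two different keys
    print(new_key("Jane", "Doe"))                                  # Jane.Doe either way
    print(new_key("Michael", "Carcone"))                           # Michael.Carcone
```

One trade-off: without the number, two players with the same name would share a key, so it may still be worth keeping the number in a separate column.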

VivalaSedinery commented 7 years ago

Just to make things easier:

QMJHL season schedule here. Both the Time and Links columns in this chart contain links to the Game Centre. While the Game Centre itself isn't useful for this particular exercise (though it may be useful for an expanded version in the future), its link contains the game number. That game number can be inserted into this URL, http://theqmjhl.ca/reports/games/25277/official, to reach each individual game sheet. Like I said above, the QMJHL game sheets are a little different, but not as different as they were in previous years. Definitely workable.
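A sketch of that lookup: pull the game number out of a Game Centre link and drop it into the game-sheet URL quoted above. The exact Game Centre link format isn't shown in the thread, so this assumes the game number is the last run of digits in the link.

```python
import re

# Game-sheet URL pattern quoted above; the game number is the only variable part.
QMJHL_SHEET_TEMPLATE = "http://theqmjhl.ca/reports/games/{game_number}/official"

# Assumption: the game number is the last run of digits in the Game Centre link.
_TRAILING_NUMBER = re.compile(r"(\d+)\D*$")


def game_number_from_link(game_centre_link: str) -> int:
    """Extract the game number from a Game Centre link."""
    match = _TRAILING_NUMBER.search(game_centre_link)
    if match is None:
        raise ValueError(f"no game number in {game_centre_link!r}")
    return int(match.group(1))


def qmjhl_sheet_url(game_centre_link: str) -> str:
    """Build the official game-sheet URL for one game."""
    return QMJHL_SHEET_TEMPLATE.format(
        game_number=game_number_from_link(game_centre_link))


if __name__ == "__main__":
    # Hypothetical Game Centre link, for illustration only.
    print(qmjhl_sheet_url("https://example.com/gamecentre/25277"))
```

If the OHL and WHL links follow the same pattern, only the template string should need to change per league.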

OHL schedule is here. WHL schedule is here. Same situation as the QMJHL for both: links in the chart contain game numbers, which can be popped into URLs that go directly to the game sheets (example OHL game sheet, example WHL game sheet). The game sheets are different from what they used to be, which is probably why the old scraper doesn't work for the new season.

NCAA may be tied for easiest with the AHL. NCAA schedule here. The game sheets are under a column headed 'Printable Game Sheet' and look very simple to navigate. Score!
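Since the NCAA game sheets sit under a named column, one option is to find that column by its header text and collect the links in it. A minimal sketch using the standard-library `html.parser`, assuming the header cells are `<th>` elements and there's no colspan/rowspan shifting column positions (the actual NCAA page markup isn't shown here):

```python
from html.parser import HTMLParser
from typing import List, Optional


class ColumnLinkParser(HTMLParser):
    """Collect link hrefs from the table column whose <th> text equals `header`.

    Assumes header cells are <th> and no colspan/rowspan shifts the columns.
    """

    def __init__(self, header: str) -> None:
        super().__init__()
        self.header = header
        self.target_col: Optional[int] = None  # set once the header is seen
        self.col = -1                          # index of the current cell
        self.in_th = False
        self.th_text = ""
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.col = -1
        elif tag in ("td", "th"):
            self.col += 1
            if tag == "th":
                self.in_th, self.th_text = True, ""
        elif tag == "a" and self.col == self.target_col:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "th" and self.in_th:
            self.in_th = False
            if self.th_text.strip() == self.header:
                self.target_col = self.col
        elif tag == "tr":
            self.col = -1  # links outside table rows never match

    def handle_data(self, data):
        if self.in_th:
            self.th_text += data


def game_sheet_links(schedule_html: str) -> List[str]:
    parser = ColumnLinkParser("Printable Game Sheet")
    parser.feed(schedule_html)
    parser.close()
    return parser.links
```

Matching on the header text rather than a fixed column index means the scraper won't silently grab the wrong links if columns are added or reordered.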

AHL schedule here. Game sheets are accessed directly from here, in the second-to-last column. The icon is a page with an R on it and has the tooltip 'Game sheet'.
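For the AHL schedule, the 'Game sheet' tooltip gives a handle that doesn't depend on column position. A sketch with `html.parser`, assuming the tooltip is a `title` attribute on either the link or the R icon inside it (it could also be an `alt` attribute; the page markup isn't shown here):

```python
from html.parser import HTMLParser
from typing import List, Optional


class TooltipLinkParser(HTMLParser):
    """Collect hrefs of links whose tooltip matches `tooltip`.

    Assumption: the tooltip is a title attribute, either on the <a> itself
    or on the <img> inside it.
    """

    def __init__(self, tooltip: str = "Game sheet") -> None:
        super().__init__()
        self.tooltip = tooltip
        self.href: Optional[str] = None  # href of the <a> we're inside, if any
        self.links: List[str] = []

    def _take(self) -> None:
        if self.href:
            self.links.append(self.href)
            self.href = None  # count each link once

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.href = attrs.get("href")
            if attrs.get("title") == self.tooltip:
                self._take()
        elif tag == "img" and attrs.get("title") == self.tooltip:
            self._take()

    def handle_endtag(self, tag):
        if tag == "a":
            self.href = None
```

Usage: `p = TooltipLinkParser(); p.feed(schedule_html); p.links` gives the game-sheet links in page order.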

Godspeed my friend. You are doing the most heroic work for Canucks Army.

dsgkirkby commented 7 years ago

I'm unable to reproduce the issue with the OHL scraping.

VivalaSedinery commented 7 years ago

As mentioned in Slack, the OHL scraper worked fine tonight grabbing season 56 (2016-17) so no apparent issue there.

We talked about this quite a while ago, but it's not listed here, so I'll make a note: I need other parts of the game sheet as well. Rosters in particular, as that's how I can tell who played in each game, but also penalties.

First, penalties, because they're easy. Just a list, almost exactly as it's presented on the game sheet, with a couple of minor additions/alterations. One, change the number to the player's name, matching the format used on the score sheet (Firstname Lastname-## as of now). Two, insert two columns on the left, one for the game ID and one for the date (which, again, match the score sheet).

[image]
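A sketch of the penalty transformation: copy each row as it appears on the sheet, swap the number for the player's name, and prepend game ID and date. The penalty column layout isn't spelled out in the thread, so the positions of the team and number cells are passed in, and the roster lookup is a hypothetical structure the scraper would build from the game-sheet rosters.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical lookup built from the game-sheet rosters:
# (team, jersey number) -> name in the score-sheet format.
RosterLookup = Dict[Tuple[str, str], str]


def penalty_rows(game_id: str, date: str,
                 penalties: Sequence[Sequence[str]],
                 roster: RosterLookup,
                 team_col: int, number_col: int) -> List[List[str]]:
    """Copy each penalty row as-is, replace the number with the player's
    name, and prepend the game ID and date columns."""
    rows = []
    for penalty in penalties:
        cells = list(penalty)
        key = (cells[team_col], cells[number_col])
        # Keep the raw cell if there's no roster match (e.g. bench minors).
        cells[number_col] = roster.get(key, cells[number_col])
        rows.append([game_id, date] + cells)
    return rows
```

Falling back to the raw cell rather than raising keeps team penalties in the output and makes unmatched numbers easy to spot afterwards.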

VivalaSedinery commented 7 years ago

For the rosters, things get a little more complicated. What I really want is a game list that includes almost all of the other information (goals, power plays, shots, etc.) along with rosters.

So we start with the schedule of the given league and change the game number on the left to the game ID. Keep the date, of course. Change the team names there to the actual team names (so they match the team names in the score sheet and so on). Rename the GF columns to Away GF and Home GF to avoid confusion. [image] After that, the real fun starts.
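A sketch of that schedule clean-up. It assumes the schedule header has two columns both labelled "GF", with the visiting team listed first; the column names `#`, `Visitor` and `Home`, and the two lookup tables, are hypothetical stand-ins for whatever the real schedule and scraper provide.

```python
from typing import Dict, List, Sequence

# Hypothetical schedule column names; the real headers may differ.
GAME_NUMBER_COL, AWAY_COL, HOME_COL = "#", "Visitor", "Home"


def rename_gf_columns(header: Sequence[str]) -> List[str]:
    """Rename the two 'GF' columns by position: first is the visiting team's
    goals, second the home team's (assumes the visitor is listed first)."""
    if list(header).count("GF") != 2:
        raise ValueError("expected exactly two 'GF' columns")
    labels = iter(["Away GF", "Home GF"])
    return [next(labels) if col == "GF" else col for col in header]


def game_list_row(header: Sequence[str], row: Sequence[str],
                  game_ids: Dict[str, str],
                  team_names: Dict[str, str]) -> Dict[str, str]:
    """Turn one schedule row into a game-list record.

    game_ids maps the schedule's game number to the game-sheet ID;
    team_names maps schedule team labels to score-sheet team names.
    Both are hypothetical lookups built while scraping.
    """
    record = dict(zip(rename_gf_columns(header), row))
    number = record.pop(GAME_NUMBER_COL)
    record = {"Game ID": game_ids[number], **record}
    for col in (AWAY_COL, HOME_COL):
        record[col] = team_names.get(record[col], record[col])
    return record
```

Renaming by position matters because two identically named columns would otherwise collide when the row is turned into a dict, and the second GF would silently overwrite the first.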

I'd like to add more columns based on information from within the game sheets, including columns for home/away goals by period, home/away shots by period, home/away power-play opportunities, and power-play goals. All of this is available at the bottom of the game sheet. [image] One note here: there isn't a box for shots per se, but the goaltender section shows each goalie's shots against.
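Since shots only appear as goaltender shots against, a team's shots in a period are the sum of shots against for every opposing goalie who played that period, which also covers mid-game goalie changes. A sketch, assuming the goaltender section can be read as per-period shots-against lists (the thread doesn't confirm whether it's broken out by period):

```python
from typing import List, Sequence


def team_shots_by_period(opposing_goalies: Sequence[Sequence[int]]) -> List[int]:
    """Shots per period for one team, summed over the opposing goalies'
    shots against. Goalies with fewer periods (e.g. no OT) are padded with 0."""
    periods = max((len(goalie) for goalie in opposing_goalies), default=0)
    totals = [0] * periods
    for goalie in opposing_goalies:
        for period, shots in enumerate(goalie):
            totals[period] += shots
    return totals


if __name__ == "__main__":
    # Placeholder numbers: starter pulled after two periods, backup plays the third.
    print(team_shots_by_period([[10, 8, 0], [0, 0, 12]]))  # [10, 8, 12]
```

Home shots come from the away goalies' lines and vice versa.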

Finally, I'd like to add a column that shows the home roster and another that shows the away roster. It can be pulled from the rosters on the game sheet (see image in last comment) and concatenated into one cell, separated by commas, the same way the players are combined for on-ice plus/minus columns in the score sheet.
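A sketch of the roster cells. The names and column labels in the usage lines are placeholders; the point worth noting is the CSV side: because each roster cell itself contains commas, it has to be quoted when the game list is saved as CSV, and Python's `csv` module does that automatically.

```python
import csv
import io
from typing import Iterable


def roster_cell(names: Iterable[str]) -> str:
    """Concatenate one team's game roster into a single comma-separated cell,
    matching the separator used for the on-ice plus/minus columns."""
    return ",".join(name.strip() for name in names if name.strip())


if __name__ == "__main__":
    # Placeholder names and column labels, for illustration only.
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Game ID", "Home Roster", "Away Roster"])
    writer.writerow(["1001", roster_cell(["Jane.Doe", "John.Roe"]),
                     roster_cell(["Alex.Poe"])])
    print(out.getvalue())
```

The home roster comes out wrapped in quotes, so a spreadsheet reads it back as one cell instead of splitting it across columns.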

I do this in my rudimentary AHL game sheet file now, seen on the far right. [image]

Questions, comments, concerns? I know it's a lot, but it's all gonna work out real great in the end.

VivalaSedinery commented 7 years ago

So for the schedule, I guess something like this:

[image]

VivalaSedinery commented 7 years ago

Side note: a lot of this information is available in the Game Centre. Would it be better to grab info from here? There's a ton more available.

[image]

The rosters here also have more info.

[image]

Goalie information is also better.

[image]

Scoring section.

[image]

The play-by-play has useful info, like who won each faceoff and against whom.

[image]

This is a major departure from the original ask, so you tell me, how easy would it be to pull from here as opposed to the standard game sheet?