Open JB13 opened 6 months ago
@JB13 Thanks for the heads up. That's a bummer.
Looking at the JSON you provided, I wonder what "sortOrder" represents. It seems to increase with each subsequent event. That might work, though I have no idea what the actual value represents.
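A minimal sketch of that idea, assuming every event dict in the JSON carries a numeric "sortOrder" key. The field names come from the snippet in this issue; the `order_events` helper and the sample values are made up for illustration and are not part of the library:

```python
def order_events(events):
    """Order events by "sortOrder" if every event has it; otherwise keep feed order."""
    if all("sortOrder" in ev for ev in events):
        return sorted(events, key=lambda ev: ev["sortOrder"])
    # Fall back to the order the feed delivered, rather than guessing.
    return list(events)


# Hypothetical events: the eventID values are out of order, sortOrder is not.
sample = [
    {"eventID": 102, "sortOrder": 11},
    {"eventID": 9, "sortOrder": 12},
    {"eventID": 101, "sortOrder": 10},
]
ordered = order_events(sample)
ordered_ids = [ev["eventID"] for ev in ordered]
```

Whether "sortOrder" is actually reliable across games is still an open question, so the fallback keeps the feed order instead of sorting on "eventID".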
I was pulling down data (using both scrape_seasons and scrape_games) and I noticed that ~30% of shots either had no xC/yC data, or just had it listed at one of the bullet points:
Looking into it a bit, it looks like the "eventID" values in the JSON are no longer guaranteed to be in order (see the snippet of JSON below). In json_pbp.py, I removed the sorted_events logic, and now get the data in the "right" order:
Not sorting seems to mostly work? I still need to investigate cases where the HTML event length != JSON event length. Sorting by seconds_elapsed doesn't work well when a stoppage is followed by a faceoff at the same time point.
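To illustrate the tie problem with made-up events (the "event" codes and times here are hypothetical): Python's `sorted` is stable, so events with the same seconds_elapsed keep whatever relative order they had going in. If the input is already scrambled, the faceoff can land before the stoppage:

```python
def sort_by_time(events):
    """Sort by seconds_elapsed only; ties keep their input order (stable sort)."""
    return sorted(events, key=lambda ev: ev["seconds_elapsed"])


shot = {"event": "SHOT", "seconds_elapsed": 700}
stop = {"event": "STOP", "seconds_elapsed": 754}
faceoff = {"event": "FAC", "seconds_elapsed": 754}

# Input already in true game order: the tie resolves correctly.
in_order = [ev["event"] for ev in sort_by_time([shot, stop, faceoff])]

# Input scrambled beforehand: the faceoff now precedes its stoppage.
out_of_order = [ev["event"] for ev in sort_by_time([faceoff, shot, stop])]
```

So seconds_elapsed alone can't recover the order; it would need a tiebreaker that is known to be trustworthy.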
I might have time to try to find a more elegant fix for this (and maybe add a test that grabs a couple of plays from a game to confirm it's being parsed correctly in the future). But I wanted to write this down and make a note of it in case anyone else is looking at it.
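A rough sketch of what such a test could check, assuming the parsed plays can be reduced to (event, seconds_elapsed) tuples. The `appears_in_order` helper and the plays below are hypothetical, not taken from a real game or from the library:

```python
def appears_in_order(parsed_events, expected_events):
    """True if expected_events occur within parsed_events in the same relative order."""
    remaining = iter(parsed_events)
    # Each `any` consumes the iterator up to the match, so order is enforced.
    return all(any(ev == want for ev in remaining) for want in expected_events)


# Hypothetical parsed output for one game.
parsed = [("FAC", 0), ("SHOT", 35), ("STOP", 754), ("FAC", 754), ("GOAL", 790)]

# A couple of hand-checked plays that must show up in this order.
good = appears_in_order(parsed, [("STOP", 754), ("FAC", 754)])
bad = appears_in_order(parsed, [("FAC", 754), ("STOP", 754)])
```

In a real test, `parsed` would come from scraping a fixed, already-completed game and the expected plays would be checked by hand against the official play-by-play.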