BurntSushi / nflgame

An API to retrieve and read NFL Game Center JSON data. It can work with real-time data, which can be used for fantasy football.
http://pdoc.burntsushi.net/nflgame
The Unlicense

2012 Playoffs #21

Closed ochawkeye closed 11 years ago

ochawkeye commented 11 years ago

Got a very bad feeling this morning that I was gearing up for the playoffs and things were going to puke on Saturday. Can anyone calm my fears that things are going to work this weekend?

games = nflgame.games(2012, week=1, kind='POST')

Will that line work just like this one does? I hope so:

games = nflgame.games(2011, week=1, kind='POST')

ochawkeye commented 11 years ago

Does anyone check this anymore? I see this weekend's games have been assigned JSON names (2013010500, 2013010501, 2013010600, 2013010601) but I'm not familiar with how nflgame knows that those exist. Does the schedule need to be updated or is it dynamically aware of new weeks?

I'm just trying to avoid a surprise on Saturday afternoon.

jthomm commented 11 years ago

Hi ochawkeye.

I am not the creator of this software but I can most assuredly say that nflgame learns about week 1 of the NFL playoffs the same way it learns about week 10 of the NFL regular season, or week 3 of the preseason.

If you'd like to know how nflgame "knows" all this, I'd suggest browsing the source code or reading the docs BurntSushi has so graciously generated. :^)

Does anyone check this anymore?

Here's a thought. It sounds like you're deploying some of this open source code into production (or something important enough to have you waking up in a cold sweat). If whatever you're doing is fairly important, my humble, unsolicited suggestion would be to study the code and the documentation a little more carefully before airing frustrations here. BurntSushi clearly went out of his way to make the code clean, object-oriented, and "Pythonic" (and not everyone does), so why not take advantage of that?

If you have any specific questions, I'll try and help clear up what I can.

ochawkeye commented 11 years ago

Sincerest apologies if my question/post came off as frustrated. I am a huge fan of what BurntSushi has provided here and would be forever stuck in the world of manually entering data into an Excel spreadsheet if not for his most excellent work.

Does anyone check this anymore?

My question was by no means intended to insult anyone. In hindsight, I should have chosen my words more wisely, but it was my legitimate curiosity as to whether anyone was out there. A comment I posted last month did not draw any responses and while it didn't necessarily warrant response, nor would I ever presume to deserve a response, I was uncertain as to whether or not there was a community making use of this excellent tool.

I'm loving my experience with nflgame and it has truly enabled me to put some of my rudimentary Python knowledge to use. Unfortunately, that knowledge is probably only enough to make me dangerous.

Thank you so much for taking the time to respond. It makes me feel much better about my situation for this weekend. Nowhere close to a 'production' application, but nflgame will be allowing for live scoring for my league of approximately 35. The only reason I'm losing sleep over it is because I'm too OCD for my own good and this has been a bit of a passion project for the past dozen years.

BurntSushi commented 11 years ago

ochawkeye has every right to be concerned, particularly since he's correct. Come Saturday, if nflgame/schedule.py isn't updated, nflgame will not be aware of post-season games.

I've updated the schedule to include the first week of the post-season. Unfortunately, nflgame doesn't have a way to discover schedule updates automatically. The problem is that it's hard to strike a balance (automatically) for how often we should inspect the schedule on nfl.com. My current solution is to just use manual updates, which works well for the regular season since the schedule for every week is known. But this doesn't work so well for the post-season.

Regardless, for the time being, updating the schedule is fairly simple. From the root of the nflgame directory, simply run:

scripts/create-schedule > nflgame/schedule.py

This will automatically look at the current schedules for 2009-2012, including pre/post-season, by downloading the XML from nfl.com. The information in nflgame/schedule.py is the knowledge that nflgame needs to connect search criteria (e.g., year, pre/reg/post, week, team) to JSON identifiers. Without it, your line of code

games = nflgame.games(2012, week=1, kind='POST')

won't work.
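
To illustrate why the schedule matters, here is a minimal sketch of a table that connects search criteria to JSON identifiers. The `SCHEDULE` layout, its field names, and the `find_game_ids` helper are assumptions made for illustration; this is not the actual format of nflgame/schedule.py. The four identifiers are the ones mentioned earlier in this thread.

```python
# Hypothetical schedule table: JSON identifier -> search criteria.
# The field names are made up for this sketch.
SCHEDULE = {
    "2013010500": {"year": 2012, "week": 1, "kind": "POST"},
    "2013010501": {"year": 2012, "week": 1, "kind": "POST"},
    "2013010600": {"year": 2012, "week": 1, "kind": "POST"},
    "2013010601": {"year": 2012, "week": 1, "kind": "POST"},
}


def find_game_ids(schedule, year, week=None, kind="REG"):
    """Return the sorted identifiers whose entries match the criteria."""
    matches = []
    for game_id, info in schedule.items():
        if info["year"] != year or info["kind"] != kind:
            continue
        if week is not None and info["week"] != week:
            continue
        matches.append(game_id)
    return sorted(matches)


wild_card = find_game_ids(SCHEDULE, 2012, week=1, kind="POST")
divisional = find_game_ids(SCHEDULE, 2012, week=2, kind="POST")
print(wild_card)
print(divisional)  # stays empty until the schedule is regenerated
```

A week that is missing from the table simply yields no identifiers, which is why a stale schedule makes a query come back empty rather than fail loudly.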

As of my most recent updates, fetching post-season games returns nothing since the JSON doesn't exist yet. It is my hope that, given my updates to the schedule, this will work come Saturday. I can't say for sure though, but I'll try to keep an eye on it.

As with updating the players database, I'm apt to accept pull requests quickly. But I will try to update this after each round of games this post-season.

I am skeptical of changing this approach to something more automatic in the future, precisely because I don't want to overdo it and hit nfl.com more than we need. It's easy to do this with JSON data, since we can know when the game is over (and thus, the data is fixed) and cache the JSON to disk for further queries. But for schedules, how do we know when to look for schedule updates? (IMO, this is a job for a cronjob.)
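
The "cache once the game is over" idea for JSON data can be sketched roughly as follows. This is an illustration under stated assumptions, not nflgame's actual code: `load_game`, the injected `fetch` callable, the cache file naming, and the `"qtr"` key are all hypothetical.

```python
import json
import os
import tempfile


def load_game(game_id, fetch, cache_dir):
    """Return game data, preferring the on-disk cache.

    `fetch` is a caller-supplied function that downloads the data for
    `game_id` and returns a dict; it is only called on a cache miss.
    """
    path = os.path.join(cache_dir, game_id + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch(game_id)
    # Only finished games are written to disk, because their data is
    # expected to stay fixed. The "qtr" key is an assumption.
    if data.get("qtr") in ("Final", "final overtime"):
        with open(path, "w") as f:
            json.dump(data, f)
    return data


# Usage with a fake downloader that records how often it is called.
calls = []


def fake_fetch(game_id):
    calls.append(game_id)
    return {"qtr": "Final"}


cache_dir = tempfile.mkdtemp()
first = load_game("demo-game", fake_fetch, cache_dir)
second = load_game("demo-game", fake_fetch, cache_dir)
print(len(calls))  # the second call is served from disk
```

Schedules have no equivalent of the "game is over" signal, which is the difficulty described above.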

BurntSushi commented 11 years ago

Just to be clear, you should update your copy of nflgame to 1.1.7. Once it's updated, I expect it to work come Saturday.

BurntSushi commented 11 years ago

Also, are you using nflgame.live? I am quite sure that it will break in the post-season, as it doesn't look at post-season schedules. I'm not sure if I'll have time to fix that before the weekend.

ochawkeye commented 11 years ago

Thank you for the thorough explanation Andrew. I'll get everything updated in preparation for the big day.

I have not been using nflgame.live as I noticed your comment about the module being in alpha status specifically noting the potential pitfalls with the postseason (for that I am again very grateful for your documentation efforts). That in combination with my inexperience with pytz was enough to make me shy away.

During weeks 15, 16, and 17 I instead opted to use Windows Task Scheduler (gasp...Windows?! I know, I know...) to execute my scoring script at regular intervals, so it's not 100% real-time, but certainly close enough for what I want it for.

I've also simulated last year's Wild Card games against what our league results were with great success, but there's no substitute for the real deal, right?

BurntSushi commented 11 years ago

During weeks 15, 16, and 17 I instead opted to use Windows Task Scheduler (gasp...Windows?! I know, I know...) to execute my scoring script at regular intervals, so it's not 100% real-time, but certainly close enough for what I want it for.

Hah. I'm actually pretty happy things are working on Windows. nflgame is the first time I've ever distributed something that works on Windows.

But yes, I think your approach is reasonable.

Throughout the season, I used nflgame.live with great success to enhance my Football Viewing Experience. Basically, I wrote a script that output play-by-play data of all active games, and highlighted plays involving one of my fantasy players. The code for that is in my fanfoot repo, but it's not nearly as polished as nflgame, and I made no considerations for platforms other than Linux.

I've also simulated last year's Wild Card games against what our league results were with great success

Sounds very reasonable.

but there's no substitute for the real deal, right?

Unfortunately, no. I really do expect that things will work. A possible failure point is if NFL.com changes the format of their data in an unexpected way. But given that nflgame works with past post-seasons, I have no reason to believe that to be the case.

I really will try to check things out on Saturday at game time. If things go bad, come here and check out recent commits or post an issue.

(Technically, we could come up with substitutes by simulating games, but I'm not touching that idea with a ten-foot pole. I guess you could also use JSON data from past post-seasons, but you'd have to connect it up right with the proper JSON ids.)

ochawkeye commented 11 years ago

but it's not nearly as polished as nflgame

I literally just about fell off of my chair. I've got a lot of learning to do to get a grasp of what it means to be Pythonic.

My "simple" script that generates the three html files I'm interested in (Player Stats, League Scores, League Matrix) and uploads them to ftp weighs in at nearly 900 lines.

BurntSushi commented 11 years ago

My "simple" script that generates the three html files I'm interested in (Player Stats, League Scores, League Matrix) and uploads them to ftp weighs in at nearly 900 lines.

That's not so bad, particularly since it looks like you have to keep track of teams and what not. If you upload them to github, I could try doing some refactoring for you.

But keep in mind that you only saw one little script. All of fanfoot is about 2800 lines. (But that includes Yahoo/ESPN scraping, inserting/updating/selecting data from a MySQL database, and some other goodies.)

jthomm commented 11 years ago

:+1: tip of the hat to ochawkeye.

You would think it would be fairly easy to discover schedule updates, particularly because we know that this year's divisional playoff games are going to be at http://www.nfl.com/ajax/scorestrip?season=2012&seasonType=POST&week=19. Next year's will be at http://www.nfl.com/ajax/scorestrip?season=2013&seasonType=POST&week=19 (unless nfl.com changes its API).

Now, if you visit either of these URLs, you will get a response but the XML object will be empty. However, if a user requests such a week via

nflgame.games(2012, week=19, kind='POST')

...why not allow nflgame to hit the corresponding URL and check for /ss/gms?

If the answer is that we don't want to wake a sleeping giant then perhaps you're right. The only other way I can think of would be to trigger a check upon the successful downloading of data for the week prior. In other words, when someone requests {'season': 2012, 'week': 17, 'kind': 'REG'} then, if nflgame finds any "completed" games, it can trigger a check for {'season': 2012, 'week': 18, 'kind': 'POST'}. But, I admit, that's kind of hacky and clearly won't work in many cases.
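
A rough sketch of the first idea (build the scorestrip URL, then check the response for /ss/gms) might look like this. The helper names are hypothetical, the sample XML strings are assumptions about what an empty and a populated response look like, and the sketch deliberately makes no network request. It uses Python 3's standard library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def scorestrip_url(season, season_type, week):
    """Build a scorestrip URL in the form quoted above."""
    query = urlencode(
        [("season", season), ("seasonType", season_type), ("week", week)]
    )
    return "http://www.nfl.com/ajax/scorestrip?" + query


def has_games(xml_text):
    """True if a <gms> element sits directly under the <ss> root."""
    root = ET.fromstring(xml_text)
    return root.tag == "ss" and root.find("gms") is not None


url = scorestrip_url(2012, "POST", 19)
print(url)

# Assumed shapes of an empty and a populated response.
empty_response = "<ss/>"
populated_response = "<ss><gms><g/></gms></ss>"
print(has_games(empty_response))
print(has_games(populated_response))
```

Downloading `url` and passing the body to `has_games` would be the actual check; when and how often to do that is the open question.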

BurntSushi commented 11 years ago

However, if a user requests such a week via [...] why not allow nflgame to hit the corresponding URL and check for /ss/gms?

This is certainly possible. But what's the decision procedure for determining whether we should check the schedules?

Here's an idea: move the data in nflgame/schedule.py into a custom JSON file. I suppose that I could add some custom code that pings nfl.com only when the following conditions are met:

  1. A game in week N with type POST is requested.
  2. No other games in week N exist in the local schedule.

Then the local JSON schedule could be updated from nfl.com.

I think this might be OK, since it will only hit nfl.com when post season game data is requested.

But I don't think I'll have time to make that change this year. For now, we'll just have to use scripts/create-schedule > nflgame/schedule.py.
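
For reference, the two conditions could be expressed as a small predicate like the one below. It is only a sketch: `should_refresh_schedule` and the schedule layout (identifier mapped to year/week/kind) are hypothetical, and no such function exists in nflgame at the time of writing.

```python
def should_refresh_schedule(local_schedule, year, week, kind):
    """Decide whether to ask nfl.com for a schedule update.

    Condition 1: the requested game type is POST.
    Condition 2: the local schedule has no games for that week.
    """
    if kind != "POST":
        return False
    for info in local_schedule.values():
        if (info["year"], info["week"], info["kind"]) == (year, week, kind):
            return False
    return True


# Hypothetical local schedule that only knows about the first
# post-season week.
local_schedule = {
    "2013010500": {"year": 2012, "week": 1, "kind": "POST"},
    "2013010501": {"year": 2012, "week": 1, "kind": "POST"},
}

print(should_refresh_schedule(local_schedule, 2012, 1, "POST"))
print(should_refresh_schedule(local_schedule, 2012, 2, "POST"))
print(should_refresh_schedule(local_schedule, 2012, 17, "REG"))
```

With this rule, regular-season queries never trigger a request, and a post-season week triggers one only while it is still unknown locally.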

ochawkeye commented 11 years ago

Meant to comment earlier, but once things went off without a hitch I ran out and let the thing go on autopilot. KUDOS BurntSushi. My league of 35 is very grateful!

BurntSushi commented 11 years ago

I'm glad it worked!

I've updated the schedule for the divisional round. So you should be good to go for Saturday assuming you pull the latest changes.

Let me know if it's easier for you if I do a version bump on PyPI. Otherwise I'll leave it be.

I'll leave this issue open until the season is over, and I'll ping it whenever I update the schedule.

ochawkeye commented 11 years ago

Anyone else seeing some weird stuff tonight with the early game? Everything looks good for the late BAL/NE game, but the early game shows 8:43 remaining in Q4 and some players' stats are incomplete. I've got no sign of the JSON file in my gamecenter-json folder for that particular game either. Treachery afoot at nfl.com or just a glitch in the matrix?

BurntSushi commented 11 years ago

I can confirm that this is NFL.com shenanigans: raw JSON data. At time of writing, it says there is 8:43 remaining in the 4th quarter.

I've got no sign of the JSON file in my gamecenter-json folder for that particular game either.

Right. JSON data isn't cached to disk unless nflgame can detect that the game is over. Unfortunately, nflgame knows the game is over based on what quarter it is:

    def is_final(self):
        return self.qtr == 'Final' or self.qtr == 'final overtime'

And that data is faulty in this case.

I'd be willing to patch the JSON with proper values for the quarter/time remaining if it isn't fixed by NFL.com in the next few days.
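
Such a patch could be as simple as overriding the stale fields before the final check runs. The sketch below assumes the game data is a flat dict with `"qtr"` and `"clock"` keys and that a stuck game reports quarter `"4"`; the `apply_overrides` helper and the override values are hypothetical.

```python
import copy


def apply_overrides(game_data, overrides):
    """Return a copy of the game data with corrected fields applied."""
    patched = copy.deepcopy(game_data)
    patched.update(overrides)
    return patched


def is_final(game_data):
    # Same comparison as the method above, applied to a plain dict.
    return game_data["qtr"] == "Final" or game_data["qtr"] == "final overtime"


# Assumed shape of the stuck data: still in the 4th quarter.
stale = {"qtr": "4", "clock": "8:43"}
fixed = apply_overrides(stale, {"qtr": "Final", "clock": "00:00"})

print(is_final(stale))
print(is_final(fixed))
```

Once the patched data counts as final, it would also become eligible for caching to disk.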

ochawkeye commented 11 years ago

Thanks for checking Andrew. Unfortunately, it goes beyond just the time remaining. Player statistics are off as well. M.Ryan for example is listed with 317 passing yards (vs. 396 total for the game) whereas C.Kaepernick is listed with 233 passing yards (which is his correct total for the game). Fortunately for me in my script I can make manual adjustments to statistics easily enough.

It is curious how the JSON file is being generated. The gamecenter feed on the nfl.com site must be utilizing another source. If they are in fact using another source, it wouldn't take much for them to allow this to fall into disrepair.

Rereading what I just wrote, it sounds like I'm in full tin-foil hat mode tonight :)

BurntSushi commented 11 years ago

Yeah, that sucks. Are you checking both game total stats and play level stats? (nflgame should do this for you if you're using max_player_stats.)

It's possible everything comes from one source, but that the GameCenter stuff is never "corrected."

But it's pure speculation. I suspect we'll never know how it all works.
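
To show what checking both game-total and play-level stats can mean, here is a small sketch that merges two sets of numbers by keeping the larger value for each stat. It is not nflgame's implementation of max_player_stats; the `max_stats` helper, the stat names, and the sample values are made up for illustration.

```python
def max_stats(game_totals, play_totals):
    """Merge two {player: {stat: value}} mappings, keeping the larger value."""
    merged = {}
    for source in (game_totals, play_totals):
        for player, stats in source.items():
            player_stats = merged.setdefault(player, {})
            for name, value in stats.items():
                player_stats[name] = max(value, player_stats.get(name, value))
    return merged


# Illustrative values only: one source undercounts passing yards.
game_totals = {"player_a": {"passing_yds": 317, "passing_tds": 3}}
play_totals = {"player_a": {"passing_yds": 396}}

merged = max_stats(game_totals, play_totals)
print(merged)
```

Taking the maximum only helps when a source undercounts; it would not catch a source that overcounts.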

BurntSushi commented 11 years ago

I've updated the schedule to include the Super Bowl.

:-( :-( :-(

BurntSushi commented 11 years ago

@ochawkeye - Check out the results of my latest test: https://github.com/BurntSushi/nflgame/tree/master/test-data/results-yahoo-2012-max

It compares the stats reported by Yahoo with the stats in nflgame. In general, it's much more accurate than I thought. But in my travels, I found a couple of games that were totally whacked out. (See commit logs.)

Closing this for now. More standardized and clean data is coming soon. :-) https://github.com/BurntSushi/nfldb