BurntSushi / nflgame

An API to retrieve and read NFL Game Center JSON data. It can work with real-time data, which can be used for fantasy football.
http://pdoc.burntsushi.net/nflgame
The Unlicense

Team statistics #5

Closed cslester closed 12 years ago

cslester commented 12 years ago

This is a fantastic module; there is not much else out there to work with stats like this programmatically.

There are a couple of things that I have been stuck on, though:

How would you go about finding defensive touchdowns and safeties? TDs seem like they should end up in the defensive player's stats (for the player that scored), but they aren't there. Safeties are mentioned in the statmap, but I don't see where they are tallied.

How about blocked kicks? I could possibly add up blocked kicks by all offensive players, but is there a better way to do this, from the perspective of the defensive player?

BurntSushi commented 12 years ago

There is a way to do it, but you are right to be stuck. The functionality has been recently added, the API is still a little rough, and I haven't written any documentation/tutorials on how to handle your kind of situation. I will briefly elaborate here in lieu of full documentation that I don't have time to write at the moment (at a conference). Note that while what I'll describe probably won't change, there will probably be more intuitive ways to access this data in the near future.

There are two ways to access player data in nflgame, and they tend to overlap so it can be confusing—but this dichotomy reflects how the data is presented via the GameCenter JSON.

The first way is via game level statistics. These are stats reported by NFL.com for each player for an entire game. If the stat you want is available at the game level, you should prefer these. Retrieving game level statistics is already covered in the tutorial. Namely, given a game which is an instance of nflgame.game.Game, then game.players is a sequence containing all game level statistics for each player that played in the game.

The second way is via play level statistics. These are stats reported by NFL.com for each player on a play-by-play basis. They are a proper superset of game level statistics and contain more detailed statistics, such as receiver targets, yards after the catch, and defensive plays (including blocks, safeties, picks, fumbles, pick-6s, etc.). Namely, given a game which is an instance of nflgame.game.Game, then game.drives.players(), or equivalently game.drives.plays().players(), is a sequence containing combined play level statistics for each player that played in the game.

So let's say you have a game and your end goal is to find all defensive touchdowns. I'll take an incremental approach to try to motivate how nflgame's Play API works. First, let's start with something simpler: access all plays where a touchdown was scored:

>>> for p in game.drives.plays().filter(touchdown=True):
...    print p
... 
(2:05) T.Brady pass deep middle to A.Hernandez for 23 yards, TOUCHDOWN.
(11:16) (Shotgun) J.Locker sacked at TEN 1 for -6 yards (C.Jones). FUMBLES (C.Jones), RECOVERED by NE-D.Hightower at TEN 6. D.Hightower for 6 yards, TOUCHDOWN.
(2:06) T.Brady pass short right to R.Gronkowski for 2 yards, TOUCHDOWN. The Replay Assistant challenged the pass completion ruling, and the play was Upheld.
(10:25) (Shotgun) J.Locker pass deep right to N.Washington for 29 yards, TOUCHDOWN.
(1:08) S.Ridley left tackle for 1 yard, TOUCHDOWN.

You'll notice that the second play is the only defensive touchdown of the lot. You can see clearly the datum you want: D.Hightower's touchdown statistic. So, in incremental fashion, perhaps we could get all of the players involved in a scoring play?

>>> for p in game.drives.plays().filter(touchdown=True).players():
...    print p
... 
T.Brady
A.Hernandez
C.Jones
J.Locker
D.Hightower
R.Gronkowski
N.Washington
S.Ridley

You'll see here that the first touchdown involved Brady/Hernandez, and indeed, those are the first two players listed. The second play, which is also the defensive touchdown, actually involved three players: C.Jones sacked J.Locker and forced a fumble that D.Hightower recovered and returned for a touchdown.

So we're almost there: we have an idea of who's involved in each scoring play, but we are really only interested in defensive players. Let's filter the players to include only the defense:

>>> for p in game.drives.plays().filter(touchdown=True).players().defense():
...    print p
... 
C.Jones
D.Hightower

But wait—we don't care about C.Jones! We only want the player who actually scored. So we filter out players based on who scored a touchdown:

>>> for p in game.drives.plays().filter(touchdown=True).players().touchdowns().defense():
...     print p, p.formatted_stats()
...
D.Hightower defense_frec_tds: 1, defense_frec_yds: 6, defense_frec: 1

And now we have a list of all defensive players that scored a touchdown in a game. A shorter way of achieving something similar is to forgo filtering the plays at all:

>>> for p in game.drives.plays().players().touchdowns().defense():
...     print p, p.formatted_stats()
...
D.Hightower defense_tkl: 4, defense_tkl_loss: 1, defense_tkl_loss_yds: 3, defense_frec_tds: 1, defense_frec_yds: 6, defense_frec: 1, defense_ast: 1

But this actually gives you a slightly different view of the data. These are D.Hightower's statistics from the entire game (derived from summing all statistics from each play he was involved in), which may or may not be what you want.

In the short term future, I'd like to make it easier (via the API) to compute things like "How many touchdowns did New England's defense score in this game?"
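Until such an API exists, a team-level tally can be approximated by hand. The following is only a minimal sketch: it assumes each player statistic object exposes a `team` attribute and a `stats` dict, neither of which is shown in this thread, and it uses stand-in objects (`FakePlayer`, hypothetical) so that it runs without nflgame. The helper name `defensive_tds_by_team` is also made up for illustration.

```python
from collections import defaultdict, namedtuple


def defensive_tds_by_team(players):
    """Sum defensive touchdown stats per team.

    `players` is any iterable of objects with a `team` attribute and a
    `stats` dict (both are assumptions, not confirmed by this thread).
    """
    totals = defaultdict(int)
    for p in players:
        for field, value in p.stats.items():
            # Count only defensive touchdown fields, e.g. defense_frec_tds.
            if field.startswith("defense_") and field.endswith("_tds"):
                totals[p.team] += value
    return dict(totals)


# Stand-in objects so the sketch runs without nflgame installed.
FakePlayer = namedtuple("FakePlayer", ["name", "team", "stats"])
sample = [
    FakePlayer("D.Hightower", "NE",
               {"defense_tkl": 4, "defense_frec_tds": 1,
                "defense_frec_yds": 6, "defense_frec": 1}),
    FakePlayer("X.Example", "TEN", {"defense_tkl": 2}),
]
result = defensive_tds_by_team(sample)
```

With a real game, the input would presumably be something like game.drives.plays().players().touchdowns().defense(), provided the attributes assumed above exist on those objects. Teams without a defensive touchdown are simply absent from the returned dict.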

Safeties are mentioned in the statmap, but I don't see where they are tallied. How about blocked kicks?

Statistics in the stat map are specifically for play-by-play data, and they are tallied implicitly when constructing an instance of the nflgame.game.Play class. We can alter the aforementioned examples—along with fields from the statmap—to answer questions about safeties and blocked kicks. Safeties first. To find all safety plays in a game:

>>> game = nflgame.one(2011, 3, "JAC", "JAC")
>>> for p in game.drives.plays().filter(defense_safe__ge=1):
...     print p
...
(7:39) (Shotgun) B.Gabbert sacked in End Zone for -7 yards, SAFETY (G.Hardy).

And to get the players that made safeties in a game:

>>> for p in game.drives.players().filter(defense_safe__ge=1):
...     print p
...
G.Hardy

And finally, to get all players that blocked a field goal:

>>> for p in game.drives.players().filter(defense_fgblk__ge=1):
...     print p
...
B.Orakpo

You may also use the defense_puntblk or defense_xpblk fields.
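For readers wondering how a keyword such as defense_safe__ge=1 could be interpreted, here is a rough sketch of the general "field, double underscore, comparison suffix" technique. It is not nflgame's actual implementation: the `matches` function, the `_OPS` table, and every suffix other than `__ge` are assumptions made for illustration, and the sample stats are invented.

```python
import operator

# Suffixes after the double underscore, mapped to comparison functions.
# Only `__ge` appears in this thread; the others are assumed by analogy.
_OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


def matches(stats, **criteria):
    """Return True if the `stats` dict satisfies every keyword criterion."""
    for key, wanted in criteria.items():
        field, sep, suffix = key.rpartition("__")
        if sep and suffix in _OPS:
            compare = _OPS[suffix]
        else:
            # No recognised suffix: treat the whole key as a field name
            # and test for equality, as in filter(touchdown=True).
            field, compare = key, operator.eq
        # Missing stats count as 0, so absent fields never satisfy `__ge=1`.
        if not compare(stats.get(field, 0), wanted):
            return False
    return True


# Invented sample data, for demonstration only.
safety_maker = {"defense_safe": 1, "defense_tkl": 3}
other_player = {"defense_tkl": 2}
found = [s for s in (safety_maker, other_player)
         if matches(s, defense_safe__ge=1)]
```

Treating a missing statistic as 0 is a design choice of this sketch; it keeps comparisons like `__ge=1` from raising on players who never recorded that stat.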

I hope this helps. Keep an eye out for updates coming soon that will hopefully make viewing this data from different angles a bit easier.

BurntSushi commented 12 years ago

Another helpful hint that I used in my previous post: every player statistic object (an item in a sequence returned by a players() method) has a method called formatted_stats that returns a listing of all available statistics for that player as a string. This might help you to debug when you're not sure which statistics you're looking at. I hope to publish a full list of all statistics that are available (and whether they are in game/play level stats, or both) soon. But for now, fields in the nflgame.statmap module should be good enough. Each (errm most) field also contains a description of what that stat represents. It's obvious for most, but not so obvious for some.
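As a small debugging aid built on that hint, the sketch below collects one line per player from any sequence of player statistic objects. It relies only on what the examples in this thread show (printing a player gives the name, and formatted_stats() gives a string); `describe_players` and the stand-in class `FakePlayerStats` are hypothetical names used so the example runs without nflgame.

```python
def describe_players(players):
    """Return one 'name: stats' line per player statistic object."""
    return ["%s: %s" % (p, p.formatted_stats()) for p in players]


class FakePlayerStats(object):
    """Stand-in for a player statistic object, for demonstration only."""

    def __init__(self, name, stats):
        self.name = name
        self.stats = stats

    def __str__(self):
        return self.name

    def formatted_stats(self):
        # Mimics the 'field: value, field: value' layout seen above.
        return ", ".join("%s: %s" % item for item in self.stats.items())


lines = describe_players([FakePlayerStats("G.Hardy", {"defense_safe": 1})])
```

With a real game, the argument would be something like game.drives.players(), and the returned lines can be printed or written to a log.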

jthomm commented 12 years ago

Funny you should mention it, cslester. On Saturday New Orleans blocked a punt and returned it for a TD against Washington.

This is how I figured out that statIds 63 and 64 (Miscellaneous Yards, Miscellaneous Yards + TD) are used when someone does something funky like block a kick and take it to the house. Must have been too much "work" for the GSIS people to add categories for blocked kick/punt Yards/TDs. :P

Burnt, these ids are currently commented out in 'statmap.py' so we may be off by one or two TDs this week.

BurntSushi commented 12 years ago

This is how I figured out that statIds 63 and 64 (Miscellaneous Yards, Miscellaneous Yards + TD) are used when someone does something funky like block a kick and take it to the house. Must have been too much "work" for the GSIS people to add categories for blocked kick/punt Yards/TDs. :P

Ah, excellent! I was hoping some stats would show in that category so I could see what they're used for.

Burnt, these ids are currently commented out in 'statmap.py' so we may be off by one or two TDs this week.

Right. I think I'm just going to add something like "defense_misc_yds" and "defense_misc_tds". It's probably not worth trying to guess with any more precision—we might get something wrong.

I think the thing I was worried about initially was whether "miscellaneous yards" could come from anything other than a defense (which would make using the "defense" category bunk). But it seems like it's reserved for weird defensive yards.
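To make the proposal concrete, here is a hedged sketch of how statIds 63 and 64 could be tallied into "defense_misc_yds" and "defense_misc_tds". The stat IDs and descriptions come from this thread; the layout of `MISC_STATS`, the `tally_misc` helper, and the record shape (dicts with 'statId' and 'yards' keys) are assumptions for illustration and do not reflect the real contents of statmap.py.

```python
# Hypothetical mapping from stat IDs to the proposed field names.
# The dict layout is assumed; it is not copied from statmap.py.
MISC_STATS = {
    63: {"desc": "Miscellaneous Yards",
         "yds": "defense_misc_yds", "tds": None},
    64: {"desc": "Miscellaneous Yards + TD",
         "yds": "defense_misc_yds", "tds": "defense_misc_tds"},
}


def tally_misc(records):
    """Sum miscellaneous yards and touchdowns from raw stat records.

    Each record is assumed to be a dict with 'statId' and 'yards' keys.
    """
    totals = {"defense_misc_yds": 0, "defense_misc_tds": 0}
    for rec in records:
        entry = MISC_STATS.get(rec["statId"])
        if entry is None:
            continue  # Not a miscellaneous-yards stat; ignore it.
        totals[entry["yds"]] += rec.get("yards", 0)
        if entry["tds"]:
            # statId 64 implies a touchdown in addition to the yards.
            totals[entry["tds"]] += 1
    return totals


# Invented records: one blocked-kick return TD, one non-scoring return,
# and one unrelated stat that should be skipped.
totals = tally_misc([
    {"statId": 64, "yards": 22},
    {"statId": 63, "yards": 5},
    {"statId": 10, "yards": 3},
])
```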

jthomm commented 12 years ago

Right. I think I'm just going to add something like "defense_misc_yds" and "defense_misc_tds". It's probably not worth trying to guess with any more precision—we might get something wrong.

I don't want to be pedantic but I guess technically these are all "special teams" plays, rather than defense. But does NFL.com use a separate stat category for special teams?

The only reason I mention it is because one would think you could not count blocked punts/kicks, field goal returns, and the like and still get defensive totals correct.

BurntSushi commented 12 years ago

I don't want to be pedantic

Now is the time to be pedantic! If you have any other concerns about my naming, now is the time to raise them. It will be harder to change them later.

technically these are all "special teams" plays, rather than defense. But does NFL.com use a separate stat category for special teams?

Right. So NFL.com has the following stat categories: punting, puntret, kicking (including field goals) and kickret. But I've assigned all of the statistics to these categories myself (except for the ones assigned for me at the game level).

I would say, therefore, that if we were going to re-categorize things like "defense_puntblk" or "defense_fgblk", then we'd go for "puntret_blk" and "kickret_blk", respectively. But it feels a little hokey.

I'm not sure if we should add a new special teams category, since it might make the co-existence of "kicking" and "punting" categories a bit too confusing.

I suppose the only real argument I have in favor of keeping things the way they are is that in fantasy football, field goal and punt blocks returned for touchdowns are counted toward the defense. (Right?)

The only reason I mention it is because one would think you could not count blocked punts/kicks, field goal returns, and the like and still get defensive totals correct.

Yes, but you'd need to correspond certain stats from the kickret and puntret categories to the appropriate defensive team. I'm not exactly against this either.

I don't feel compelled either way, so perhaps I should defer the decision. Is there any other source of NFL data that classifies these kinds of statistics? We can use whatever category they use.

jthomm commented 12 years ago

I suppose the only real argument I have in favor of keeping things the way they are is that in fantasy football, field goal and punt blocks returned for touchdowns are counted toward the defense. (Right?)

That's right. Defense and Special Teams are always lumped together. Except when an offensive player returns a kick/punt for a TD (like Jeremy Kerley did against the Bills on Sunday). I believe the default Yahoo! scoring counts that towards the player's overall points for the game.

Is there any other source of NFL data that classifies these kinds of statistics?

I did a quick check at ESPN and for the purposes of stats reporting they also classify this as "Defensive". (See: http://scores.espn.go.com/nfl/boxscore?gameId=320909018 under the table "New Orleans Defensive". Roby is credited with one defensive TD.)

I do think special teams is a distinct skill set. Special teams usually has its own coordinator and involves personnel from both offense and defense, as well as "specialists" who only play on kickoffs or field goals or punts. A team can be really strong on D but really bad at special teams (e.g. last year's Baltimore Ravens).

But at the same time, managing our own categories adds a potentially significant overhead and it's the sort of thing that can suck the fun out of developing a tool like this.

BurntSushi commented 12 years ago

Hmm. Well, I definitely don't like the idea of creating a new "special teams" category. What I was trying to figure out was whether to switch stats like "defense_puntblk" to "puntret_blk". This doesn't explicitly label them as special teams, but it at least separates them from the "defense" category and puts them in categories that are both subsets of a make-believe special teams category. To be clear, here are the fields and what I'd rename them to:

defense_puntblk -> puntret_blk

defense_xpblk -> kickret_blk

defense_fgblk -> kickret_blk

I like the first change, but the xpblk and fgblk going to the kickret category seems hokey.

And of course, this goes back to your initial point: what about miscellaneous yards? Those can occur on either punt or kicking blocks, which means we can't put them under 'kickret' or 'puntret'. Thus, either we use 'defense' or create a new category.

But at the same time, managing our own categories adds a potentially significant overhead and it's the sort of thing that can suck the fun out of developing a tool like this.

Agreed.

BurntSushi commented 12 years ago

I've decided to keep the blocking and miscellaneous stats in the defense category. It may not be technically correct, but I think it makes certain things simpler, which is a win in its own way.