BurntSushi closed this issue.
Interesting. The `kicking_fgyds` field is a game stat, and it is definitely misleading. You're right: it does appear to give only the longest field goal.
Luckily, `kicking_fgm_yds` is available when you use play-by-play data. If you use nflgame to collect the players in a game, it will automatically sum the fields for you. For example:
```python
>>> import nflgame
>>> g = nflgame.one(2012, 1, 'NE', 'NE')
>>> for p in g.drives.plays():
...     if p.kicking_fgm_yds > 0:
...         print p
...
(TEN, NE 10, Q1, 4 and 7) (8:47) (Field Goal formation) R.Bironas 28 yard field goal is GOOD, Center-B.Brinkley, Holder-B.Kern.
(TEN, NE 6, Q4, 4 and 6) (9:20) (Field Goal formation) R.Bironas 24 yard field goal is GOOD, Center-B.Brinkley, Holder-B.Kern.
(NE, TEN 7, Q4, 4 and 7) (4:19) (Field Goal formation) S.Gostkowski 25 yard field goal is GOOD, Center-D.Aiken, Holder-Z.Mesko.
(NE, TEN 13, Q4, 4 and 10) (:35) (Field Goal formation) S.Gostkowski 31 yard field goal is GOOD, Center-D.Aiken, Holder-Z.Mesko.
```
You can see here that Gostkowski made a total of 56 yards of field goals (25 + 31), while Bironas made a total of 52 yards (28 + 24). Indeed, if we look at each player's statistics for the game, that is the case:
```python
>>> import nflgame
>>> g = nflgame.one(2012, 1, 'NE', 'NE')
>>> for p in g.drives.plays().players().filter(kicking_fgm_yds__gt=0):
...     print p.name, p.kicking_fgm_yds
...
S.Gostkowski 56
R.Bironas 52
```
We can simplify further using some convenience functions:
```python
>>> import nflgame
>>> players = nflgame.combine_max_stats(nflgame.games(2012, 1, 'NE', 'NE'))
>>> for p in players.filter(kicking_fgm_yds__gt=0):
...     print p, p.kicking_fgm_yds
...
S.Gostkowski 56
R.Bironas 52
```
Note the use of `nflgame.combine_max_stats`. If you just used `nflgame.combine`, then `kicking_fgm_yds` wouldn't be available.
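To make the contrast concrete, here is a plain-Python sketch (not nflgame code) of two ways of reducing play-level kicks: summing them per kicker, as the play-by-play `kicking_fgm_yds` totals above do, versus keeping only the longest, which is what the game-level `kicking_fgyds` stat appears to report. The `(kicker, yards)` tuples are hypothetical records that mirror the four made field goals listed above.

```python
from collections import defaultdict

# Hypothetical play-level records (kicker, made field goal yards),
# mirroring the four Week 1 NE/TEN field goals shown above.
made_fgs = [
    ("R.Bironas", 28),
    ("R.Bironas", 24),
    ("S.Gostkowski", 25),
    ("S.Gostkowski", 31),
]


def total_fgm_yds(plays):
    """Sum made field goal yards per kicker (a play-by-play total)."""
    totals = defaultdict(int)
    for name, yds in plays:
        totals[name] += yds
    return dict(totals)


def longest_fg(plays):
    """Keep only each kicker's longest make (like a 'long' game stat)."""
    longest = {}
    for name, yds in plays:
        longest[name] = max(longest.get(name, 0), yds)
    return longest


print(sorted(total_fgm_yds(made_fgs).items()))  # [('R.Bironas', 52), ('S.Gostkowski', 56)]
print(sorted(longest_fg(made_fgs).items()))     # [('R.Bironas', 28), ('S.Gostkowski', 31)]
```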
`nflgame.combine_max_stats` is exactly what I was looking for. Thanks!
Though not as extreme as the original "one point per yard" idea, my league does care about the length of field goals: namely 0-39 yards, 40-49 yards, 50+ yards, and misses of less than 40 yards.

(Forgive my code; it's from last year, when I was a complete beginner. I now realize I could be doing this much more efficiently with a generator.)
```python
games = nflgame.games(year, week=week)
plays = nflgame.combine_plays(games)
short, medium, longest, missed = [], [], [], []
#Count number of times each player makes a field goal less than 40 yards
plays30 = plays.filter(kicking_fgm__ge=1, kicking_fgm_yds__lt=40)
for player in plays30.players().kicking():
    short.append([str(player)+', '+player.team, player.kicking_fgm])
#Count number of times each player makes a field goal between 40 and 49 yards
plays40 = plays.filter(kicking_fgm__ge=1, kicking_fgm_yds__ge=40, kicking_fgm_yds__lt=50)
for player in plays40.players().kicking():
    medium.append([str(player)+', '+player.team, player.kicking_fgm])
#Count number of times each player makes a field goal greater than 50 yards
plays50 = plays.filter(kicking_fgm__ge=1, kicking_fgm_yds__ge=50)
for player in plays50.players().kicking():
    longest.append([str(player)+', '+player.team, player.kicking_fgm])
#Count number of times each player misses a field goal less than 40 yards
plays_miss = plays.filter(kicking_fgmissed_yds__lt=40)
for player in plays_miss.players().kicking():
    if player.kicking_fgmissed>0:
        missed.append([str(player)+', '+player.team, player.kicking_fgmissed])
```
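A single pass over the kicks can fill all four of the league's categories at once, instead of filtering the same plays four times. The sketch below is plain Python, not nflgame: it assumes each attempt has already been reduced to a hypothetical `(kicker, yards, made)` tuple, and the kickers and kicks are made up for illustration.

```python
from collections import Counter


def fg_category(yards, made):
    """Map one field goal attempt to the league's categories.

    Returns None for attempts the league does not track
    (misses from 40 yards or more).
    """
    if made:
        if yards < 40:
            return "short"
        elif yards < 50:
            return "medium"
        return "longest"
    return "missed" if yards < 40 else None


def tally(kicks):
    """Count (kicker, category) pairs in a single pass over the kicks."""
    counts = Counter()
    for kicker, yards, made in kicks:
        category = fg_category(yards, made)
        if category is not None:
            counts[(kicker, category)] += 1
    return counts


# Hypothetical sample kicks: (kicker, yards, made).
kicks = [
    ("K.One", 28, True),
    ("K.One", 44, True),
    ("K.One", 36, False),
    ("K.Two", 52, True),
    ("K.Two", 47, False),  # missed from 40+, so not tracked
]
counts = tally(kicks)
print(sorted(counts.items()))
```

Because the work happens in one loop, it does not matter whether the kicks arrive as a list or as a one-shot generator.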
If you were interested in the exact distance, you could filter the plays by each exact distance, no?

```python
for i in range(70):
    field_goals = plays.filter(kicking_fgm__ge=1,
                               kicking_fgm_yds__ge=i,
                               kicking_fgm_yds__lt=i + 1)
```
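Building 70 filters means 70 passes over the plays. A single pass that groups made kicks by exact distance gives the same information; here is a plain-Python sketch, assuming hypothetical `(kicker, yards)` records for made field goals rather than nflgame objects:

```python
from collections import defaultdict


def makes_by_distance(made_fgs):
    """Group made field goals by exact distance: yards -> list of kickers."""
    by_yards = defaultdict(list)
    for kicker, yards in made_fgs:
        by_yards[yards].append(kicker)
    return by_yards


# Hypothetical made field goals: (kicker, yards).
made_fgs = [("K.One", 28), ("K.Two", 28), ("K.One", 51)]
by_yards = makes_by_distance(made_fgs)
for yards in sorted(by_yards):
    print("%d %s" % (yards, ", ".join(by_yards[yards])))
```

Distances with no makes simply don't appear as keys, so there is no need to loop over every possible yardage.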
@ochawkeye - Your program has a bug! :P Truthfully, though, this is a failing in nflgame's public API. Generators are used underneath everything to try to keep things memory efficient. The problem is that they are used inconsistently. As a result, only the `short` list will be populated in your program. Here's a small example of what I mean:
```python
>>> import nflgame
>>> games = nflgame.games(2012, 1)
>>> plays = nflgame.combine_plays(games)
>>> len(list(plays))
2799
>>> len(list(plays))
0
```
But wait, what if we combine statistics instead of plays?
```python
>>> plays = nflgame.combine_max_stats(games)
>>> len(list(plays))
1031
>>> len(list(plays))
1031
```
In the former case, `combine_plays` returns a generator, which is exhausted after a single iteration. In the latter case, you get an iterable that is "reset" each time you use it. The reason for the difference is that statistical objects are typically cumulative, so there won't be as many of them. (If you look at the source code, you can see that each one uses `reduce`.) But asking for all `Play` objects can quickly get into the tens of thousands of objects if it's over a couple of weeks of football.
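The one-shot behavior is ordinary Python generator semantics rather than anything specific to nflgame. A minimal sketch; `play_stream` and `ReusablePlays` are hypothetical names, not part of nflgame:

```python
def play_stream(plays):
    """A generator: it can be consumed exactly once."""
    for play in plays:
        yield play


class ReusablePlays(object):
    """An iterable: each call to __iter__ starts a fresh pass."""

    def __init__(self, plays):
        self._plays = list(plays)

    def __iter__(self):
        return iter(self._plays)


gen = play_stream(["p1", "p2", "p3"])
print(len(list(gen)))  # 3
print(len(list(gen)))  # 0: the generator is already exhausted

reusable = ReusablePlays(["p1", "p2", "p3"])
print(len(list(reusable)))  # 3
print(len(list(reusable)))  # 3
```

The trade-off is memory: `ReusablePlays` holds every item at once, while the generator holds none of them.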
One fix to your program, @ochawkeye, is to regenerate the plays each time. For example:

```python
plays30 = nflgame.combine_plays(games).filter(...)
...
plays40 = nflgame.combine_plays(games).filter(...)
...
plays50 = nflgame.combine_plays(games).filter(...)
```
But as you can imagine, that could be quite slow. So a workaround is to force the result of `nflgame.combine_plays` into a `list`. It will be stored in memory, but it will only have to be generated once. The problem is that a plain `list` doesn't have all the nice `filter` methods attached to it, so we need to re-wrap it with `nflgame.seq.GenPlays`. (Re-wrapping it has almost no overhead.)
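Re-wrapping works because the wrapper iterates over a list, which can be walked any number of times, and each `filter` call returns another wrapper. The sketch below is a hypothetical, stripped-down stand-in for `nflgame.seq.GenPlays`, not its actual implementation: it supports only the `__lt`, `__le`, `__gt` and `__ge` suffixes (plus plain equality), treats missing stats as 0, and uses dicts in place of `Play` objects.

```python
import operator

# Comparison suffixes this sketch understands: field__op=value.
_OPS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


class FilterableSeq(object):
    """Hypothetical re-iterable wrapper around a list of stat dicts."""

    def __init__(self, items):
        # Copy into a list so the data can be iterated any number of times.
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def filter(self, **kwargs):
        """Return a new FilterableSeq with the items matching every keyword."""
        def matches(item):
            for key, wanted in kwargs.items():
                field, _, suffix = key.rpartition("__")
                op = _OPS.get(suffix)
                if op is None:
                    # No recognized suffix: plain equality on the whole key.
                    field, op = key, operator.eq
                # Missing stats count as 0 in this sketch.
                if not op(item.get(field, 0), wanted):
                    return False
            return True

        return FilterableSeq(item for item in self._items if matches(item))


# Build the wrapper once from a list, then filter it as often as needed.
plays = FilterableSeq([
    {"kicking_fgm": 1, "kicking_fgm_yds": 28},
    {"kicking_fgm": 1, "kicking_fgm_yds": 45},
    {"kicking_fgmissed": 1, "kicking_fgmissed_yds": 33},
])
short = plays.filter(kicking_fgm__ge=1, kicking_fgm_yds__lt=40)
medium = plays.filter(kicking_fgm__ge=1, kicking_fgm_yds__ge=40,
                      kicking_fgm_yds__lt=50)
missed = plays.filter(kicking_fgmissed__ge=1, kicking_fgmissed_yds__lt=40)
print(len(short))   # 1
print(len(medium))  # 1
print(len(missed))  # 1
```

Because this sketch treats missing stats as 0, a filter on `kicking_fgmissed_yds__lt=40` alone would match every play here, which is why pairing it with a `kicking_fgmissed__ge=1` condition matters.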
Here's a complete working program that uses this strategy:
```python
import nflgame
from nflgame.seq import GenPlays

games = nflgame.games(2012, 1)
plays = GenPlays(list(nflgame.combine_plays(games)))
short, medium, longest, missed = [], [], [], []

# Count number of times each player makes a field goal less than 40 yards
plays30 = plays.filter(kicking_fgm__ge=1, kicking_fgm_yds__lt=40)
for player in plays30.players().kicking():
    short.append([str(player) + ', ' + player.team, player.kicking_fgm])

# Count number of times each player makes a field goal between 40 and 49 yards
plays40 = plays.filter(kicking_fgm__ge=1, kicking_fgm_yds__ge=40,
                       kicking_fgm_yds__lt=50)
for player in plays40.players().kicking():
    medium.append([str(player) + ', ' + player.team, player.kicking_fgm])

# Count number of times each player makes a field goal of 50 yards or more
plays50 = plays.filter(kicking_fgm__ge=1, kicking_fgm_yds__ge=50)
for player in plays50.players().kicking():
    longest.append([str(player) + ', ' + player.team, player.kicking_fgm])

# Count number of times each player misses a field goal less than 40 yards
plays_miss = plays.filter(kicking_fgmissed_yds__lt=40)
for player in plays_miss.players().kicking():
    if player.kicking_fgmissed > 0:
        missed.append([str(player) + ', ' + player.team,
                       player.kicking_fgmissed])

print 'Short', short
print 'Medium', medium
print 'Longest', longest
print 'Missed', missed
```
Here is something that is a little simpler. It keeps the looping logic in one place:
```python
import nflgame
from nflgame.seq import GenPlays

games = nflgame.games(2012, 1)
plays = GenPlays(list(nflgame.combine_plays(games)))

def find_kickers(which_count, **kwargs):
    players = []
    for player in plays.filter(**kwargs).players():
        count = getattr(player, which_count)
        players.append([(player.name, player.team, count)])
    return players

short = find_kickers('kicking_fgm', kicking_fgm__ge=1, kicking_fgm_yds__lt=40)
medium = find_kickers('kicking_fgm', kicking_fgm_yds__ge=40,
                      kicking_fgm_yds__lt=50)
longest = find_kickers('kicking_fgm', kicking_fgm_yds__ge=50)
missed = find_kickers('kicking_fgmissed', kicking_fgmissed__ge=1,
                      kicking_fgm_yds__lt=40)

print 'Short', short
print 'Medium', medium
print 'Longest', longest
print 'Missed', missed
```
Note that you don't need to use the `kicking()` filter, since you've already filtered by `kicking_` statistics.
Thanks, I was working on this as you posted. You probably saved me an afternoon of frustration.
Very nice, but the results of the two versions aren't the same, e.g.:

```
Missed [[u'A.Vinatieri, IND', 1]]
```

vs.

```
Missed [[(u'A.Vinatieri', u'IND', 1)], [(u'S.Graham', u'HOU', 1)], [(u'A.Henery', u'PHI', 1)], [(u'R.Succop', u'KC', 1)], [(u'S.Hauschka', u'SEA', 1)], [(u'C.Campbell', u'ARI', 0)]]
```
Whoops. Replace this:

```python
missed = find_kickers('kicking_fgmissed', kicking_fgmissed__ge=1,
                      kicking_fgm_yds__lt=40)
```

with this:

```python
missed = find_kickers('kicking_fgmissed', kicking_fgmissed__ge=1,
                      kicking_fgmissed_yds__lt=40)
```
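Mix-ups like this are easy to make when the same field prefix is spelled out several times per call. One hedge against it is to build the filter keywords from a single prefix. A sketch; `fg_filter_kwargs` is a hypothetical helper, and its output is meant to be passed to `find_kickers` (or `plays.filter`) exactly like the hand-written keywords above:

```python
def fg_filter_kwargs(kind, min_yds=None, max_yds=None):
    """Build filter keywords for 'fgm' (made) or 'fgmissed' (missed) kicks.

    The attempt count and the yardage field always share the same
    prefix, so made-kick yardage can't be paired with a miss count.
    """
    if kind not in ("fgm", "fgmissed"):
        raise ValueError("kind must be 'fgm' or 'fgmissed'")
    prefix = "kicking_" + kind
    kwargs = {prefix + "__ge": 1}
    if min_yds is not None:
        kwargs[prefix + "_yds__ge"] = min_yds
    if max_yds is not None:
        kwargs[prefix + "_yds__lt"] = max_yds
    return kwargs


print(sorted(fg_filter_kwargs("fgmissed", max_yds=40).items()))
# [('kicking_fgmissed__ge', 1), ('kicking_fgmissed_yds__lt', 40)]
```

For example, `missed = find_kickers('kicking_fgmissed', **fg_filter_kwargs('fgmissed', max_yds=40))` would pass the corrected keywords.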
Nice catch @teamfball!
Great stuff, but my group of neurotic nerds scores both made and missed FGs like this: under 40 yards, +1 if good or -3 if missed; 40-49 yards, +2 or -2; 50+ yards, +3 or -1.

I would love to define a variable called `kicking_fg_score` that sums the above scoring and could be added to my teamfball module. But it's now crunch time, and I'll spin my wheels for the next few days unless you have an expert opinion on how to do that.

Here's the beginning of my stat module:
```python
import nflgame
import csv

def week_number(game):
    return nflgame.schedule.games_byid[game.eid]['week']

def game_number(game, team, weekno):
    # Look up the game passed in rather than relying on the global loop variable.
    info = nflgame.schedule.games_byid[game.eid]
    number = 0
    for (y, t, w, h, a), _ in nflgame.schedule.games:
        equal = [
            y == info['year'],
            t == info['season_type'],
            w <= weekno,
            team in (h, a),
        ]
        if not all(equal):
            continue
        number += 1
    return number

gmweek = nflgame.games(2012)
teamfball = []
for g in gmweek:
    players = g.max_player_stats()
    weekno = week_number(g)
    home_gameno = game_number(g, g.home, weekno)
    away_gameno = game_number(g, g.away, weekno)
    for p in players:
        gameno = game_number(g, p.team, weekno)
        teamfball.append(
            (weekno, g.home, home_gameno, g.eid, g.away, away_gameno, p,
             p.home, p.playerid, p.team, p.player, p.passing_att,
             p.passing_cmp, p.passing_cmp_air_yds, p.passing_incmp,
             p.passing_incmp_air_yds, p.passing_int, p.passing_ints,
             p.passing_sk, p.passing_sk_yds, p.passing_tds,
             p.passing_twopta, p.passing_twoptm, p.passing_twoptmissed,
             p.passing_yds, p.receiving_lng, p.receiving_lngtd,
             p.receiving_rec, p.receiving_tar, p.receiving_tds,
             p.receiving_twopta, p.receiving_twoptm,
             p.receiving_twoptmissed, p.receiving_yac_yds, p.receiving_yds,
             p.rushing_att, p.rushing_lng, p.rushing_lngtd, p.rushing_tds,
             p.rushing_twopta, p.rushing_twoptm, p.rushing_twoptmissed,
             p.rushing_yds, p.punting_avg, p.punting_blk, p.punting_cnt,
             p.punting_i20, p.punting_lng, p.punting_pts, p.punting_tot,
             p.punting_touchback, p.punting_yds, p.kicking_fga,
             p.kicking_fgb, p.kicking_fgm, p.kicking_fgm_yds,
             p.kicking_fgmissed,
             # kicking_fg_score should go here (this is what I need help with)
             ))
```
@teamfball - That is a bit tricky. In fact, this entire field goal business is tricky. The problem is that this calculation is very different from most other fantasy stats: there's no way to compute it from cumulative data, so we need to inspect it on a play-by-play basis.

My approach for your situation was to compute a dictionary of all field goal attempts for each game. The dictionary maps a player id to a list of statistical objects, each describing one field goal attempt. Then, as we iterate through each player, we check whether they are in that dictionary (i.e., they attempted at least one field goal in the game), and if they are, we score that game's list of field goals.
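Stripped of nflgame, that bookkeeping is just a dictionary of lists keyed by player id. A sketch using hypothetical `(playerid, made, yards)` tuples in place of `PlayerStats` objects; the ids and kicks are made up:

```python
from collections import defaultdict


def group_attempts(attempts):
    """Map each player id to the list of that player's FG attempts."""
    by_player = defaultdict(list)
    for playerid, made, yards in attempts:
        by_player[playerid].append((made, yards))
    return by_player


# Hypothetical attempts from one game: (playerid, made, yards).
attempts = [
    ("00-0000001", True, 28),
    ("00-0000002", False, 45),
    ("00-0000001", True, 52),
]
by_player = group_attempts(attempts)

# A player who never attempted a field goal is simply absent, so the
# lookup falls back to an empty list (and later to a score of 0).
for playerid in ("00-0000001", "00-0000002", "00-0000003"):
    print("%s: %r" % (playerid, by_player.get(playerid, [])))
```

Using `.get` (or an `in` check, as in the code below) avoids inserting empty entries for non-kickers, which a plain `by_player[playerid]` lookup on a `defaultdict` would do.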
Here's the code adapted from yours:
```python
from collections import defaultdict

import nflgame

def game_field_goals(game):
    """
    Given a `nflgame.game.Game` object, return a dictionary mapping
    player id to a list of field goal attempts. Each field goal
    attempt is the corresponding `nflgame.player.PlayerStats` object.
    """
    fg_attempts = defaultdict(list)
    for play in game.drives.plays().filter(kicking_fga__ge=1):
        for player in play.players.filter(kicking_fga__ge=1):
            fg_attempts[player.playerid].append(player)
    return fg_attempts

def fg_score(fg_attempts):
    """
    Given a list of field goal attempts, compute @teamfball's wacky
    fantasy scoring. :P
    """
    score = 0
    for att in fg_attempts:
        if att.kicking_fgm >= 1:  # Compute scores when the field goal is good.
            if att.kicking_fgm_yds < 40:
                score += 1
            elif 40 <= att.kicking_fgm_yds < 50:
                score += 2
            else:  # att.kicking_fgm_yds >= 50
                score += 3
        else:  # Compute scores when the field goal is no good.
            if att.kicking_fgmissed_yds < 40:
                score -= 3
            elif 40 <= att.kicking_fgmissed_yds < 50:
                score -= 2
            else:  # att.kicking_fgmissed_yds >= 50
                score -= 1
    return score

def week_number(game):
    return nflgame.schedule.games_byid[game.eid]['week']

def game_number(game, team, weekno):
    # Look up the game passed in rather than relying on the global loop variable.
    info = nflgame.schedule.games_byid[game.eid]
    number = 0
    for (y, t, w, h, a), _ in nflgame.schedule.games:
        equal = [
            y == info['year'],
            t == info['season_type'],
            w <= weekno,
            team in (h, a),
        ]
        if not all(equal):
            continue
        number += 1
    return number

gmweek = nflgame.games(2012, 2, 'NE', 'NE')
teamfball = []
for g in gmweek:
    players = g.max_player_stats()
    weekno = week_number(g)
    home_gameno = game_number(g, g.home, weekno)
    away_gameno = game_number(g, g.away, weekno)
    game_fgs = game_field_goals(g)
    for p in players:
        if p.playerid in game_fgs:
            kicking_fg_score = fg_score(game_fgs[p.playerid])
        else:
            kicking_fg_score = 0

        # Some debugging code to inspect the raw stats if I've got something
        # wrong. Remove this when you're confident things are correct.
        if kicking_fg_score > 0:
            print p.name, kicking_fg_score
            print map(lambda o: o.formatted_stats(), game_fgs[p.playerid])
            print '-' * 79

        gameno = game_number(g, p.team, weekno)
        teamfball.append(
            (weekno, g.home, home_gameno, g.eid, g.away, away_gameno, p,
             p.home, p.playerid, p.team, p.player, p.passing_att,
             p.passing_cmp, p.passing_cmp_air_yds, p.passing_incmp,
             p.passing_incmp_air_yds, p.passing_int, p.passing_ints,
             p.passing_sk, p.passing_sk_yds, p.passing_tds,
             p.passing_twopta, p.passing_twoptm, p.passing_twoptmissed,
             p.passing_yds, p.receiving_lng, p.receiving_lngtd,
             p.receiving_rec, p.receiving_tar, p.receiving_tds,
             p.receiving_twopta, p.receiving_twoptm,
             p.receiving_twoptmissed, p.receiving_yac_yds, p.receiving_yds,
             p.rushing_att, p.rushing_lng, p.rushing_lngtd, p.rushing_tds,
             p.rushing_twopta, p.rushing_twoptm, p.rushing_twoptmissed,
             p.rushing_yds, p.punting_avg, p.punting_blk, p.punting_cnt,
             p.punting_i20, p.punting_lng, p.punting_pts, p.punting_tot,
             p.punting_touchback, p.punting_yds, p.kicking_fga,
             p.kicking_fgb, p.kicking_fgm, p.kicking_fgm_yds,
             p.kicking_fgmissed, kicking_fg_score))
```
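The if/elif ladder in `fg_score` can also be written as a lookup table, which keeps the league's point values in one place. This is a sketch on plain `(made, yards)` tuples rather than `PlayerStats` objects; `FG_POINTS`, `attempt_points` and `table_fg_score` are hypothetical names, and the table encodes @teamfball's rules as stated above. The usage at the end also illustrates why cumulative data isn't enough: two kickers with the same total made yardage can score differently.

```python
# Each row: (exclusive upper bound in yards, points if made, points if missed).
# An upper bound of None means "no limit" and must be the last row.
FG_POINTS = [
    (40, 1, -3),    # under 40 yards
    (50, 2, -2),    # 40-49 yards
    (None, 3, -1),  # 50+ yards
]


def attempt_points(made, yards):
    """Score one field goal attempt using the FG_POINTS table."""
    for upper, if_made, if_missed in FG_POINTS:
        if upper is None or yards < upper:
            return if_made if made else if_missed
    raise ValueError("FG_POINTS must end with an open-ended row")


def table_fg_score(attempts):
    """Sum the points for a list of (made, yards) attempts."""
    return sum(attempt_points(made, yards) for made, yards in attempts)


# Same 50 total yards of made field goals, different fantasy scores.
one_long = [(True, 50)]
two_short = [(True, 25), (True, 25)]
print(table_fg_score(one_long))   # 3
print(table_fg_score(two_short))  # 2
```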
Wow, I never would have gotten there, and I probably would have died trying. Unfortunately, testing will have to wait; I have some domestic duties this afternoon. Thanks so much.
Well, it looks good, but the results need fine-tuning. Here's just a handful of kickers from the entire 2012 regular season, compared with ESPN data. I will take a closer look later.
| Kicker | nflgame | ESPN |
|---|---:|---:|
| B.Walsh | 58 | 55 |
| D.Akers | 22 | 15 |
| L.Tynes | 35 | 30 |
| R.Bironas | 24 | 10 |
| S.Suisham | 39 | 42 |
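For spot checks like this, computing the per-kicker difference makes the outliers obvious. A small sketch using the five pairs from the table above; the dictionary layout and the `score_diffs` helper are just illustrative choices:

```python
# Season totals from the comparison table (nflgame-based vs. ESPN).
nflgame_scores = {"B.Walsh": 58, "D.Akers": 22, "L.Tynes": 35,
                  "R.Bironas": 24, "S.Suisham": 39}
espn_scores = {"B.Walsh": 55, "D.Akers": 15, "L.Tynes": 30,
               "R.Bironas": 10, "S.Suisham": 42}


def score_diffs(ours, theirs):
    """Return {kicker: ours - theirs} for kickers present in both sources."""
    return dict((k, ours[k] - theirs[k]) for k in ours if k in theirs)


diffs = score_diffs(nflgame_scores, espn_scores)
# Largest absolute difference first; ties broken by name.
for kicker, diff in sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0])):
    print("%-10s %+d" % (kicker, diff))
```

With these numbers R.Bironas (+14) sorts to the top, which matches the observation that his total looks the most suspicious.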
Indeed, that is off. My tests show no discrepancies in the number of field goals made and missed (compared with Yahoo statistics), but the yardage could be off. R.Bironas is quite surprising, but the rest could unfortunately be the product of errors in the source data.
Another typo, perhaps?

```python
elif 40 <= att.kicking_fgmissed_yds < 50:
```

should be

```python
elif 40 >= att.kicking_fgmissed_yds < 50:
```
That's the same as saying `att.kicking_fgmissed_yds <= 40 and att.kicking_fgmissed_yds < 50`. The way I have it written says that the missed field goal must be in the range [40, 50) yards, i.e., 40 to 49 yards.
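Python evaluates a chained comparison as pairwise comparisons joined by `and`, which is why the two spellings select different distances. A quick check; `in_40s` and `suggested` are illustrative names:

```python
def in_40s(yds):
    # Same as: 40 <= yds and yds < 50, i.e. the range [40, 50).
    return 40 <= yds < 50


def suggested(yds):
    # Same as: 40 >= yds and yds < 50, which reduces to yds <= 40.
    return 40 >= yds < 50


for yds in (35, 40, 45, 49, 50):
    print("%d %s %s" % (yds, in_40s(yds), suggested(yds)))
```

Only the first form treats 45 and 49 as 40-49 yard kicks; the suggested form would also count a 35-yard miss.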
You caught me in a lie, Andrew. I admitted to writing that code last year when I was a beginner; that part is true. But current me overthought beginner me's code and removed what appeared to be redundant `plays` assignments in my copy/paste. As soon as I submitted the comment, I remembered struggling through that last fall, but I figured the only person who might catch me would be you :D Busted! I know nflgame is your baby, but I have to hand it to you: you know this thing inside and out.
As always, thanks for this awesome tool! (Replying by email to Andrew Gallant's comment of Aug 18, 2013, 10:28 AM.)
@ochawkeye - Haha. :P
I've learned from my mistakes, though. I'm trying to correct them in `nfldb`, and I'm hoping its API will be much more straightforward, with fewer moving parts. In `nflgame`, there are too many ways to accomplish the same thing.
Good news, bad news.

The good news: every single kicker's "wacky" `kicking_fg_score` from Andrew's code above matches the ESPN data EXACTLY.

Now the bad news: basic math skills depreciate when you eat birthday cake while using a tablet PC at your niece's party! Shame on Uncle teamfball.
| Kicker (nflgame) | nflgame | Diff | Kicker (ESPN) | ESPN |
|---|---:|---:|---|---:|
| B.Walsh | 55 | 0 | Blair Walsh | 55 |
| J.Hanson | 45 | 0 | Jason Hanson | 45 |
| S.Janikowski | 45 | 0 | Sebastian Janikowski | 45 |
| C.Barth | 44 | 0 | Connor Barth | 44 |
| P.Dawson | 43 | 0 | Phil Dawson | 43 |
| J.Tucker | 42 | 0 | Justin Tucker | 42 |
| D.Bailey | 40 | 0 | Dan Bailey | 40 |
| M.Bryant | 39 | 0 | Matt Bryant | 39 |
| S.Graham | 37 | 0 | Shayne Graham | 37 |
| S.Suisham | 37 | 0 | Shaun Suisham | 37 |
| J.Scobee | 33 | 0 | Josh Scobee | 33 |
| G.Zuerlein | 32 | 0 | Greg Zuerlein | 32 |
| J.Feely | 30 | 0 | Jay Feely | 30 |
| L.Tynes | 30 | 0 | Lawrence Tynes | 30 |
| A.Henery | 29 | 0 | Alex Henery | 29 |
| A.Vinatieri | 29 | 0 | Adam Vinatieri | 29 |
| S.Gostkowski | 28 | 0 | Stephen Gostkowski | 28 |
| S.Hauschka | 28 | 0 | Steven Hauschka | 28 |
| D.Carpenter | 27 | 0 | Dan Carpenter | 27 |
| K.Forbath | 27 | 0 | Kai Forbath | 27 |
| R.Succop | 26 | 0 | Ryan Succop | 26 |
| M.Prater | 25 | 0 | Matt Prater | 25 |
| N.Novak | 24 | 0 | Nick Novak | 24 |
| R.Lindell | 23 | 0 | Rian Lindell | 23 |
| R.Gould | 22 | 0 | Robbie Gould | 22 |
| R.Bironas | 21 | 0 | Rob Bironas | 21 |
| M.Nugent | 20 | 0 | Mike Nugent | 20 |
| N.Folk | 18 | 0 | Nick Folk | 18 |
| G.Hartley | 17 | 0 | Garrett Hartley | 17 |
| D.Akers | 15 | 0 | David Akers | 15 |
| J.Brown | 15 | 0 | Josh Brown | 15 |
| M.Crosby | 15 | 0 | Mason Crosby | 15 |
| G.Gano | 10 | 0 | Graham Gano | 10 |
| N.Kaeding | 8 | 0 | Nate Kaeding | 8 |
| J.Medlock | 4 | 0 | Justin Medlock | 4 |
| O.Mare | 2 | 0 | Olindo Mare | 2 |
| B.Cundiff | 0 | 0 | Billy Cundiff | 0 |
Thank you very much for creating this package. It has allowed me to learn Python by playing with numbers I am interested in. Is it possible to calculate the third-down conversion percentage for each team in a single game?
@poppers112 Could you please open a new issue? I don't want to answer different questions in the same issue, particularly one that is closed. Otherwise, the tracker becomes a disorganized mess. :-)
It would also be helpful to include code samples that you've tried already.
@poppers112 @ochawkeye graciously created an issue with some code for you in #36.
I received this question on my blog: