BurntSushi / nflgame

An API to retrieve and read NFL Game Center JSON data. It can work with real-time data, which can be used for fantasy football.
http://pdoc.burntsushi.net/nflgame
The Unlicense
1.27k stars, 412 forks

Any help would be greatly appreciated. #23

Closed ghost closed 11 years ago

ghost commented 11 years ago

"Issues should be open for just about anything, including questions about how to use nflgame, or even problems with NFL.com data. There should be a pretty low threshold to opening one."

Okay then, here goes. I’m clearly too old and not smart enough to understand Python or this little slice of heaven you call nflgame. I haven’t been able to do a damn thing on my own; if the code hasn’t been published here or in the tutorial, all I get are errors.

I'm using version ‘1.1.8’ of nflgame. My main focus is player stats, post game at rest data. I’ve used this code successfully, mostly because it’s in the tutorial.

nflgame.combine_max_stats(nflgame.games(2012)).csv('PD2012.csv',allfields=True)

The new allfields parameter (thank you, Andrew) is very helpful; however, now I see inconsistent results, and I’m not sure why.

version                  total players  receiving_rec  receiving_tar  receiving_yac_yds  receiving_yds
1.1.7                    1812           10838          17550          53939              125983
1.1.8 (allfields=True)   1812           10779          17429          53656              125259
1.1.8                    1812           10838          17550          53939              125983

Additionally, most weeks yield 135 columns of data when all-fields is true, but other weeks only 134 columns. The “def_misc_yds” column appears to be the culprit. This column misalignment makes summing values across multiple Excel sheets a challenge. So how could I query just the data columns I want?

nflgame.combine_max_stats(nflgame.games(2012,week=2)).csv('PD2012-2.csv', fields= name, id, team, pos, receiving_rec, receiving_tar, receiving_yac_yds, receiving_yds )

That is just an example. Is something like this possible?

This is the only query I’ve been able to get working on my own:

nflgame.teams

However, it has no value to me; I was just trying to get something working on my own. I can’t get a .csv file dump of it to work:

nflgame.teams.csv(“teams.csv”)

Additional things I’m trying without success are:

· Drive charts

import nflgame
game = nflgame.one(2011, 17, "NE", "BUF")
print game.drives

o This works but only yields scoring drives per game
o Need all drives by all teams, including starting field position per given week
o Also in the form of a .csv dump

· Schedule
o All teams for entire season, in the form of a .csv dump

· Team rosters
o All teams for any given week, in the form of a .csv dump

· Injury report, player status
o All teams for any given week, in the form of a .csv dump

Any help would be greatly appreciated. And thank you in advance.

BurntSushi commented 11 years ago

Here's the problem. The CSV dump of nflgame isn't supposed to be used like this. The entire API of nflgame is designed with the assumption that you'll be doing your analysis in Python. The CSV dump is there as an exit point if you need it, but it's not meant to be as flexible as you're wishing here.

I'll respond to the rest of your questions, but my most significant recommendation is to invest some more time into learning Python if you want to use nflgame effectively. If you know a little bit of Python and learn how to maneuver around nflgame's API, then dumping any kind of data to a CSV file is simple.

now I see inconsistent results, And I’m not sure why. ??

Because we're dealing with imperfect data, and it's impossible to reconcile correctly. In any given game, the variations are usually small or non-existent. But when you're summing everything up over a season, they become noticeable. As you've seen.

Where does the inconsistency come from? The source. NFL's GameCenter JSON actually contains two different sets of statistics: game level statistics and play-by-play statistics. Game level statistics are limited, but are sometimes more correct than play-by-play. Play-by-play statistics are exhaustive, but are sometimes more correct than game-level statistics. (Whacky, eh?)

When you use allfields=True, then every field from game and play statistics is combined. There is overlap between these two categories, and where there is overlap, there is inconsistency.

It is possible that there is a sane way to handle this, but I haven't thought of it yet. As a compromise, combine_max_stats was added, which tries to combine these two sources intelligently. But it isn't perfect.
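A minimal sketch of that kind of compromise, written for Python 3. It assumes, as the name combine_max_stats suggests but this thread does not spell out, that the larger value is kept wherever the two sources overlap. The helper merge_max and all the numbers are hypothetical and are not nflgame's code or real NFL data.

```python
# Hypothetical numbers for one player in one game; not real NFL data.
game_level = {"receiving_rec": 5, "receiving_yds": 62}
play_level = {"receiving_rec": 5, "receiving_yds": 60, "receiving_tar": 8}


def merge_max(a, b):
    """Merge two stat dicts, keeping the larger value for shared fields."""
    merged = dict(a)
    for field, value in b.items():
        # Fields present in only one source are copied through unchanged.
        merged[field] = max(merged.get(field, value), value)
    return merged


combined = merge_max(game_level, play_level)
print(combined)
```

Under this assumption a field that exists in only one source survives untouched, while a disputed field takes whichever source reports more, which is one plausible reason season totals differ between the two export modes.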

Additionally, most weeks yield 135 columns of data when all-fields is true, but other weeks only 134 columns.

Could you please post the commands you're running that reproduce this?

This works but only yields scoring drives per game. Need all drives by all teams, including starting field position per given week. Also in the form of a .csv dump

I've run the commands you've posted, and I get all drives in the game. Not just scoring drives:

>>> import nflgame
>>> g = nflgame.one(2011, 17, "NE", "BUF")
>>> print g.drives
[BUF (Start: Q1 15:00, End: Q1 11:18) Touchdown,
 NE (Start: Q1 11:18, End: Q1 09:52) Punt,
 BUF (Start: Q1 09:52, End: Q1 05:19) Touchdown,
 NE (Start: Q1 05:19, End: Q1 03:55) Punt,
 BUF (Start: Q1 03:55, End: Q1 00:48) Touchdown,
 NE (Start: Q1 00:48, End: Q2 11:58) Touchdown,
 BUF (Start: Q2 11:58, End: Q2 08:59) Punt,
 NE (Start: Q2 08:59, End: Q2 05:42) Touchdown,
 BUF (Start: Q2 05:42, End: Q2 02:25) Downs,
 NE (Start: Q2 02:25, End: Q2 00:51) Interception,
 BUF (Start: Q2 00:51, End: Q2 00:00) Missed FG,
 NE (Start: Q3 15:00, End: Q3 12:19) Field Goal,
 BUF (Start: Q3 12:19, End: Q3 08:38) Punt,
 NE (Start: Q3 08:38, End: Q3 05:13) Field Goal,
 BUF (Start: Q3 05:13, End: Q3 03:54) Interception,
 NE (Start: Q3 03:54, End: Q3 01:32) Touchdown,
 BUF (Start: Q3 01:32, End: Q4 12:04) Interception,
 NE (Start: Q4 12:04, End: Q4 11:16) Touchdown,
 BUF (Start: Q4 11:16, End: Q4 10:23) Punt,
 NE (Start: Q4 10:23, End: Q4 03:02) Touchdown,
 BUF (Start: Q4 03:02, End: Q4 02:55) Interception,
 BUF (Start: Q4 02:55, End: Q4 01:30) Interception,
 NE (Start: Q4 01:30, End: Q4 00:00) End of Game]

nflgame doesn't dump drive data to a CSV for you. You'll have to do that yourself using Python's csv module.
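As a rough illustration of what "doing it yourself" with the csv module could look like, here is a Python 3 sketch. It uses a stand-in Drive namedtuple with made-up values rather than real nflgame objects; the attribute names (team, field_start, field_end, result) are ones listed for drives elsewhere in this thread, but in nflgame the field positions may be objects rather than plain strings, so treat that part as an assumption.

```python
import csv
import io
from collections import namedtuple

# Stand-in for nflgame's drive objects; the values below are made up.
Drive = namedtuple("Drive", "team field_start field_end result")
drives = [
    Drive("BUF", "BUF 20", "NE 5", "Touchdown"),
    Drive("NE", "NE 20", "NE 35", "Punt"),
]

fields = ["team", "field_start", "field_end", "result"]


def write_drives(drives, out):
    """Write one header row, then one row per drive."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for d in drives:
        writer.writerow([getattr(d, name) for name in fields])


buf = io.StringIO()  # swap for open('drives.csv', 'w', newline='') to write a file
write_drives(drives, buf)
csv_text = buf.getvalue()
print(csv_text)
```

With real data, the drives list would come from something like g.drives, and the loop could be repeated over every game in a week.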

teamfball commented 11 years ago

Could you please post the commands you're running that reproduce this?

nflgame.combine_max_stats(nflgame.games(2012,week=2)).csv('2012-2.csv',allfields=True)   = 135 columns
nflgame.combine_max_stats(nflgame.games(2012,week=3)).csv('2012-3.csv',allfields=True)   = 134 columns

nflgame doesn't dump drive data to a CSV for you. You'll have to do that yourself using Python's csv module.

I didn't load the csv module for the above, or did I...?

I've run the commands you've posted, and I get all drives in the game. Not just scoring drives:

Correct you are, sorry. Could I add Field position to this..? And csv format...?

Invest time into Python.

But you’re young and intelligent; I’m old and ignorant, cut my teeth on the first Apples and never had formal computer education beyond ‘Basic’ in 1976. My heart wants to take the investment-first path, but my head needs the reinforcement of results. I know, I get it. I help others with Excel. Inspired by your open source approach (trickle-down theory), I recently started a work group. https://groups.google.com/forum/?hl=en&fromgroups=#!forum/teamfball_myline

Thanks for responding.

Okay, I just found a wonderful result based tutorial for Python. Just what the doctor ordered. I should be better vested when we chat again.

teamfball commented 11 years ago

So how could I query just the data columns I want.?

Okay, after some homework I have csv output very close to my goal.

import nflgame
import csv

week10 = nflgame.games(2012, 10)
players = nflgame.combine_max_stats(week10)
teamfball = [(p, p.playerid, p.team, p.player, p.rushing_att,
    p.rushing_yds, p.rushing_tds, p.receiving_rec, p.receiving_tar,
    p.receiving_tds, p.receiving_yac_yds, p.receiving_yds,
    p.passing_att, p.passing_cmp, p.passing_cmp_air_yrds,
    p.passing_int, p.passing_tds, p.passing_yds) for p in players]
csv.writer(open('playerdatwk10.csv', 'w+')).writerows(teamfball)

Still missing some things like column headers and player.position. Also I need to find a way to determine if a missing player played but accumulated no stats or was off due to a bye week or simply didn’t play.

ochawkeye commented 11 years ago

Not sure if you got this figured out or gave up, but I took a crack at it anyway. I have to admit that I feel a bit sheepish posting this as I probably perpetuate terrible coding practices, but my non-pythonic brain is worried more about making stuff work than being PEP-approved.

import nflgame
import csv

week10 = nflgame.games(2012,10)
players = nflgame.combine_max_stats(week10)
teamfball = [(p, p.playerid, p.team, p.player, p.rushing_att, 
    p.rushing_yds, p.rushing_tds, p.receiving_rec, p.receiving_tar, 
    p.receiving_tds, p.receiving_yac_yds, p.receiving_yds, 
    p.passing_att, p.passing_cmp, p.passing_cmp_air_yrds, 
    p.passing_int, p.passing_tds, p.passing_yds) for p in players]
categories = 'p, p.playerid, p.team, p.player, p.rushing_att, \
    p.rushing_yds, p.rushing_tds, p.receiving_rec, p.receiving_tar, \
    p.receiving_tds, p.receiving_yac_yds, p.receiving_yds, \
    p.passing_att, p.passing_cmp, p.passing_cmp_air_yrds, \
    p.passing_int, p.passing_tds, p.passing_yds'
# Strip spaces and tabs, then split on commas to get one header per column.
headers = categories.replace(" ", "")
headers = headers.replace("\t", "")
headers = headers.split(',')
#csv.writer(open('playerdatwk10.csv', 'w+')).writerows(teamfball)
with open('playerdatwk10a.csv', 'w+') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(teamfball)

All I did was take your tuple in teamfball and split it into a list by replacing spaces and tabs and then splitting by the commas. Finally, I replaced your csv.writer(open... command, as that tends to leave the file improperly open if something were to error.
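One way to keep headers and rows from drifting apart is to drive both from a single list of field names. The Python 3 sketch below uses a hypothetical Stat namedtuple with invented values in place of nflgame's player objects; FIELDS and rows_for are illustrative names, not part of nflgame.

```python
import csv
import io
from collections import namedtuple

# Stand-in for a player stats object; values are invented for illustration.
Stat = namedtuple("Stat", "playerid team rushing_att rushing_yds")
players = [
    Stat("00-0000001", "NE", 12, 48),
    Stat("00-0000002", "BUF", 7, 31),
]

# The single source of truth: edit this list and header and rows both follow.
FIELDS = ["playerid", "team", "rushing_att", "rushing_yds"]


def rows_for(players, fields):
    """Yield the header row, then one row per player, in the same field order."""
    yield list(fields)
    for p in players:
        yield [getattr(p, name) for name in fields]


buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows_for(players, FIELDS))
table = buf.getvalue().splitlines()
print(table)
```

Because the column names are looked up with getattr, adding or reordering a column is a one-line change, and the header can never disagree with the data beneath it.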

teamfball commented 11 years ago

Thanks Hawkeye. I was looking for a more efficient solution that could use the same tuple for the headers, thus avoiding header misalignment. But your code will allow me to eliminate the ‘p dot’ prefix in the header. As to “gave up”, that won’t happen; I have no other option to acquire the data I need. And Pythonic will remain an ambition for me. Although I’ve made good progress in recent days, it has been by brute force rather than finesse.

Now that the weekly stats are achievable, player management has been a challenge for me in the past. For instance, journeyman player M. Spurlock (00-0023968) during the 2012 season played weeks 1-4 with SD, 5-12 with JAX, and back to SD weeks 13-17. The (combine_max_stats) post game query reflects those team moves.
Wk 10 data result ---- M.Spurlock 00-0023968 JAC Micheal Spurlock (WR, SD)

However, how do I programmatically resolve where a player will be before the game takes place? Additionally, a player’s injury status is another hurdle.

BurntSushi commented 11 years ago

However, how do I programmatically resolve where a player will be before the game takes place? Additionally, a player’s injury status is another hurdle.

This is a difficult problem, and one that is not solved by nflgame.

If I needed to solve this problem (and I may yet do so when it comes time for fantasy football to start up again), then I'd probably look to scraping NFL.com once a day or something. This will give you updated roster data and injury status stuff. But it might also require you to save some state and compare that with updated data from NFL.com.
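The "save some state and compare" step could look roughly like the Python 3 sketch below. It assumes each daily scrape has already been reduced to a dictionary mapping a player ID to a team; the scraping itself is not shown, and the IDs, teams and the roster_changes helper are all placeholders.

```python
def roster_changes(previous, current):
    """Compare two {player_id: team} snapshots and report what changed."""
    changes = []
    for pid, team in current.items():
        old = previous.get(pid)
        if old is None:
            changes.append((pid, "added", None, team))
        elif old != team:
            changes.append((pid, "moved", old, team))
    for pid, team in previous.items():
        if pid not in current:
            changes.append((pid, "removed", team, None))
    # Player IDs are unique, so sorting only ever compares the first element.
    return sorted(changes)


# Made-up snapshots; IDs and teams are placeholders, not scraped data.
yesterday = {"id-1": "SD", "id-2": "PIT", "id-3": "MIN"}
today = {"id-1": "JAC", "id-2": "PIT", "id-4": "SEA"}

changes = roster_changes(yesterday, today)
for change in changes:
    print(change)
```

Persisting each day's snapshot (for example as JSON) would let the comparison run against the previous day's file rather than an in-memory dictionary.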

The beginnings of this are in scripts/download-player-data, which is what generates the huge players.json file. I don't think it'll be sufficient to completely solve your problem, but it might get you somewhere. Be warned though, web scraping is not for the faint of heart! It can break easily and without notice. And one must be judicious in its use, else NFL.com will likely ban you.

Finally, there may be better solutions than scraping NFL.com. I haven't really researched this problem extensively.

teamfball commented 11 years ago

Be warned though, web scraping is not for the faint of heart! It can break easily and without notice.

I understand all too well. That’s why I want to regulate the location of my stat columns. Last year Yahoo added receiver targets mid-season. Also I’ve been cross-referencing player injury status for over two years using inconsistent name matching. I end up fixing newfound problems nearly every week, e.g. Beanie Wells / Chris Wells / C.Wells.

Anyone know if the NFL player id’s are consistent anywhere else on the web? I notice that paid data providers like https://nfldata.com/api/default.aspx have multiple reference IDs, but none the same as NFL.com.

Prior to running out of time last season I was attempting to scrape the approximately 56 pages here: http://www.nfl.com/players/search?category=lastName&filter=B&playerType=current Could this possibly be a source for your json data. It appears to be all-inclusive with current status, even the now time-relevant free agent references.

BurntSushi commented 11 years ago

Also I’ve been cross-referencing player injury status for over two years using inconsistent name matching.

Yeah, the appropriate solution here is to measure edit distance between names. There are algorithms to do this. I plan on implementing one for next season. I bumped up against weird name collisions way too frequently.
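For reference, one such algorithm is the Levenshtein distance. The Python 3 sketch below is a generic textbook implementation, not the one planned for nflgame; the closest helper and the candidate list are illustrative only.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def closest(name, candidates):
    """Return the candidate with the smallest case-insensitive distance."""
    return min(candidates, key=lambda c: edit_distance(name.lower(), c.lower()))


print(edit_distance("kitten", "sitting"))
print(closest("C.Wells", ["Chris Wells", "C. Wells", "Mike Wallace"]))
```

Edit distance handles punctuation and spelling variants well, but it will not connect a nickname to a given name (Beanie Wells versus Chris Wells), so a small alias table would likely still be needed alongside it.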

Anyone know if the NFL player id’s are consistent anywhere else on the web?

Not a chance. This isn't even the case when things are all out in the open and nobody is trying to prevent you from mining data. (I do research in computational biology and we don't have anything close to resembling one unique identification system for things like genes. Yuck.)

Could this possibly be a source for your json data.

Possibly. It isn't now though.

I plan on taking a closer look at this player meta data problem after this semester is over and I have some extra cycles. I think it's a problem that can be solved elegantly with the right choice of algorithms and proper scraping techniques. It will probably be in a library separate from nflgame, although there would certainly be a way to connect data between them.

teamfball commented 11 years ago

Could this possibly be a source for your json data.

Possibly. It isn't now though.

May want to give that a serious look. Recent events like Harvin to SEA and Boldin to SFO are now current. Boldin is even listed without a jersey number. The Mike Wallace deal reported yesterday became official today, however, and he is still listed with PIT. I’ll monitor these pages for a few days and advise.

BurntSushi commented 11 years ago

@teamfball Thank you :-)

teamfball commented 11 years ago

The M. Wallace move happened as predicted. I quickly got my player scraper working for those pages this morning. I decided to only scrape WR-TE and DL, or 12 of the 35 pages for obvious reasons. Of those 975 players, 166 are listed as UFA, unrestricted, and 45 listed as RFA restricted free agents as of today. I will track changes daily for a while.

teamfball commented 11 years ago

Still struggling with finding field position within game drives.

Ref; >>>Drive represents a single drive in an NFL game. It contains a list of all plays that happened in the drive, in chronological order. It also contains meta information about the drive such as the start and stop times and field position, length of possession, the number of first downs and a short descriptive string of the result of the drive.

import nflgame
g = nflgame.one(2009, 4, "MIN", "GB")
print g.drives

This works but no field position, length of possession, the number of first downs as described in the api.

[GB (Start: Q1 15:00, End: Q1 09:22) Fumble, MIN (Start: Q1 09:22, End: Q1 03:20) Touchdown, GB (Start: Q1 03:20, End: Q1 02:16) Touchdown, MIN (Start: Q1 02:16, End: Q1 00:57) Punt, GB (Start: Q1 00:57, End: Q2 11:42) Interception, MIN (Start: Q2 11:42, End: Q2 04:59) Touchdown, GB (Start: Q2 04:59, End: Q2 03:30) Punt, MIN (Start: Q2 03:30, End: Q2 03:18) Fumble, MIN (Start: Q2 03:18, End: Q2 00:30) Touchdown, GB (Start: Q2 00:30, End: Q2 00:00) End of Half, MIN (Start: Q3 15:00, End: Q3 10:40) Touchdown, GB (Start: Q3 10:40, End: Q3 02:12) Downs, MIN (Start: Q3 02:12, End: Q4 12:53) Punt, GB (Start: Q4 12:53, End: Q4 11:03) Punt, MIN (Start: Q4 11:03, End: Q4 08:49) Punt, GB (Start: Q4 08:49, End: Q4 07:21) Safety, MIN (Start: Q4 07:21, End: Q4 04:59) Punt, GB (Start: Q4 04:59, End: Q4 03:40) Touchdown, MIN (Start: Q4 03:40, End: Q4 03:10) Punt, GB (Start: Q4 03:10, End: Q4 00:55) Field Goal, MIN (Start: Q4 00:55, End: Q4 00:00) End of Game]

Packers Drive Chart

StartTime  TimePoss  DriveBegan  # ofPlays  YardsGained  Result
15:00      5:38      GB 26       9          41           Fumble
03:20      1:04      GB 33       3          67           Touchdown
00:57      4:15      GB 41       8          26           Interception
04:59      1:29      GB 25       3          -2           Punt
00:30      0:30      GB 23       5          29           End of Half
10:40      8:28      GB 18       14         81           Downs
12:53      1:50      GB 15       3          9            Punt
08:49      1:28      GB 1        3          -1           Safety
04:59      1:19      GB 4        6          96           Touchdown
03:10      2:15      GB 18       9          68           Field Goal

Vikings Drive Chart

StartTime  TimePoss  DriveBegan  # ofPlays  YardsGained  Result
09:22      6:02      MIN 33      12         67           Touchdown
02:16      1:19      MIN 18      3          -4           Punt
11:42      6:43      MIN 23      10         77           Touchdown
03:30      0:12      MIN 44      1          -2           Fumble
03:18      2:48      MIN 16      6          84           Touchdown
15:00      4:20      MIN 20      8          80           Touchdown
02:12      4:19      MIN 1       11         41           Punt
11:03      2:14      MIN 49      3          9            Punt
07:21      2:22      MIN 40      3          6            Punt
03:40      0:30      GB 45       3          0            Punt
00:55      0:55      GB 39       2          -2           End of Game

Also wanting all drives for each team per given week. As in;

Dw4 = nflgame.games(2009, 4)
print Dw4.drives

BurntSushi commented 11 years ago

You're just printing the string representation of each drive. You need to access attributes. Unfortunately, Python has no good way of documenting such things (that I'm aware of), so you'll need to look at the class itself (http://burntsushi.net/doc/nflgame/nflgame.game-pysrc.html#Drive.init). e.g., I see the following attributes: game, drive_num, team, home, first_downs, result, penalty_yds, total_yds, pos_time, play_cnt, field_start, time_start, field_end, time_end, plays.

e.g.,

for d in g.drives:
    print d.field_start, d.field_end
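Short of reading the class source, the built-ins vars() and dir() can list what an object carries. The Python 3 sketch below runs against a hypothetical stand-in Drive class with only three attributes, so its output is not what nflgame's real Drive would show.

```python
class Drive(object):
    """Hypothetical stand-in; the real class lives in nflgame's game module."""

    def __init__(self, team, result, field_start):
        self.team = team
        self.result = result
        self.field_start = field_start


d = Drive("GB", "Fumble", "GB 26")

# vars() returns the instance's attribute dictionary.
attribute_names = sorted(vars(d))
print(attribute_names)

# dir() also includes methods and inherited names; filter out the underscored ones.
public_names = [n for n in dir(d) if not n.startswith("_")]
print(public_names)
```

Note that vars() only sees attributes stored on the instance; anything computed on demand (properties, or values served through __getattr__) would show up only via dir(), if at all.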

teamfball commented 11 years ago

Thanks, now I get it, this will keep me busy for a while.

teamfball commented 11 years ago

Andrew, thanks again for this awesome tool. Could someone provide some more insight, referring back to player data again.

I plan to construct a massive tuple ‘teamfball’ similar to the reference ochawkeye helped me with. However, the data set is still missing a few things in my eyes.

Opponent reference: To assemble many stats, net punt average for instance, we need to know not only the punter’s stats but also those of anyone who received a punt from the same punter. Can we add an opponent category? With both team and opponent, the necessary associations can be made.

Seasonal vs. four and eight week data; back to one of my first questions.

Is it possible to specify the week or game number reference for player data.

Hmm, I don't understand this question. Maybe you could try rephrasing by making the expected input and the expected output a bit more clear?

In addition to seasonal data sums, I will be doing 4 and 8 week rolling sums.

So for instance, during week 8, team A could be playing its eighth game but opponent B may only be playing its seventh game in the eighth week of the season.

Although p.games already exists I think a week_no and a game_no will be required to sum programmatically across the weeks where the byes occur.
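To make the rolling-sum idea concrete, here is a Python 3 sketch that sums a stat over the last N games a player actually played, so a bye week does not shrink the window. The per-week values and the rolling_sums helper are invented for illustration and do not use nflgame.

```python
def rolling_sums(values_by_week, window):
    """Sum the last `window` games actually played, skipping bye weeks.

    values_by_week maps week number -> stat value; a bye week is simply absent.
    Returns {week: sum over that game and the previous window-1 games played}.
    """
    played = sorted(values_by_week)
    result = {}
    for i, week in enumerate(played):
        start = max(0, i - window + 1)
        result[week] = sum(values_by_week[w] for w in played[start:i + 1])
    return result


# Invented receiving yards for one player; week 4 is a bye, so it is missing.
yards = {1: 50, 2: 70, 3: 30, 5: 90, 6: 10}

four_game = rolling_sums(yards, 4)
print(four_game)
```

Indexing by position in the sorted list of played weeks is what plays the role of a game number here; the week number is kept only as the label.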

Also, a player’s position is present when using nflgame.combine_max_stats(nflgame.games(2012)).csv('PD2012.csv',allfields=True)

However, it’s not available when using my tuple, teamfball.

Ideally my tuple would begin like this:

for p in players:
    print p, p.playerid, p.team, p.position, p.games, p.week_no, p.game_no, p.opponent, etc. stats.

ochawkeye commented 11 years ago

While you have been using players from nflgame.combine_game_stats(games) there is another table called nflgame.players that I (or maybe all?) refer to as the player meta data. It includes height, weight, College, position, etc. That nflgame.players can be accessed by playerid. So if I have:

players = nflgame.combine_game_stats(games)
meta = nflgame.players
for p in players:
    id_number = p.playerid
    if str(id_number) in meta:
        position = meta[str(id_number)].position

BurntSushi commented 11 years ago

Can we add an opponent category?

It already exists. nflgame has a hierarchy: games contain drives and drives contain plays. Plays reference their drive and drives reference their game. Let's walk through an example by looking at a single punt play.

First, let's get the play. I've purposefully selected one that has an actual return:

import nflgame

# Week 16 of 2011: Miami at New England (home team first, then away).
g = nflgame.one(2011, 16, 'NE', 'MIA')

# Inspect all of the punt plays in the game.
for p in g.drives.plays().filter(punting_tot__ge=1):
    print p

# The first two punts are downed or oob. So let's inspect the third.
punt = list(g.drives.plays().filter(punting_tot__ge=1))[2]

print punt
# Output: (NE, NE 21, 4 and 10) (3:20) Z.Mesko punts 55 yards to MIA 24, Center-D.Aiken. D.Bess to MIA 37 for 13 yards (R.Ninkovich).

(We need that list there because most things in nflgame return Python generators. In order to select a particular item in a generator, we need to coerce it into a list, which supports random access.)
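The generator behaviour described here is plain Python and can be seen without nflgame. In the Python 3 sketch below, punts() is a stand-in generator yielding placeholder strings; itertools.islice is shown as an alternative that avoids building the whole list.

```python
from itertools import islice


def punts():
    """Stand-in generator; yields placeholder strings instead of Play objects."""
    for n in range(1, 5):
        yield "punt %d" % n


gen = punts()
try:
    gen[2]
except TypeError:
    indexable = False  # generators do not support indexing
else:
    indexable = True

third_via_list = list(punts())[2]                  # materialize, then index
third_via_islice = next(islice(punts(), 2, None))  # skip two, take the next
print(indexable, third_via_list, third_via_islice)
```

list() is the simplest choice for a single game's worth of plays; islice becomes more attractive when the sequence is long and only one early item is needed.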

Now I'm going to show you how to maneuver with punt, which is an instance of the Play class.

First, let's get the player that punted the ball and print the team he is on along with the length of the punt:

punter = list(punt.players.filter(punting_tot__ge=1))[0]
print str(punter), punter.team, punter.punting_yds
# Output: Z.Mesko NE 55

Second, let's get the guy who returned the punt, along with his team and the length of his return:

returner = list(punt.players.filter(puntret_tot__ge=1))[0]
print str(returner), returner.team, returner.puntret_yds
# Output: D.Bess MIA 13

We can go even further and look at down information and field positions, which are attributes of the play itself:

print punt.down, punt.yards_togo, punt.yardline.offset
# Output: 4 10 -29

Finally, we can look at the drive that ended with the punt:

print punt.drive
# Output: NE (Start: Q1 04:36, End: Q1 03:08) Punt

This kind of information cannot be added to the CSV exporter because the CSV exporter is stupid. It just takes sequences of labeled data and dumps it to the CSV format. It has no understanding of what an opponent is. Indeed, an "opponent" may not even make sense, since sequences of data need not correspond to a list of plays. It could be a list of statistics that we computed by combining plays---sometimes across entire games.

The idea here is that you do your analysis in Python. You can follow my steps above to play around, and when you're ready, you can traverse every game and perform your analysis. For example, to step over every punt play in the first week of the 2011 season and print the punter, returner and teams, you can do:

for g in nflgame.games_gen(2011, week=1):
    for p in g.drives.plays().filter(punting_tot__ge=1):
        # Not every punt has a returner!
        returner = list(p.players.filter(puntret_tot__ge=1))
        retinfo = 'No return'
        if len(returner) > 0:
            returner = returner[0]
            retinfo = '%s (%s) returned for %d yards.' \
                % (returner.name, returner.team, p.puntret_yds)

        punter = list(p.players.filter(punting_tot__ge=1))[0]
        print '%s (%s) punted %d yards. %s' \
            % (punter.name, punter.team, p.punting_yds, retinfo)

Also, a players position is present when using ... However, its not available when using my tuple, teamfball

Indeed. It doesn't make sense to have a position encoded with a statistical object, since a player may adopt multiple positions. A position is only an attribute of the player. So you'll need to access the player meta data to get the position:

print p, p.playerid, p.team, p.player.position

If you look at the code for the csv function, you can see that I did this explicitly as a special case.

Although p.games already exists I think a week_no and a game_no will be required to sum programmatically across the weeks where the byes occur.

This is actually very tricky. This is something that I might be willing to encode into the schedule of games (in nflgame/schedule.py), but it isn't there yet. Moreover, the game number isn't a property of a game, it's a property of a game and a team. So it's a bit awkward to represent that information in a Game object.

Until then, here's a working example that retrieves the week and game numbers:

import nflgame

def week_number(game):
    return nflgame.schedule.games_byid[game.eid]['week']

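def game_number(team, weekno):
    # NOTE: this reads the global `g` set by the loop below, so it must be
    # called from inside that loop.
    info = nflgame.schedule.games_byid[g.eid]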
    number = 0
    for (y, t, w, h, a), _ in nflgame.schedule.games:
        equal = [
            y == info['year'],
            t == info['season_type'],
            w <= weekno,
            team in (h, a),
        ]
        if not all(equal):
            continue
        number += 1
    return number

for g in nflgame.games_gen(2011, week=[7, 8, 9, 10]):
    weekno = week_number(g)
    home_gameno = game_number(g.home, weekno)
    away_gameno = game_number(g.away, weekno)

    print 'Week %d, %s (%s played %d games, %s played %d games)' \
        % (weekno, g, g.home, home_gameno, g.away, away_gameno)
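To see why the game number depends on both the week and the team, the same counting idea can be run against a tiny hand-made schedule. This Python 3 sketch is independent of nflgame: SCHEDULE and its team codes are hypothetical, and this game_number takes the schedule as an explicit argument instead of reading a global.

```python
# Hypothetical mini-schedule: (week, home, away). Team "AAA" has a bye in week 2.
SCHEDULE = [
    (1, "AAA", "BBB"),
    (1, "CCC", "DDD"),
    (2, "BBB", "CCC"),
    (3, "AAA", "CCC"),
    (3, "DDD", "BBB"),
]


def game_number(schedule, team, weekno):
    """Count the games `team` has played up to and including week `weekno`."""
    return sum(1 for week, home, away in schedule
               if week <= weekno and team in (home, away))


aaa_games = game_number(SCHEDULE, "AAA", 3)
bbb_games = game_number(SCHEDULE, "BBB", 3)
print(aaa_games, bbb_games)
```

In week 3 the two teams are on different game numbers because of the bye, which is exactly the situation a rolling per-game sum has to account for.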

In summary, these probably aren't the answers you want to hear—but you pose interesting questions. I do believe nflgame can accommodate you, but it might require a bit more work with Python.

I know you're more comfortable doing analysis in Excel, and that's OK. Given that information, my recommendation would be to come up with your own csv function that exports data in a way that is suitable to you. This way, you can incorporate precisely the data you want.

You can start with mine and modify it as needed. It's in nflgame/seq.py.

teamfball commented 11 years ago

Okay, I can access position through the hierarchy ……. p.player.position or by using ochawkeye’s method. Admittedly, player position was never important for reasons noted. But realizing that other data stores are possible I assume other avenues exist. I’m now beginning to think in terms of a relational database.

With this code……………….

>>> import nflgame
>>> def week_number(game):
    return nflgame.schedule.games_byid[game.eid]['week']

>>> def game_number(team, weekno):
    info = nflgame.schedule.games_byid[g.eid]
    number = 0
    for (y, t, w, h, a), _ in nflgame.schedule.games:
        equal = [
            y == info['year'],
            t == info['season_type'],
            w <= weekno,
            team in (h, a),
        ]
        if not all(equal):
            continue
        number += 1
    return number

>>> gmweek = nflgame.games(2012,10)
>>> players = nflgame.combine_max_stats(gmweek)
>>> for g in gmweek:
    weekno = week_number(g)
    home_gameno = game_number(g.home, weekno)
    away_gameno = game_number(g.away, weekno)

>>> for p in players:
    print weekno, g, g.home, home_gameno, g.away, away_gameno, p, p.team

results……….

10 PIT (16) vs. KC (13) PIT 9 KC 9 P.McAfee IND
10 PIT (16) vs. KC (13) PIT 9 KC 9 J.Blackmon JAC
10 PIT (16) vs. KC (13) PIT 9 KC 9 B.Gabbert JAC
10 PIT (16) vs. KC (13) PIT 9 KC 9 J.Hughes IND

I managed to get all the players but without a link from game to schedule.game each entry shows the same schedule.game reference. I’m thinking one line is required to link the two. Something like ------ nflgame.schedule.games_EID = nflgame.games_ID. However, I’m not finding exactly how the games are linked to the schedule_EID. 'eid': u'2012010108'

Assuming I can include a linked nflgame.schedule.games 'week': #, home and away teams in to my ‘teamfball_tuple’, I’m confident I will be able to calculate my four and eight week player values using excel. Thank you both for the recent input.

BurntSushi commented 11 years ago

@teamfball Whenever presenting code, I strongly urge you to explicitly state what it is you want it to do. Otherwise, I have to guess.

My guess is that you're trying to print every player along with week and game number for week 10 of the 2012 season. If so, here's how I'd do it:

gmweek = nflgame.games(2012, 10)
for g in gmweek:
    players = g.max_player_stats()
    weekno = week_number(g)
    for p in players:
        gameno = game_number(p.team, weekno)
        print weekno, gameno, g, p, p.team

Hope this helps :-)

teamfball commented 11 years ago

Good guess, it both helps and works as I had hoped.

I will try my ‘brute-force’ approach to resolve what I believe to be my final hurdle or two. But don’t be surprised if I come back with an unambiguous question or two. Thanks.

teamfball commented 11 years ago

Teamfball is very happy indeed..! Thank you ochawkeye Thank you BurntSushi This will keep me occupied for a while.

BurntSushi commented 11 years ago

No problem! Glad I could help.

The semester is ending soon, so I hope to get back to playing with nflgame soon and get ready for next season :-)

teamfball commented 11 years ago

Well it appears my celebration was premature.

When using your code

for g in gmweek:
    players = g.max_player_stats()
    weekno = week_number(g)
    for p in players:
        gameno = game_number(p.team, weekno)
        print weekno, gameno, g, p, p.team

I get 912 correct results from fourteen games (2012, wk=10). How can I get the same 912 results in my csv file?

If I use;

for g in gmweek:
    players = g.max_player_stats()
    weekno = week_number(g)
    for p in players:
        gameno = game_number(p.team, weekno)
        teamfball = [(weekno, gameno, g, g.eid, p, p.playerid, p.team,    p.rushing_att,
        p.rushing_yds) for p in players]
        with open('t10a1.csv', 'wb') as f:
                csv.writer(f).writerows(teamfball)

I get only the last game of that week, the Monday night game, PIT vs. KC, with 63 correct player results from just that one game.

Additionally, with the unique game reference for each player I was planning to append these weekly results within the same excel sheet for easier analysis.

So could it be possible to retrieve every game and every player stat using the entire year as reference? That’s 256 games, est. 65 players per game, or approx 16640 results for a full season, in csv format.

E.g.

gmyear = nflgame.games(2012)
for g in gmyear:

ochawkeye commented 11 years ago

When your code says

for g in gmweek:
    players = g.max_player_stats()
    weekno = week_number(g)
    for p in players:
        gameno = game_number(p.team, weekno)
        teamfball = [(weekno, gameno, g, g.eid, p, p.playerid, p.team, p.rushing_att, p.rushing_yds) for p in players]
        with open('t10a1.csv', 'wb') as f:
            csv.writer(f).writerows(teamfball)

It is going through each game g one at a time and writing the results to t10a1.csv. Problem is that once it writes the contents for the first game to your file, it then goes to the next g in gmweek and writes the results to t10a1.csv. It is overwriting the data that you just wrote to that file a moment ago. Instead of opening the file with 'wb' you can try using 'ab' to append the new information to the existing. Or, you can first collect all the data that you want to capture before writing to the file.
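The difference between the two modes is easy to reproduce in isolation. The Python 3 sketch below reopens a file once per "game" with a given mode and then reads it back; the rows are placeholders, and in Python 3 the text modes 'w' and 'a' with newline='' take the place of the 'wb' and 'ab' used with Python 2's csv module.

```python
import csv
import os
import tempfile

games = [[("NE", 1)], [("BUF", 2)], [("PIT", 3)]]  # placeholder rows per game


def write_per_game(path, mode):
    """Reopen the file once per game using the given mode, then read it back."""
    for rows in games:
        with open(path, mode, newline="") as f:
            csv.writer(f).writerows(rows)
    with open(path, newline="") as f:
        return list(csv.reader(f))


tmpdir = tempfile.mkdtemp()
overwritten = write_per_game(os.path.join(tmpdir, "w.csv"), "w")
appended = write_per_game(os.path.join(tmpdir, "a.csv"), "a")
print(len(overwritten), len(appended))
```

One caveat with append mode: running the script twice adds the same rows again, so the file needs to be deleted or truncated before each fresh run.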

FYI - instead of quoting your code, you can add four spaces (or a multiple of four spaces for further indents) to have it present in this forum better.

BurntSushi commented 11 years ago

@teamfball - @ochawkeye's advice is sound. Also, it seems you have indented your code properly, but instead of using the "```" to surround your code, you use ">", which is markdown for a quote. Could you please take a look at the modifications I made to your comment to see how to properly paste code? It will help readers. :-)

teamfball commented 11 years ago

That was very helpful, although I got nearly 60k rows of data, i.e. the number of players squared times 14 games. But I quickly realized I had a redundancy, “for p in players”. Currently the 912 results appear accurate, but I’ll double check later.

Could you expand on the comment; first collect all the data that you want to capture before writing to the file.

Thanks once again.

ochawkeye commented 11 years ago

But I quickly realized I had a redundancy, “for p in players”

Yep. Starting with a fresh file, looks like the corrected code is generating 912 lines.

Could you expand on the comment; first collect all the data that you want to capture before writing to the file.

Sure. The data you are assigning to teamfball and writing to your csv file is "forgotten" on each iteration through the loop. Instead of having teamfball be a list of data, you could make it be a list of lists.

In this example, I initialize teamfball as a list before the loop takes off and then add the new information to it each trip through the loop.

gmweek = nflgame.games(2012, 10)
teamfball = []
for g in gmweek:
    players = g.max_player_stats()
    weekno = week_number(g)
    for p in players:
        gameno = game_number(p.team, weekno)
        teamfball.append((weekno, gameno, g, g.eid, p, p.playerid, p.team, p.rushing_att, p.rushing_yds))
with open('t10a6.csv', 'wb') as f:
    csv.writer(f).writerows(teamfball)

You could take this same example and make gmweek = nflgame.games(2012) to get the full results from 2012. It'll take considerably longer to run than just getting the data from a single week, but within about 45 seconds (on my machine this is how long it took) you should have a file with 16.7k records.
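If you want to double check the row counts afterwards, something like the sketch below could help. It assumes Python 3 and assumes the game id (g.eid) sits in the fourth column, as in the tuple above. The `rows_per_game` helper and the sample rows are made up for illustration and are not part of nflgame.

```python
import csv
import io
from collections import Counter

def rows_per_game(csv_file, eid_column=3):
    """Count CSV rows for each game id found in the given column."""
    counts = Counter()
    for row in csv.reader(csv_file):
        counts[row[eid_column]] += 1
    return counts

# Made-up stand-in for the CSV contents (weekno, gameno, game, eid, player).
sample = io.StringIO(
    "10,1,game one,eid-1,player a\n"
    "10,1,game one,eid-1,player b\n"
    "10,2,game two,eid-2,player c\n"
)
counts = rows_per_game(sample)
print(len(counts), sum(counts.values()))  # prints: 2 3
```

Against the real file you would pass `open('t10a6.csv', newline='')` instead of the sample. For week 10 of 2012 the numbers reported in this thread are 14 games and 912 rows in total.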

teamfball commented 11 years ago

Shut the front door..! That’s what I’ve been hoping for.

I actually understood everything today except the post formatting issue. I tried everything to comply but the preview pane isn’t WYSIWYG, IMO. My post from 6 days ago appeared correctly, unless that was modified as well.

Thanks for the code ochawkeye.

>>> import nflgame
>>> import csv

I hope this is it? Or maybe, it was what it was all along…

>>> print ' it was what it was all along… '
             for p in players:
             print p.player, 'this is a test'
             open , "I think ochawkeye's code is wonderful in any color!  '

BurntSushi commented 11 years ago

My post from 6 days ago appeared correctly, unless that was modified as well.

Yes, I've been modifying your posts to make code look better :-) But I don't think @ochawkeye has the permission to do that. Try editing your last post with code and look at the stuff I put around your code. Before the code, I put "```python" (three back-ticks followed by the word `python`) and after the code I put "```" (three back-ticks).

ochawkeye commented 11 years ago

TIL I can do


import this
print 'Everything is so beautiful in color!'

instead of

import this
print "I'm the ugly code ochawkeye used to post"

I'll be danged. Thanks Andrew!

teamfball commented 11 years ago

FYI, here is my Excel formula to calculate QB passer rating. Although not Python, it may be useful for someone other than myself.

=(IF((((Cmp/Att)*100)-30)*0.05<0,0,IF((((Cmp/Att)*100)-30)*0.05>2.375,2.375,((((Cmp/Att)*100)-30)*0.05)))
+IF(((Yds/Att)-3)*0.25<0,0,IF(((Yds/Att)-3)*0.25>2.375,2.375,((Yds/Att)-3)*0.25))
+ IF(((TDs/Att)*100)*0.2<0,0,IF(((TDs/Att)*100)*0.2>2.375,2.375,((TDs/Att)*100)*0.2))
+ IF(((2.375-(((Int/Att)*100)*0.25)))<0,0,IF(((2.375-(((Int/Att)*100)*0.25)))>2.375,2.375,(2.375-(((Int/Att)*100)*0.25)))))*100/6

where Cmp = passing_cmp, Att = passing_att, Yds = passing_yds, TDs = passing_tds, Int = passing_int.

Note: use this part of the formula when calculating individual stats.

+IF(((Yds/Att)-3)*0.25<0,0,IF(((Yds/Att)-3)*0.25>2.375,2.375,((Yds/Att)-3)*0.25))

When calculating team values, as in OPR or DPR, note that the NFL subtracts sack yardage from team passing stats, so you must add SackYrds back into the equation.

+IF((((Yds+SackYrds)/Att)-3)*0.25<0,0,IF((((Yds+SackYrds)/Att)-3)*0.25>2.375,2.375,(((Yds+SackYrds)/Att)-3)*0.25))

I may never be able to give back as much as I have gotten from nflgame, but this is the best I can offer for now.

Thanks again for the help with python.

BurntSushi commented 11 years ago

@teamfball - Sorry about the late response, life got in the way.

Thanks very much for the Excel info. That looks absolutely mind boggling!

Anywho, I am closing this issue now. @teamfball, if you have any more questions, please open a new issue. :-)