Closed: BurntSushi closed this issue 10 years ago
Hi... I have to say the work you have done is amazing. I've only written a few scripts with the library, but I'm interested in getting some of the data loaded into a backend, and if I can help document things in any way I'd like to volunteer.
Is it safe to say that I can learn the data model through the CSV output you've added? For instance, I could run this:
```python
import nflgame

def season_stats_search():
    game = nflgame.one(2011, 17, "NE", "BUF")
    game.players.csv('player-stats.csv')
```
To basically get the PlayerStats model:
```
name, id, home, team, pos,
defense_ast, defense_ffum, defense_int, defense_sk, defense_tkl,
fumbles_lost, fumbles_rcv, fumbles_tot, fumbles_trcv, fumbles_yds,
kicking_fga, kicking_fgm, kicking_fgyds, kicking_totpfg,
kicking_xpa, kicking_xpb, kicking_xpmade, kicking_xpmissed, kicking_xptot,
kickret_avg, kickret_lng, kickret_lngtd, kickret_ret, kickret_tds,
passing_att, passing_cmp, passing_ints, passing_tds,
passing_twopta, passing_twoptm, passing_yds,
punting_avg, punting_i20, punting_lng, punting_pts, punting_yds,
puntret_avg, puntret_lng, puntret_lngtd, puntret_ret, puntret_tds,
receiving_lng, receiving_lngtd, receiving_rec, receiving_tds,
receiving_twopta, receiving_twoptm, receiving_yds,
rushing_att, rushing_lng, rushing_lngtd, rushing_tds,
rushing_twopta, rushing_twoptm, rushing_yds
```
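One quick way to see how these columns cluster into statistical categories is to group the CSV header by prefix. A stdlib-only sketch (the file name matches the example above; the grouping helper itself is mine, not part of nflgame):

```python
import csv
from collections import defaultdict

def stat_categories(csv_path):
    """Group CSV column names by their prefix (the part before '_')."""
    with open(csv_path, newline='') as f:
        header = next(csv.reader(f))
    groups = defaultdict(list)
    for col in header:
        prefix = col.split('_', 1)[0]
        groups[prefix].append(col)
    return dict(groups)
```

Running this against `player-stats.csv` would show one group per category (`passing`, `rushing`, `kicking`, ...) plus the identifying columns (`name`, `id`, `home`, `team`, `pos`).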
@chrislkeller Thanks for your kind offer!
To answer your question, no, the CSV output is not a very good way of learning the data model. The CSV output is a good way of discovering what sorts of data are available and the kinds of values they contain. But there is an even better way: look at the statmap.py data dictionary. It contains each statistical category and a short description of each.
I consider this a subset of the data model. The entire data model would include the following:

- A prose description of each statistical category. It would be easier to read than `statmap.py` since it isn't written as a Python data structure.
- How `combine_max_stats` mixes game and play statistics.
- The `GameClock`, `PossessionTime` and `FieldPosition` types.

I think some of the relationships between essential types will become clear when I create an ER diagram for nfldb's database schema.
I fully expect that describing the data model is something I'll have to do unless you become intimately familiar with the source code.
With that said, if you're looking for lower-hanging fruit to contribute, then doing something for issue #13 would be absolutely fantastic. A new wiki page would be very appropriate. It would be great to be able to link people to examples of using play-by-play data! (Note that there are actually examples littered throughout the issue tracker, but there's no coherent organization to them.)
I just noticed that you said you were interested in getting the data "into a backend." I'm hoping to accomplish something like that with nfldb, using PostgreSQL. My primary goals are to have a simpler API than nflgame and to make it fast. (nflgame will always be slow without a faster internal representation of data, which will probably never happen.) It should come with a program that auto-updates the database via nflgame.
I really really want to have it done before the season starts so that I can use the footage I collected with nflvid to provide the Ultimate Scouting Interface and have more fun with fantasy football drafts. :-)
> ...so that I can use the footage I collected with nflvid to provide the Ultimate Scouting Interface
Yeah, my reaction precisely. But it's going to be tough to get other people in on it. I have just about every single play from the 2011 and 2012 seasons (all-22 coach footage) on my hard drive. It's a little over 400GB. But I can't redistribute that for obvious legal reasons. (And even if it were legal, that would be the Torrent From Hell.) So other people will have to download their own copies, and it probably takes a few days for an entire season, depending on your connection.
Wow, you have a lot of awesome projects in the works... nflvid sounds really cool, though I'd assume one needs a subscription to access the video?

nfldb sounds exactly like what I was looking for as far as a backend.
I agree with your suggestion that working on the wiki is likely more up my alley, and it will allow me to learn much more about the API and what's available... Sounds like a fun project to work on while watching games this fall...
> though I'd assume one needs a subscription to access the video?
@chrislkeller - Not at all. Using any capable video player (e.g., vlc), you can watch the HTTP Live Streams for free:

```
vlc 'http://nlds82.cdnl3nl.neulion.com/nlds_vod/nfl/vod/2012/10/07/55577/2_55577_den_ne_2012_h_whole_1_4500.mp4.m3u8'
```
Getting the coach footage is a little more complicated since it's an rtmp stream and vlc doesn't seem to handle that as well. But that's what nflvid is there for. :-)
I then use XML data to slice up the footage into plays: http://e2.cdnl3.neulion.com/nfl/edl/nflgr/2012/55577.xml
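The structure of that EDL file isn't documented here, so as a purely hypothetical sketch: assuming each play appears as an element carrying start/end times in seconds (the element and attribute names below are invented for illustration, and the real Neulion schema may differ), the slicing metadata could be pulled out with the stdlib XML parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical EDL-like snippet; the real schema may differ.
sample = """
<edl>
  <play id="1" start="12.0" end="19.5"/>
  <play id="2" start="45.2" end="52.0"/>
</edl>
"""

def play_boundaries(xml_text):
    """Return (id, start, end) tuples for each <play> element."""
    root = ET.fromstring(xml_text)
    return [(p.get('id'), float(p.get('start')), float(p.get('end')))
            for p in root.iter('play')]
```

Each (start, end) pair is then enough to cut one play's clip out of the whole-game footage with a tool like ffmpeg.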
nfldb's data model is described in its wiki. I'm inclined to think that that is enough. The data is fundamentally the same, except it is more structured in nfldb.
The data model for the GameCenter JSON data desperately needs to be described. The fact that there are three different ways to access player statistics (game level, play level and combined) will be baffling to new users. This may be somewhat addressable in the API, but this needs to be explained in detail.
The data model should describe the relationship between Game, Drive, Play, Player and {Game,Play}PlayerStats objects. It should also describe how statistics are computed in play-by-play data using the nflgame.statmap module.
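As a rough illustration of that containment hierarchy, here is a minimal sketch using hypothetical dataclasses named after the objects above; these are not nflgame's actual classes, which carry far more state, but they show how play-level stats nest inside plays, drives, and games:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PlayPlayerStats:
    player_id: str
    stats: Dict[str, int]        # e.g. {'passing_yds': 8}

@dataclass
class Play:
    desc: str
    players: List[PlayPlayerStats] = field(default_factory=list)

@dataclass
class Drive:
    plays: List[Play] = field(default_factory=list)

@dataclass
class Game:
    home: str
    away: str
    drives: List[Drive] = field(default_factory=list)

def player_play_stats(game: Game, player_id: str) -> Dict[str, int]:
    """Aggregate one player's play-level stats across every drive."""
    total: Dict[str, int] = {}
    for drive in game.drives:
        for play in drive.plays:
            for ps in play.players:
                if ps.player_id == player_id:
                    for k, v in ps.stats.items():
                        total[k] = total.get(k, 0) + v
    return total
```

Summing play-level stats like this is one of the three access paths mentioned above; game-level stats and the combined view would sit alongside it.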