BurntSushi / nfldb

A library to manage and update NFL data in a relational database.
The Unlicense

newb having some trouble #33

roodhouse closed 10 years ago

roodhouse commented 10 years ago

First, thank you for putting this together. I am excited about it.

However, as stated I am a newb and I've found myself lost.

I made it through the install instructions for Windows; however, each time I try to run either top-ten-qbs.py or nfldb-update, I get an error that says: "ImportError: No module named pytz"

Not sure what I have done wrong or missed. Could you help me troubleshoot?

Thank you! -John

BurntSushi commented 10 years ago

:P Yup, that sounds right.

You've uninstalled your previous pip, right?

roodhouse commented 10 years ago

yes i have. i got this:

c:\Python27\Scripts>python get-pip.py

python: can't open file 'get-pip.py': [Errno 2] No such file or directory

so i should do it like this instead: c:\Python27\Scripts>python < C:\Users\Rugh\Desktop\nfldbget-pip.py ?

roodhouse commented 10 years ago

looks like it did not save correctly..

c:\Python27\Scripts>python < c:\users\rugh\desktop\nfldb\get-pip.py
  File "<stdin>", line 3
    <!DOCTYPE html>
    ^
SyntaxError: invalid syntax

i am going to re-download it with the 2nd link you provided

roodhouse commented 10 years ago

c:\Python27\Scripts>python < c:\users\rugh\desktop\nfldb\get-pip.py
Requirement already up-to-date: pip in c:\python27\lib\site-packages
Cleaning up...

ochawkeye commented 10 years ago

Get rid of that <.

c:\python27\python.exe c:\users\rugh\desktop\nfldb\get-pip.py

roodhouse commented 10 years ago

this:

c:\Python27\Scripts>python c:\users\rugh\desktop\nfldb\get-pip.py
Requirement already up-to-date: pip in c:\python27\lib\site-packages
Cleaning up...

then this:

c:\Python27>python pip.py install nfldb
Requirement already up-to-date: pip in c:\python27\lib\site-packages
Downloading/unpacking install
Could not find any downloads that satisfy the requirement install
Some externally hosted files were ignored (use --allow-external install to allow).
Cleaning up...
No distributions at all found for install
Storing debug log for failure in C:\Users\Rugh\pip\pip.log

BurntSushi commented 10 years ago

It looks like you didn't uninstall pip (or the pip uninstaller didn't do a very good job).

Go into C:/Python27/lib/site-packages and just delete the pip and nfldb directories. (The directories may be called pip-something-something, but delete them just the same.) Then retry python C:/.../get-pip.py.
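Before deleting anything, it can help to list what would be removed. The following is only a sketch of that dry run: `leftover_entries` is a hypothetical helper (not part of pip or nfldb), it is written for Python 3, and it runs against a throwaway directory rather than a real site-packages folder.

```python
import os
import tempfile


def leftover_entries(site_packages, prefixes=("pip", "nfldb")):
    """List entries named exactly like a prefix or starting with '<prefix>-'.

    Nothing is deleted; this only reports candidates.
    """
    matches = []
    for name in sorted(os.listdir(site_packages)):
        lowered = name.lower()
        for prefix in prefixes:
            if lowered == prefix or lowered.startswith(prefix + "-"):
                matches.append(name)
                break
    return matches


# Demonstrate on a temporary directory that mimics a site-packages folder.
demo_dir = tempfile.mkdtemp()
for name in ("pip", "nfldb", "pip-1.5.6.dist-info", "pytz"):
    os.mkdir(os.path.join(demo_dir, name))

found = leftover_entries(demo_dir)
print(found)
```

Matching on the exact name or a `pip-` style prefix keeps unrelated packages whose names merely begin with the same letters out of the list.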

roodhouse commented 10 years ago

i am erasing folders called "pip" "nfldb" & "pip-1.5.6.dist-info" along with files called "nfldb-0.2.0py2.7.egg-info"

roodhouse commented 10 years ago

now this:

c:\Python27\Scripts>python c:\users\rugh\desktop\nfldb\get-pip.py
Downloading/unpacking pip
Installing collected packages: pip
Successfully installed pip
Cleaning up...

and then this:

c:\Python27>python pip.py install nfldb
Requirement already up-to-date: pip in c:\python27\lib\site-packages
Downloading/unpacking install
Could not find any downloads that satisfy the requirement install
Some externally hosted files were ignored (use --allow-external install to allow).
Cleaning up...
No distributions at all found for install
Storing debug log for failure in C:\Users\Rugh\pip\pip.log

ochawkeye commented 10 years ago

How about c:\python27\scripts\pip.exe install nfldb?

roodhouse commented 10 years ago

just tried this with this result:

c:\Python27\Scripts>python pip.exe install nfldb
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "pip.exe\__main__.py", line 9, in <module>
  File "C:\Python27\pip.py", line 17445, in main
    bootstrap(tmpdir=tmpdir)
  File "C:\Python27\pip.py", line 17415, in bootstrap
    cert.write(pkgutil.get_data("pip._vendor.requests", "cacert.pem"))
  File "C:\Python27\lib\pkgutil.py", line 578, in get_data
    loader = get_loader(package)
  File "C:\Python27\lib\pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "C:\Python27\lib\pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "C:\Python27\lib\pkgutil.py", line 430, in iter_importers
    __import__(pkg)
ImportError: No module named _vendor

and what you suggested with this result:

c:\Python27\Scripts>python pip.exe install nfldb
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "pip.exe\__main__.py", line 9, in <module>
  File "C:\Python27\pip.py", line 17445, in main
    bootstrap(tmpdir=tmpdir)
  File "C:\Python27\pip.py", line 17415, in bootstrap
    cert.write(pkgutil.get_data("pip._vendor.requests", "cacert.pem"))
  File "C:\Python27\lib\pkgutil.py", line 578, in get_data
    loader = get_loader(package)
  File "C:\Python27\lib\pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "C:\Python27\lib\pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "C:\Python27\lib\pkgutil.py", line 430, in iter_importers
    __import__(pkg)
ImportError: No module named _vendor

roodhouse commented 10 years ago

however, this time my _vendor folder is in fact missing...


roodhouse commented 10 years ago

wait.. no it is not... it is w/in pip.. sorry

ochawkeye commented 10 years ago

I'm available for a TeamViewer session if you have time.

roodhouse commented 10 years ago

how long will you be available for? i have to run out real quick..

roodhouse commented 10 years ago

nevermind, lets do it

roodhouse commented 10 years ago

email sent..

roodhouse commented 10 years ago

awesome.

@ochawkeye thank you! @BurntSushi thank you!

BurntSushi commented 10 years ago

@roodhouse Got it? Yay!

@ochawkeye I am fiercely curious what the issue was...

ochawkeye commented 10 years ago

@BurntSushi 'Twas a rogue file named 'pip.py' in the Python execution folder. I guess that file was being given preferential treatment over the actual pip module and was overriding the commands that were meant for pip. It's similar to what would happen if I* were to create an 'nflgame.py' file and paste it into my working directory: Python would try to use that instead of the actual nflgame module.

*not that I ever did that when I first started playing with nflgame and it took me two days to figure out what the heck was wrong with your "junk" python code
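A quick way to spot this kind of shadowing is to ask Python which file it would load for a given module name. Here is a minimal sketch, assuming Python 3 (on the Python 2.7 used in this thread, printing `pip.__file__` after `import pip` gives the same information). The module name `demo_shadowed` and the helper `module_origin` are made up for the demonstration.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def module_origin(name):
    """Return the file Python would load `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Create two directories that each contain a module with the same name.
real_dir = tempfile.mkdtemp()
stray_dir = tempfile.mkdtemp()
for directory, label in ((real_dir, "real"), (stray_dir, "stray")):
    with open(os.path.join(directory, "demo_shadowed.py"), "w") as handle:
        handle.write("WHO = %r\n" % label)

# Entries earlier in sys.path win, so the stray copy shadows the real one.
sys.path.insert(0, real_dir)
sys.path.insert(0, stray_dir)
importlib.invalidate_caches()

origin = module_origin("demo_shadowed")
print(origin)
```

If the printed path points somewhere unexpected (such as a stray pip.py next to python.exe), that file is the one being imported.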

BurntSushi commented 10 years ago

@ochawkeye Ah ha! Nice find.

Yeah, 99% of all Python problems are due to its insane import resolution semantics. Gah.

One reason to move to Python 3 is that pip is now included in every Python 3.4 distribution.

ochawkeye commented 10 years ago

I only hope the frustration of the past couple of days hasn't scared away @roodhouse! nflgame|db|vid|fan, and Python in general, can be a very rewarding hobby.

iliketowel commented 10 years ago

First of all, thanks so much for putting this together, this is some amazing stuff, and I've wanted to work with data like this for a while now.

Hey, I'm reading this now, and I'm basically in the same predicament. I have gotten the nfldb db loaded in postgresql (for ease, I'll call it SQLnfldb), but there may be issues with the python nfldb (PYTnfldb). I got the get-pip.py file downloaded and I ran it. I thought it installed okay, but when I tried to run the test example I get an empty space

import nfldb
db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2013, season_type='regular')
<nfldb.query.Query object at 0x03562A70>

I tried pressing through this (I wasn't sure if this was an error or not), and when I ran

for pp in q.sort('passing_yds').limit(10).as_aggregate():
    print pp.player, pp.passing_yds

It seems to work okay for 2012 one time, but since then, I get nothing. I just wanted to confirm if I should be seeing the "Query object at 0x0#######" or if that's a sign that something is wrong, or if there's another issue in play.

Thanks again.

ochawkeye commented 10 years ago

Strings in Python are case sensitive, so you need to watch out for that season_type='Regular'

Running this code:

import nfldb
db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2013, season_type='Regular')

for pp in q.sort('passing_yds').limit(10).as_aggregate():
    print pp.player, pp.passing_yds

should give you this result:

Peyton Manning (DEN, QB) 5477
Drew Brees (NO, QB) 5139
Matthew Stafford (DET, QB) 4647
Matt Ryan (ATL, QB) 4515
Philip Rivers (SD, QB) 4478
Tom Brady (NE, QB) 4338
Andy Dalton (CIN, QB) 4296
Carson Palmer (ARI, QB) 4274
Ben Roethlisberger (PIT, QB) 4147
Joe Flacco (BAL, QB) 3912

Changing that one character in my code:

import nfldb
db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2013, season_type='regular')

for pp in q.sort('passing_yds').limit(10).as_aggregate():
    print pp.player, pp.passing_yds

now gives me a Python error:

Traceback (most recent call last):
  File "P:\Projects\Home Computer\Fantasy Football\2013\scratch7.py", line 6, in <module>
    for pp in q.sort('passing_yds').limit(10).as_aggregate():
  File "C:\Python27\lib\site-packages\nfldb\query.py", line 925, in as_aggregate
    cur.execute(q)
  File "C:\Python27\lib\site-packages\psycopg2\extras.py", line 223, in execute
    return super(RealDictCursor, self).execute(query, vars)
psycopg2.DataError: invalid input value for enum season_phase: "regular"
LINE 8:                 WHERE (((game.season_type = 'regular') AND (...

Is this what you're seeing?
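A rough sketch of why the lowercase spelling is rejected, assuming the season type behaves like a case-sensitive enum. `SeasonPhase` and `parse_phase` are hypothetical stand-ins rather than nfldb names, the member list is an assumption, and the code uses Python 3 syntax:

```python
from enum import Enum


class SeasonPhase(Enum):
    # Hypothetical stand-in for the database's season_phase enum.
    Preseason = 1
    Regular = 2
    Postseason = 3


def parse_phase(text):
    """Look up a phase by its exact, case-sensitive name."""
    try:
        return SeasonPhase[text]
    except KeyError:
        valid = ", ".join(phase.name for phase in SeasonPhase)
        raise ValueError(
            "unknown season type %r (expected one of: %s)" % (text, valid)
        )


print(parse_phase("Regular"))

try:
    parse_phase("regular")
except ValueError as err:
    print(err)
```

Listing the valid names in the error message makes a capitalization slip much easier to spot than a bare "invalid input value".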

iliketowel commented 10 years ago

No... It's going through. It just seems to be... empty. I just restarted IDLE

I did this:

import nfldb
db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2013, season_type='Regular')

And got this message: <nfldb.query.Query object at 0x03518030>

I then ran

for pp in q.sort('passing_yds').limit(10).as_aggregate():
    print pp.player, pp.passing_yds

and got the same results as you:

Peyton Manning (DEN, QB) 5477
Drew Brees (NO, QB) 5139
Matthew Stafford (DET, QB) 4647
Matt Ryan (ATL, QB) 4515
Philip Rivers (SD, QB) 4478
Tom Brady (NE, QB) 4338
Andy Dalton (CIN, QB) 4296
Carson Palmer (ARI, QB) 4274
Ben Roethlisberger (PIT, QB) 4147
Joe Flacco (BAL, QB) 3912

But now, when I ran

q.game(season_year=2012, season_type='Regular')
for pp in q.sort('passing_yds').limit(10).as_aggregate():
    print pp.player, pp.passing_yds

underneath, I got a blank (usually I had been hitting enter and then seeing results), and then (>>>) with program waiting for me to make a command.

iliketowel commented 10 years ago

I guess I haven't asked what I suppose should be an obvious question, do I need to re-enter

import nfldb
db = nfldb.connect()
q = nfldb.Query(db)
q.game(season_year=2013, season_type='Regular')

every time?

And again, thanks to yourself and burntsushi, this is super fun stuff.

ochawkeye commented 10 years ago

Ahhh...I'm beginning to understand. The IDLE shell has its uses, but it is not much more than a learning tool for Python.

Try clicking File->New File from the menu. Paste all of that code you posted above into the Untitled document window that opens and save the file with an appropriate name - maybe something like top-ten-qbs.py. Now, in the same window where you wrote your code, click Run->Run Module (or simply press F5). Control will switch back to the shell window that was open in the background and all of your Python code will execute at once, instead of line by line as you were typing it into the shell.

But you don't really need IDLE to do any of this. With that same file you just created (which you could have created with the text editor of your preference), you can fire up a command prompt and enter the following:

Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

C:\Users\OCHawkeye> python c:\path\to\my\file\top-ten-qbs.py
Drew Brees (NO, QB) 5177
Matthew Stafford (DET, QB) 4965
Tony Romo (DAL, QB) 4903
Tom Brady (NE, QB) 4799
Matt Ryan (ATL, QB) 4719
Peyton Manning (DEN, QB) 4667
Andrew Luck (IND, QB) 4374
Aaron Rodgers (GB, QB) 4303
Josh Freeman (UNK, UNK) 4065
Carson Palmer (ARI, QB) 4018

C:\Users\OCHawkeye>

ochawkeye commented 10 years ago

do I need to re-enter ... every time?

I think you just asked a question asked by every beginner programmer. The answer, of course, is a resounding NO! That's probably why you're doing this in the first place. Sure, you could look up each of the yardage totals for each of those QBs on NFL.com and create the table yourself - or you could have the Python code do the work for you.

If you find yourself typing redundant code over and over again, there is a good chance you could be consolidating that into a much more succinct set of instructions.

Say you wanted to find out who led the league in passing in 2013, 2012, and 2011. You could always go the route of copying and pasting your working example above, changing the season_year value each time.

import nfldb
db = nfldb.connect()

q = nfldb.Query(db)
q.game(season_year=2013, season_type='Regular')
for pp in q.sort('passing_yds').limit(10).as_aggregate():
    print pp.player, pp.passing_yds
print '-'*79
q = nfldb.Query(db)
q.game(season_year=2012, season_type='Regular')
for pp in q.sort('passing_yds').limit(10).as_aggregate():
    print pp.player, pp.passing_yds
print '-'*79
q = nfldb.Query(db)
q.game(season_year=2011, season_type='Regular')
for pp in q.sort('passing_yds').limit(10).as_aggregate():
    print pp.player, pp.passing_yds

There sure is a lot of code that is written and rewritten again and again there. It could easily be refactored into the following:

import nfldb
db = nfldb.connect()

def top_10_qb_passing_yds(db, yr):
    q = nfldb.Query(db)
    q.game(season_year=yr, season_type='Regular')
    for pp in q.sort('passing_yds').limit(10).as_aggregate():
        print pp.player, pp.passing_yds

for year in [2013, 2012, 2011]:
    top_10_qb_passing_yds(db, year)
    print '-'*79

I see now that there is another way to interpret your question. The answer to that version is:

Do I need to re-enter <> every time?:

import nfldb - nope, you only need to import 1 time
db = nfldb.connect() - nope, you are only establishing a single connection to the database
q = nfldb.Query(db) - yep, if you want to run a new query, you have to re-enter this
q.game(season_year=2013, season_type='Regular') - yep, if you ran a new query above, this is what you would use to filter the query of the entire database down to a single year and season-type.
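To illustrate why a fresh query matters, here is a toy model, assuming that filters added to a query object pile up rather than replace each other (which would explain a blank result after asking the same query for 2013 and then 2012). `ToyQuery` is a hypothetical stand-in written in Python 3 and is not nfldb's implementation; the two rows reuse yardage figures quoted earlier in this thread.

```python
class ToyQuery(object):
    """Hypothetical query object whose filters accumulate (AND together)."""

    def __init__(self, rows):
        self.rows = rows
        self.conditions = []

    def game(self, **criteria):
        # Each call adds another condition; earlier ones are kept.
        self.conditions.append(criteria)
        return self

    def results(self):
        return [
            row for row in self.rows
            if all(row[key] == value
                   for cond in self.conditions
                   for key, value in cond.items())
        ]


rows = [
    {"season_year": 2013, "player": "Peyton Manning", "passing_yds": 5477},
    {"season_year": 2012, "player": "Drew Brees", "passing_yds": 5177},
]

q = ToyQuery(rows)
q.game(season_year=2013)
first = q.results()

# Reusing q now demands season_year == 2013 AND season_year == 2012.
q.game(season_year=2012)
reused = q.results()

# A fresh query only carries the new condition.
fresh = ToyQuery(rows).game(season_year=2012).results()

print(first)
print(reused)
print(fresh)
```

In this model no row can satisfy both years at once, so the reused query comes back empty while the fresh one returns the 2012 row.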

ochawkeye commented 10 years ago

Do I need to re-enter <> every time?: (continued)

On the other hand, if you're doing more/different stuff with the same query, then no, you don't need to re-enter the top lines every time.

For example, if I'm doing multiple sorts of the same data pulled from the database, then I only need pull the data from the database the one time. The following is perfectly valid code.

import nfldb
db = nfldb.connect()

q = nfldb.Query(db)
q.game(season_year=2013, season_type='Regular')
for pp in q.sort('passing_yds').limit(3).as_aggregate():
    print pp.player, pp.passing_yds
print '-'*79
for pp in q.sort('rushing_yds').limit(3).as_aggregate():
    print pp.player, pp.rushing_yds
print '-'*79
for pp in q.sort('receiving_yds').limit(3).as_aggregate():
    print pp.player, pp.receiving_yds

Output:

Peyton Manning (DEN, QB) 5477
Drew Brees (NO, QB) 5139
Matthew Stafford (DET, QB) 4647
-------------------------------------------------------------------------------
LeSean McCoy (PHI, RB) 1607
Matt Forte (CHI, RB) 1341
Jamaal Charles (KC, RB) 1288
-------------------------------------------------------------------------------
Josh Gordon (CLE, WR) 1646
Calvin Johnson (DET, WR) 1489
Antonio Brown (PIT, WR) 1438

iliketowel commented 10 years ago

I tried to run the update module, and it looks like it worked somewhat, but I got some errors too.

I get an error saying "no module named httplib2" and a separate error saying 'python.exe -m nflgame.update_players --no-block' failed (exit status 1)

otherwise, it seemed to load okay.

iliketowel commented 10 years ago

for year in [2013, 2012, 2011]:
    top_10_qb_passing_yds(db, year)
    print '-'*79

I was about to ask what the print '-'*79 was, then I ran it and realized it was a separator.

I have another question, which may be a totally different "issue", or one whose answer already exists.

When I ran that top 10 QBs for 2012 I got this result in the top 10

Josh Freeman (UNK, UNK) 4065

Is there a simple method to refer to the team he played for during that season (TB), rather than UNK, (which is his current situation)?

Thanks again.

ochawkeye commented 10 years ago

httplib2 is a dependency of nfldb and should have been installed if you used pip to do the nfldb installation.

What does it say if you try to pip install nfldb now? Should look like the following:

C:\Users\Ben>pip install nfldb
Requirement already satisfied (use --upgrade to upgrade): nfldb in d:\python27\l
ib\site-packages
Requirement already satisfied (use --upgrade to upgrade): nflgame>=1.2.2 in d:\p
ython27\lib\site-packages (from nfldb)
Requirement already satisfied (use --upgrade to upgrade): psycopg2 in d:\python2
7\lib\site-packages (from nfldb)
Requirement already satisfied (use --upgrade to upgrade): enum34 in d:\python27\
lib\site-packages (from nfldb)
Requirement already satisfied (use --upgrade to upgrade): pytz in d:\python27\li
b\site-packages (from nfldb)
Requirement already satisfied (use --upgrade to upgrade): httplib2 in d:\python2
7\lib\site-packages (from nflgame>=1.2.2->nfldb)
Requirement already satisfied (use --upgrade to upgrade): beautifulsoup4 in d:\p
ython27\lib\site-packages (from nflgame>=1.2.2->nfldb)
Cleaning up...

Notice the line

Requirement already satisfied (use --upgrade to upgrade): httplib2 in d:\python2
7\lib\site-packages (from nflgame>=1.2.2->nfldb)

iliketowel commented 10 years ago

Huh... now I get an error message: 'pip' is not recognized as an internal or external command.

last time, I installed by doing python pip.exe install nfldb (or maybe pip.exe install nfldb) but from the directory of c:\python27\Scripts

I tried doing it from that directory and it worked, the first 5 were the same as yours, but I didn't have the httplib or beautifulsoup lines.

ochawkeye commented 10 years ago

Is there a simple method to refer to the team he played for during that season (TB), rather than UNK, (which is his current situation)?

Interesting question and one I'm not immediately able to provide an answer for. Of course, it can be a complex scenario with what you call a player that ends up hopping from team to team to team, but truth is that nfldb "knows" he played for Tampa Bay that year even though that fact is not explicitly tied to Josh Freeman's meta data anywhere.

import nfldb
db = nfldb.connect()

q = nfldb.Query(db)
q.game(season_year=2012, season_type='Regular')
q.play_player(team='TB')

for pp in q.sort('passing_yds').limit(3).as_aggregate():
    print '%s - %s yards passing' % (pp.player, pp.passing_yds)
for pp in q.sort('rushing_yds').limit(3).as_aggregate():
    print '%s - %s yards rushing' % (pp.player, pp.rushing_yds)
for pp in q.sort('receiving_yds').limit(3).as_aggregate():
    print '%s - %s yards receiving' % (pp.player, pp.receiving_yds)

Output:

Josh Freeman (UNK, UNK) - 4065 yards passing
Dan Orlovsky (DET, QB) - 51 yards passing
Mike Williams (BUF, WR) - 28 yards passing
Doug Martin (TB, RB) - 1454 yards rushing
LeGarrette Blount (PIT, RB) - 151 yards rushing
Josh Freeman (UNK, UNK) - 135 yards rushing
Vincent Jackson (TB, WR) - 1384 yards receiving
Mike Williams (BUF, WR) - 996 yards receiving
Doug Martin (TB, RB) - 472 yards receiving

I'm sure @burntsushi can give a thorough explanation.

ochawkeye commented 10 years ago

Re: 'pip' is not recognized as an internal or external command

You just need to add C:\Python27\Scripts to your system's PATH environment variable; see #21.
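If you want to check from Python whether that directory is already on PATH, something like the following should work. It is a sketch only: `dir_on_path` is a hypothetical helper, and `shutil.which` needs Python 3.3 or newer, so it will not run on the 2.7 install discussed here.

```python
import os
import shutil


def dir_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is an entry in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normcase(os.path.normpath(directory))
    # Skip empty entries and drop quotes that sometimes wrap Windows paths.
    entries = [entry.strip('"') for entry in path_value.split(sep) if entry]
    return any(
        os.path.normcase(os.path.normpath(entry)) == wanted
        for entry in entries
    )


# Is the Scripts folder on the current PATH?
print(dir_on_path(r"C:\Python27\Scripts"))

# Where (if anywhere) would the shell find a pip executable?
print(shutil.which("pip"))
```

A `False` on the first line together with `None` on the second matches the "not recognized" symptom.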

BurntSushi commented 10 years ago

@iliketowel With regards to the team for Josh Freeman. This gets a bit hairy.

Here is a central truth about the data in nfldb: its meta data about a player is always current. The data is meant to capture information about the player as he exists this moment. This means that only an active roster spot will give a player a team. This meta data is what you get when you use pp.player. Namely, it retrieves player meta for the player statistic pp.

With that said, every individual statistic for a player also has a team attached to it. This is historical data, so that the proper team for every player stays fixed.

So that means, if you're listing individual play statistics, you can always print the right team. For example:

import nfldb

db = nfldb.connect()
q = nfldb.Query(db)

q.game(season_year='2012', season_type='Regular', week=1)
q.player(full_name='Josh Freeman')
q.play_player(passing_yds__ge=15)

for pp in q.as_play_players():
    print pp.player.full_name, pp.team, pp.passing_yds

And the output:

[andrew@Liger nfldb] python2 33.py                                                      
Josh Freeman TB 15                                                                      
Josh Freeman TB 33                                                                      
Josh Freeman TB 21                                                                      
Josh Freeman TB 15

Notice that the team is outputted as pp.team instead of from pp.player. This means that team here is a property of the statistic itself rather than meta data about the player. For example, if you changed pp.team to pp.player.team, then it would say UNK instead because it is accessing current knowledge about the player.

Now, finally, we can get to your particular example. It is difficult because you are aggregating results over a season. A player doesn't necessarily have the same team over a season, so when you aggregate statistics, the pp.team field gets dropped. So let's try it:

import nfldb

db = nfldb.connect()
q = nfldb.Query(db)

q.game(season_year='2012', season_type='Regular')
q.sort('passing_yds').limit(10)

for pp in q.as_aggregate():
    print pp.player.full_name, pp.team, pp.passing_yds

And now the output:

[andrew@Liger nfldb] python2 33.py 
Drew Brees None 5177
Matthew Stafford None 4965
Tony Romo None 4903
Tom Brady None 4799
Matt Ryan None 4719
Peyton Manning None 4667
Andrew Luck None 4374
Aaron Rodgers None 4303
Josh Freeman None 4065
Carson Palmer None 4018

That's not very nice, so the example takes an easier path: it just shows you the team that the player currently belongs to.

If you'd like to tumble down the rabbit hole and fix this for real, then you need to find all teams that a player played for. There are lots of ways to do this, but basically, you'd want to find all the individual plays and accrue all unique teams in those plays for that player. For example, last year, Trent Richardson played on a couple teams. We could discover this by looking at the team field on all of his statistics:

import nfldb

db = nfldb.connect()
q = nfldb.Query(db)

q.game(season_year='2013', season_type='Regular')
q.player(full_name='Trent Richardson')

teams = set()
for pp in q.as_play_players():
    teams.add(pp.team)
print ', '.join(teams)

And that outputs:

[andrew@Liger nfldb] python2 33.py 
IND, CLE

Just as you'd expect.
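The same idea extends to totals: if you key the running sum on (player, team) instead of on the player alone, the historical team survives the aggregation. Below is a minimal sketch in Python 3 that uses plain tuples in place of nfldb's play statistics; the rows are the four week 1 plays printed earlier (only passes of 15 yards or more), so the total is not a season figure, and `yards_by_player_and_team` is a hypothetical helper.

```python
from collections import defaultdict

# (player, team, passing_yds) rows taken from the week 1 output above.
plays = [
    ("Josh Freeman", "TB", 15),
    ("Josh Freeman", "TB", 33),
    ("Josh Freeman", "TB", 21),
    ("Josh Freeman", "TB", 15),
]


def yards_by_player_and_team(rows):
    """Sum yards per (player, team) pair so the team is kept in the result."""
    totals = defaultdict(int)
    for player, team, yds in rows:
        totals[(player, team)] += yds
    return dict(totals)


totals = yards_by_player_and_team(plays)
for (player, team), yds in sorted(totals.items()):
    print("%s (%s) %d" % (player, team, yds))
```

A player who changed teams mid-season would simply show up once per team.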

iliketowel commented 10 years ago

Thanks again. I figured out the other issue I was having. Because I installed NFLGame first, I didn't use pip at the time. I did it on a different computer and after some minor issues (I'm finding issues when trying to install with the 64-bit version of PostgreSQL), I was able to get this up and running.

For now I'm mostly playing with this data in the PostgreSQL DB GUI, and because I work in visual analytics I'm trying to create fun and interesting visualizations out of the information. I may end up putting some of this stuff up on Tableau (and Tableau Public), but I'd like to confirm how/if you want to be credited.

For now, I added fields to confirm the direction of the play and whether it was in shotgun. I'm also working to see if I can figure out a way to see how players are doing against starters in preseason (to see if there is any carryover), but that's difficult to do, because there's no real way to tell when starters exit preseason games.

BurntSushi commented 10 years ago

Yes, please do put things up on Tableau and share them with us when you do. :-) (Opening a new issue or adding it to the wiki is perfectly acceptable.)

As far as credit goes... Having a shout out to the project (not me) is always appreciated. It helps increase awareness and hopefully attracts more folks. But of course, nfldb and associated projects are in the public domain like SQLite, so you could in theory copy the code, rebrand it, claim you wrote it and sell it, and it'd all be nice and legal. :-)