BurntSushi / nfldb

A library to manage and update NFL data in a relational database.
The Unlicense
1.07k stars 263 forks source link

Running nfldb-update gives an error on database load #247

Open illegalnumbers opened 6 years ago

illegalnumbers commented 6 years ago

Hey there! Thanks for putting together this software package. Screen scraping is an undervalued datasource that is super time consuming so mad props!

When running nfdb-update from my root project dir using nfldb and a couple other dependencies (listed below is my requirements.txt file) and using Python 2.7 I get the following error and stack trace:

Traceback (most recent call last):
  File "/Users/bytenel/workplace/fantasy-football/bin/nfldb-update", line 39, in <module>
    nfldb.update.run(**vars(args))
  File "/Users/bytenel/workplace/fantasy-football/lib/python2.7/site-packages/nfldb/update.py", line 535, in run
    doit()
  File "/Users/bytenel/workplace/fantasy-football/lib/python2.7/site-packages/nfldb/update.py", line 525, in doit
    update_games(db, batch_size=batch_size)
  File "/Users/bytenel/workplace/fantasy-football/lib/python2.7/site-packages/nfldb/update.py", line 397, in update_games
    bulk_insert_game_data(cursor, scheduled, batch_size=batch_size)
  File "/Users/bytenel/workplace/fantasy-football/lib/python2.7/site-packages/nfldb/update.py", line 195, in bulk_insert_game_data
    do()
  File "/Users/bytenel/workplace/fantasy-football/lib/python2.7/site-packages/nfldb/update.py", line 188, in do
    nfldb.db._big_insert(cursor, table, bulk[table])
  File "/Users/bytenel/workplace/fantasy-football/lib/python2.7/site-packages/nfldb/db.py", line 356, in _big_insert
    % (table, insert_fields, values))
  File "/Users/bytenel/workplace/fantasy-football/lib/python2.7/site-packages/psycopg2/extras.py", line 249, in execute
    return super(RealDictCursor, self).execute(query, vars)
psycopg2.IntegrityError: insert or update on table "play" violates foreign key constraint "play_pos_team_fkey"
DETAIL:  Key (pos_team)=(JAX) is not present in table "team".

Is there something I'm doing wrong to update my database? My config.ini is listed below as well. I also got a stack trace earlier when things were running that is the following:


There were some errors during the download. Usually this is a
result of an HTTP request timing out, which means the
resulting "players.json" file is probably missing some data.
An appropriate solution is to re-run the script until there
are no more errors (or when the errors are problems on
NFL.com side.)
-------------------------------------------------------------------------------
Could not get player info from roster row:

<tr class="even">
<td>72</td>
<td style="text-align:left"> <a href="/player/charleslenojr./2550169/profile">Leno, Charles, Jr.</a></td>
<td>T</td>
<td>ACT</td>
<td> 6'3"</td>
<td>305</td>
<td>10/9/1991</td>
<td>4</td>
<td>Boise State</td></tr>

Exception:

Traceback (most recent call last):
  File "/Users/bytenel/workplace/fantasy-football/lib/python2.7/site-packages/nflgame/update_players.py", line 419, in run
    roster.append(meta_from_soup_row(team, row))
  File "/Users/bytenel/workplace/fantasy-football/lib/python2.7/site-packages/nflgame/update_players.py", line 179, in meta_from_soup_row
    last_name, first_name = map(lambda s: s.strip(), name.split(','))
ValueError: too many values to unpack

I'm not sure if that's related. It looked like 32/32 teams rosters were downloaded properly? The view that is created is just having some sort of schema issue. I also got some "Could not get GSIS id at" errors as well but I don't think that is related.

Config.ini

[pgsql]
# The time zone to use in the PostgreSQL session by default.
# You can see a list of available time zone names with the following query:
#
#     select * from pg_timezone_names
#
# Here are some common ones:
#
#    US/Alaska
#    US/Samoa
#    US/Hawaii
#    US/Mountain
#    US/Indiana-Starke
#    US/Pacific
#    US/Central
#    US/Michigan
#    US/Aleutian
#    US/East-Indiana
#    US/Arizona
#    US/Eastern
#    US/Pacific-New
#
# Datetimes are always stored as UTC, but this time zone setting
# can be used to change how datetimes are returned in SELECT queries.
timezone = US/Mountain

# The name of the PostgreSQL database containing NFL data.
database = nfldb

# The username and password for the database specified above.
# The user should have permission to add, delete and remove data.
# Also, the user should have permission to create/alter/drop tables,
# indexes and types.
user = nfldb
password = <redacted>

# The host of the PostgreSQL server. When using a local database, this should
# be set to "localhost".
host = localhost

# The port that the PostgreSQL server is running on. It is set to the
# default port of 5432.
port = 5432

requirements.txt

nflgame
nfldb
nflvid
ochawkeye commented 6 years ago

JAX issue: see here and here Too many commas in name issue: see here. There was a pull request in nflgame to resolve this that would need to be pulled to permanently fix this. The NFL tends to clean up these ", Jr." cases, but a fix on the nflgame side does seem to be in order.

illegalnumbers commented 6 years ago

Word rerunning now after that. Thanks for the tip. Any way I could submit a PR for fixing the jax issue? Should just be an alias for the team abrev's somewhere right? Also, is there something I can do to move that PR along for the update on double jr's? Might just make sense to trash records that crash things (anything malformed, etc). It's a pretty common data intake strategy.

illegalnumbers commented 6 years ago

Looks like it didn't error on load with JAX fix - I'd like to contribute a patch for that if I can just point me in the right direction