BurntSushi / nfldb

A library to manage and update NFL data in a relational database.
The Unlicense
1.08k stars 264 forks source link

LeSean McCoy Yards in 2013 200 short #24

Closed morefromalan closed 10 years ago

morefromalan commented 10 years ago

Love the DB. Spotted an issue:

If I import the DB and run this query:

SELECT player.full_name, SUM(play_player.rushing_yds) AS rushing_yds
FROM play_player
LEFT JOIN player ON player.player_id = play_player.player_id
LEFT JOIN game ON game.gsis_id = play_player.gsis_id
WHERE game.season_year = 2013 AND game.season_type = 'Regular' AND player.position = 'RB'
GROUP BY player.full_name
ORDER BY rushing_yds DESC;

Rushing yards look accurate vs ESPN/NFL records: http://espn.go.com/nfl/statistics/player/_/stat/rushing/sort/rushingYards

except for LeSean McCoy who is off by 200 yards.

-----------------------+-------------
 LeSean McCoy          |        1407
 Matt Forte            |        1320
 Jamaal Charles        |        1288
 Alfred Morris         |        1275
 Adrian Peterson       |        1266
 Marshawn Lynch        |        1257
 Ryan Mathews          |        1255
 Eddie Lacy            |        1162
 Frank Gore            |        1128
 DeMarco Murray        |        1095
 Chris Johnson         |        1077
 Knowshon Moreno       |        1039
 Zac Stacy             |         972
 C.J. Spiller          |         927
 Fred Jackson          |         896
 Reggie Bush           |         889
 Le'Veon Bell          |         857
 DeAngelo Williams     |         833
 Chris Ivory           |         833
 Maurice Jones-Drew    |         803
 Ben Tate              |         771
 LeGarrette Blount     |         768
 Stevan Ridley         |         767
 BenJarvus Green-Ellis |         756
 Rashad Jennings       |         733
 Lamar Miller          |         709
 Bilal Powell          |         697
 Gio Bernard           |         695
 Ray Rice              |         660
 Andre Ellington       |         652
 Trent Richardson      |         563
 Montee Ball           |         558
 Joique Bell           |         556
 Pierre Thomas         |         551
 Arian Foster          |         542
 Steven Jackson        |         542
 Donald Brown          |         537
 Andre Brown           |         492
 James Starks          |         491
 Doug Martin           |         456
 Bernard Pierce        |         436
 Danny Woodhead        |         425
 Mark Ingram           |         386
 Darren McFadden       |         379
 Kendall Hunter        |         358
BurntSushi commented 10 years ago

Ah wow! I was almost about to say, "Sorry, but the data is sometimes inaccurate." Instead, I dug a little bit. I'll share my process in case someone else finds it useful.

First, I wanted to reproduce it, so I ran nflstats LeSean McCoy (from nflcmd):

[andrew@Liger ~] nflstats LeSean McCoy
Player matched: LeSean McCoy (PHI, RB)
  Week  W/L    Date   OPP  R Att  R Yds  Y/Att  R TDs  WR Rec  WR Tar  WR Yds  Y/Att  WR TDs  F Lost  KR Yds  KR TDs  PR Yds  PR TDs
------------------------------------------------------------------------------------------------------------------------------------
     1    W   Sep 9  @WAS     31    184    5.9      1       1       1       5    5.0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     2    L  Sep 15    SD     11     53    4.8      0       5       6     114   22.8       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     3    L  Sep 19    KC     20    158    7.9      1       0       1       0    0.0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     4    L  Sep 29  @DEN     16     73    4.6      0       1       3      21   21.0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     5    W   Oct 6  @NYG     20     46    2.3      1       6       8      46    7.7       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     6    W  Oct 13   @TB     25    116    4.6      0       2       2      55   27.5       0       1       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     7    L  Oct 20   DAL     18     55    3.1      0       6       7      26    4.3       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     8    L  Oct 27   NYG     15     48    3.2      0       4       5      18    4.5       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     9    W   Nov 3  @OAK     12     44    3.7      0       4       4      36    9.0       1       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    10    W  Nov 10   @GB     25    155    6.2      0       1       1       6    6.0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    11    W  Nov 17   WAS     20     77    3.9      2       4       4      73   18.2       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    13    W   Dec 1   ARI     19     79    4.2      0       5       6      36    7.2       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    14    W   Dec 8   DET     29    217    7.5      2       1       1       4    4.0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    15    L  Dec 15  @MIN      8     38    4.8      0       5       7      68   13.6       0       0       0       0       0       0                                                           
------------------------------------------------------------------------------------------------------------------------------------                                                           
    16    W  Dec 22   CHI      6     37    6.2      1       2       2       9    4.5       0       0       0       0       0       0                                                           
------------------------------------------------------------------------------------------------------------------------------------                                                           
    17    W  Dec 29  @DAL      4     27    6.8      0       0       1       0    0.0       0       0       0       0       0       0                                                           
------------------------------------------------------------------------------------------------------------------------------------                                                           
     -    -       -     -    279   1407    5.0      8      47      59     517   11.0       1       1       0       0       0       0

Then I took a look at McCoy's game log, and indeed, weeks 16 and 17 are way off.

The next step is to investigate whether nflgame has the same error:

import nflgame

mccoy = nflgame.find('LeSean McCoy')[0]
stats = mccoy.stats(2013)
print stats.rushing_yds

Output: 1607.

Yikes! I'm not sure what happened, but maybe something happened while updating. (I keep it running while the games are playing for live updates.) Once the game is considered "finished", no more updates are made. So if the update failed while the game was playing---and only updated for the first quarter, for example---then nfldb would eventually consider that game finished. Therefore, nfldb-update wouldn't try to update it again.

Luckily, the fix is simple. Just blow away week 16 and week 17 from the database and rebuild it with nfldb-update:

[andrew@Liger ~] psql nfldb -c "DELETE FROM game WHERE (season_year, season_type) = (2013, 'Regular') AND week IN (16, 17);"
DELETE 32
[andrew@Liger ~] nfldb-update
...

(N.B. This will automatically delete all drives, plays and stats associated with those games because of cascading foreign key constraints.)

And now nflstats LeSean McCoy shows the correct output:

[andrew@Liger ~] nflstats LeSean McCoy
Player matched: LeSean McCoy (PHI, RB)
  Week  W/L    Date   OPP  R Att  R Yds  Y/Att  R TDs  WR Rec  WR Tar  WR Yds  Y/Att  WR TDs  F Lost  KR Yds  KR TDs  PR Yds  PR TDs
------------------------------------------------------------------------------------------------------------------------------------
     1    W   Sep 9  @WAS     31    184    5.9      1       1       1       5    5.0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     2    L  Sep 15    SD     11     53    4.8      0       5       6     114   22.8       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     3    L  Sep 19    KC     20    158    7.9      1       0       1       0    0.0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     4    L  Sep 29  @DEN     16     73    4.6      0       1       3      21   21.0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     5    W   Oct 6  @NYG     20     46    2.3      1       6       8      46    7.7       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     6    W  Oct 13   @TB     25    116    4.6      0       2       2      55   27.5       0       1       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     7    L  Oct 20   DAL     18     55    3.1      0       6       7      26    4.3       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     8    L  Oct 27   NYG     15     48    3.2      0       4       5      18    4.5       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     9    W   Nov 3  @OAK     12     44    3.7      0       4       4      36    9.0       1       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    10    W  Nov 10   @GB     25    155    6.2      0       1       1       6    6.0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    11    W  Nov 17   WAS     20     77    3.9      2       4       4      73   18.2       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    13    W   Dec 1   ARI     19     79    4.2      0       5       6      36    7.2       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    14    W   Dec 8   DET     29    217    7.5      2       1       1       4    4.0       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    15    L  Dec 15  @MIN      8     38    4.8      0       5       7      68   13.6       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    16    W  Dec 22   CHI     18    133    7.4      2       6       6      29    4.8       0       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
    17    W  Dec 29  @DAL     27    131    4.9      0       1       2       3    3.0       1       0       0       0       0       0
------------------------------------------------------------------------------------------------------------------------------------
     -    -       -     -    314   1607    5.1      9      52      64     540   10.4       2       1       0       0       0       0
BurntSushi commented 10 years ago

I've updated the database dump so that future builds will have the correct data.

Thank you!

morefromalan commented 10 years ago

Thanks @BurntSushi !