BurntSushi / nfldb

A library to manage and update NFL data in a relational database.
The Unlicense
1.08k stars 264 forks source link

Aggregated Data Mismatch? #46

Open aakar opened 10 years ago

aakar commented 10 years ago

I'm trying to get the total passing yards and rushing yards of teams using the agg_play table and manual SQL. Noticing that my summation isn't giving the answers that match the NFL.com DB.

SELECT sum(a.passing_tds) as TD, sum(a.passing_cmp) as PassComp, sum(a.passing_yds) as 
PassYds, sum(a.rushing_yds) as RushYds from agg_play a, drive g, game m where a.drive_id = 
g.drive_id and g.gsis_id = a.gsis_id and g.pos_team = 'NO' and m.gsis_id = g.gsis_id and 
m.season_type = 'Regular' and m.season_year = 2013;

I get the following:

td passcomp passyds rushyds 39 444 5139 1476

NFL.com though has the following passcomp: 446 passing yds: 4918 rushing yds: 1473

Noticing this with other teams too. I remember reading in the docs that it's likely that some stats may or may not match if they get adjusted by the NFL after the fact, just wondering if that's the case here.

BurntSushi commented 10 years ago

It's not that stats get adjusted, it's that the JSON feed can just be inaccurate. Usually it's a result of missing data, so that aggregating usually results in underestimates. That you have overestimates is strange. It warrants more investigation.

Investigation is tedious, because it means pinpointing more precisely where the errors are coming from. Usually they are spread out over many games, but sometimes you'll find a pocket of errors for one game.