Open brbeaird opened 6 years ago
I have a few other examples of this I noticed as well. When i get back to a computer i will try and get examples from the json.
There were a lot of fumbles being counted twice, including: Christian McCaffrey, Alex Collins, Ben Roethlisberger, Dalvin Cook, Alfred Morris, Dak Prescott, Trubisky. Looking in nfldb - play_player table their respective drives actually have 2 fumbles counted. I have not yet had a chance to look into it
 player_id  |  gsis_id   | play_id | fumbles_lost | team
------------+------------+---------+--------------+------
 00-0033869 | 2018090912 |    3871 |            2 | CHI
 00-0033077 | 2018090910 |    3931 |            2 | DAL
 00-0033280 | 2018090910 |     443 |            2 | CAR
 00-0022924 | 2018090901 |    3867 |            2 | PIT
 00-0030433 | 2018090906 |    1730 |            2 | NO
 00-0033893 | 2018090904 |    1163 |            2 | MIN
 00-0029141 | 2018090904 |    1590 |            2 | SF
This is interesting, and an issue I have not previously seen, but I can confirm that I saw this with the Minnesota/San Francisco game. It was pointed out to me that my data showed that the Vikings had 2 defensive fumble recoveries & 4 sacks when in reality they only had a single fumble recovery and 3 sacks.
Sure enough, there were records being duplicated within a single play.
For example, for this play:
(SF, MIN 1, Q2, 2 and 1) (2:30) A.Morris up the middle to MIN 1 for no gain (L.Joseph). FUMBLES (L.Joseph), RECOVERED by MIN-H.Smith at MIN 2. H.Smith to MIN 2 for no gain (P.Garcon).
The raw data under that play showed:
{
u'ydstogo': 1,
u'note': u'FUMBLE',
u'qtr': 2,
u'yrdln': u'MIN 1',
u'sp': 0,
u'down': 2,
u'players': {
u'00-0029606': [
{u'playerName': u'H.Smith', u'clubcode': u'MIN', u'yards': 1.0, u'statId': 59, u'sequence': 11},
{u'playerName': u'H.Smith', u'clubcode': u'MIN', u'yards': 0.0, u'statId': 59, u'sequence': 15}],
u'00-0027914': [
{u'playerName': u'A.Sendejo', u'clubcode': u'MIN', u'yards': 0.0, u'statId': 79, u'sequence': 3},
{u'playerName': u'A.Sendejo', u'clubcode': u'MIN', u'yards': 0.0, u'statId': 79, u'sequence': 5},
{u'playerName': u'A.Sendejo', u'clubcode': u'MIN', u'yards': 0.0, u'statId': 91, u'sequence': 9},
{u'playerName': u'A.Sendejo', u'clubcode': u'MIN', u'yards': 0.0, u'statId': 91, u'sequence': 13}],
u'00-0026345': [
{u'playerName': u'P.Garcon', u'clubcode': u'SF', u'yards': 0.0, u'statId': 79, u'sequence': 14},
{u'playerName': u'P.Garcon', u'clubcode': u'SF', u'yards': 0.0, u'statId': 79, u'sequence': 16}],
u'00-0029141': [
{u'playerName': u'A.Morris', u'clubcode': u'SF', u'yards': -1.0, u'statId': 10, u'sequence': 1},
{u'playerName': u'A.Morris', u'clubcode': u'SF', u'yards': 0.0, u'statId': 52, u'sequence': 4},
{u'playerName': u'A.Morris', u'clubcode': u'SF', u'yards': 0.0, u'statId': 52, u'sequence': 7},
{u'playerName': u'A.Morris', u'clubcode': u'SF', u'yards': 0.0, u'statId': 106, u'sequence': 8},
{u'playerName': u'A.Morris', u'clubcode': u'SF', u'yards': 0.0, u'statId': 106, u'sequence': 10}],
u'0': [
{u'playerName': None, u'clubcode': u'SF', u'yards': -1.0, u'statId': 95, u'sequence': 2}],
u'00-0027885': [
{u'playerName': u'L.Joseph', u'clubcode': u'MIN', u'yards': 0.0, u'statId': 79, u'sequence': 6},
{u'playerName': u'L.Joseph', u'clubcode': u'MIN', u'yards': 0.0, u'statId': 91, u'sequence': 12}]},
u'time': u'02:30',
u'ydsnet': 69,
u'posteam': u'SF',
u'desc': u'(2:30) A.Morris up the middle to MIN 1 for no gain (L.Joseph). FUMBLES (L.Joseph), RECOVERED by MIN-H.Smith at MIN 2. H.Smith to MIN 2 for no gain (P.Garcon).'
}
There are just too many sequences listed for the actions that went into that play.
Deleting the game's .json.gz file and regenerating produced a different, cleaner result.
{
u'ydstogo': 1, u'note': u'FUMBLE', u'qtr': 2, u'yrdln': u'MIN 1', u'sp': 0, u'down': 2,
u'players': {
u'00-0029606': [
{u'playerName': u'H.Smith', u'clubcode': u'MIN', u'yards': 0.0, u'statId': 59, u'sequence': 7}],
u'0': [
{u'playerName': None, u'clubcode': u'SF', u'yards': -1.0, u'statId': 95, u'sequence': 2}],
u'00-0026345': [
{u'playerName': u'P.Garcon', u'clubcode': u'SF', u'yards': 0.0, u'statId': 79, u'sequence': 8}],
u'00-0029141': [
{u'playerName': u'A.Morris', u'clubcode': u'SF', u'yards': -1.0, u'statId': 10, u'sequence': 1},
{u'playerName': u'A.Morris', u'clubcode': u'SF', u'yards': 0.0, u'statId': 52, u'sequence': 4},
{u'playerName': u'A.Morris', u'clubcode': u'SF', u'yards': 0.0, u'statId': 106, u'sequence': 5}],
u'00-0027885': [
{u'playerName': u'L.Joseph', u'clubcode': u'MIN', u'yards': 0.0, u'statId': 79, u'sequence': 3},
{u'playerName': u'L.Joseph', u'clubcode': u'MIN', u'yards': 0.0, u'statId': 91, u'sequence': 6}]},
u'time': u'02:30', u'ydsnet': 69, u'posteam': u'SF', u'desc': u'(2:30) A.Morris up the middle to MIN 1 for no gain (L.Joseph). FUMBLES (L.Joseph), RECOVERED by MIN-H.Smith at MIN 2. H.Smith to MIN 2 for no gain (P.Garcon).'
}
All of A.Sendejo's contributions to the play are now gone, and the redundant statIds have vanished. Maybe he was incorrectly credited with the tackle and forced fumble, and while the play-by-play was being updated the erroneous info and the corrected info coexisted for a time? At some point the data might have been cleaned up, but if that happened after the game was marked completed, nflgame would not see it, because it believes the game to be over (no need to update a completed game).
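For anyone who wants to flag suspicious plays in cached data without eyeballing the raw JSON, a quick heuristic (my own sketch, not part of nflgame) is to count how often each (player, statId) pair appears within a single play:

```python
from collections import Counter

def find_duplicate_stats(players):
    """Return (player_id, statId) pairs credited more than once in a play.

    `players` is the play's `players` dict from the Game Center JSON.
    """
    counts = Counter(
        (player_id, stat['statId'])
        for player_id, stats in players.items()
        for stat in stats
    )
    return sorted(pair for pair, n in counts.items() if n > 1)

# Trimmed-down version of the Sendejo entries from the play above:
players = {
    '00-0027914': [  # A.Sendejo: statIds 79 and 91 each appear twice
        {'statId': 79, 'sequence': 3}, {'statId': 79, 'sequence': 5},
        {'statId': 91, 'sequence': 9}, {'statId': 91, 'sequence': 13}],
    '00-0027885': [  # L.Joseph: one tackle, one forced fumble -- fine
        {'statId': 79, 'sequence': 6}, {'statId': 91, 'sequence': 12}],
}
print(find_duplicate_stats(players))
```

Note that some repeats are legitimate (a player really can be credited with two tackles on a fumble play), so this only narrows down which plays to re-download; it can't prove a play is wrong on its own.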
This was not an isolated case; 4 defenses were pointed out to me as having wrong stats. Deleting all of the games from the week and then regenerating the data resolved the discrepancies in every case.
So is deleting all of the weekend's .json.gz files and regenerating the best way to handle this? And do you know if there is a separate step for cleaning up nfldb?
In Windows I just went to explorer \my\path\to\python\lib\site-packages\nflgame\gamecenter-json, sorted by name, and deleted everything from 2018090600.json through 2018091001.json.
Yes, I see that nfldb retains the "bad" Sendejo information despite nflgame now being accurate:
# nflgame:
(SF, MIN 1, Q2, 2 and 1) (2:30) A.Morris up the middle to MIN 1 for no gain (L.Joseph). FUMBLES (L.Joseph), RECOVERED by MIN-H.Smith at MIN 2. H.Smith to MIN 2 for no gain (P.Garcon).
H.Smith OrderedDict([('defense_frec_yds', 0), ('defense_frec', 1)])
P.Garcon OrderedDict([('defense_tkl', 1)])
A.Morris OrderedDict([('rushing_att', 1), ('rushing_yds', -1), ('fumbles_tot', 1), ('fumbles_forced', 1), ('fumbles_lost', 1)])
L.Joseph OrderedDict([('defense_tkl', 1), ('defense_ffum', 1)])
# nfldb:
(SF, OPP 1, Q2, 2 and 1) (2:30) A.Morris up the middle to MIN 1 for no gain (L.Joseph). FUMBLES (L.Joseph), RECOVERED by MIN-H.Smith at MIN 2. H.Smith to MIN 2 for no gain (P.Garcon).
Linval Joseph (MIN, DT) {'defense_tkl': 1, 'defense_ffum': 1}
Alfred Morris (SF, RB) {'fumbles_tot': 2, 'fumbles_forced': 2, 'rushing_att': 1, 'rushing_yds': -1, 'fumbles_lost': 2}
Pierre Garcon (SF, WR) {'defense_tkl': 2}
Andrew Sendejo (MIN, DB) {'defense_tkl': 2, 'defense_ffum': 2}
Harrison Smith (MIN, DB) {'defense_frec_yds': 1, 'defense_frec': 2}
I don't really know the best way to delete the games from nfldb. You can always access the database itself, but that's not super convenient, especially if you've not done it before.
One thing you can do is run this little script, and then run nfldb-update.
import nfldb

db = nfldb.connect()
# gsis_ids = [2018090600, 2018090900, 2018090901, 2018090902, 2018090903,
#             2018090904, 2018090905, 2018090906, 2018090907, 2018090908,
#             2018090909, 2018090910, 2018090911, 2018090912, 2018091000,
#             2018091001]
q = nfldb.Query(db)
q.game(season_year=2018, season_type='Regular', week=1)
gsis_ids = [game.gsis_id for game in q.as_games()]

for gsis_id in gsis_ids:
    query = "DELETE FROM game WHERE gsis_id = '{}';".format(gsis_id)
    with nfldb.Tx(db) as cursor:
        cursor.execute(query)
That will fix nfldb:
(SF, OPP 1, Q2, 2 and 1) (2:30) A.Morris up the middle to MIN 1 for no gain (L.Joseph). FUMBLES (L.Joseph), RECOVERED by MIN-H.Smith at MIN 2. H.Smith to MIN 2 for no gain (P.Garcon).
Linval Joseph (MIN, DT) {'defense_tkl': 1, 'defense_ffum': 1}
Alfred Morris (SF, RB) {'fumbles_tot': 1, 'fumbles_forced': 1, 'rushing_att': 1, 'rushing_yds': -1, 'fumbles_lost': 1}
Pierre Garcon (SF, WR) {'defense_tkl': 1}
Harrison Smith (MIN, DB) {'defense_frec': 1}
If this is a very rare thing, then maybe it's not worth it, but these resources do send the Last-Modified HTTP header, so it's possible to have a mechanism that checks for changes after a game is finished by sending HEAD requests. Where to store that info (the timestamp) I'm not sure, but that would probably be the cheapest way.
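A minimal sketch of that idea follows. The URL template is my assumption about the Game Center endpoint nflgame reads (check nflgame's source for the real one), and the function names are mine:

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen

# Assumed Game Center URL template -- verify against nflgame's source.
GAME_URL = 'http://www.nfl.com/liveupdate/game-center/{0}/{0}_gtd.json'

def remote_last_modified(gsis_id):
    """HEAD the remote game file and return its Last-Modified header (or None)."""
    req = Request(GAME_URL.format(gsis_id), method='HEAD')
    with urlopen(req, timeout=10) as resp:
        return resp.headers.get('Last-Modified')

def has_changed(remote_stamp, cached_stamp):
    """True if the server's copy is newer than the stamp we stored locally."""
    if remote_stamp is None or cached_stamp is None:
        return True  # no header / nothing stored: be safe and re-download
    return parsedate_to_datetime(remote_stamp) > parsedate_to_datetime(cached_stamp)
```

A full GET (and replacement of the local file) would then only happen when has_changed(...) is true.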
Seems like it's worth addressing to me.
Reading this, Last-Modified seems elegant enough to me! Why not just store that directly in the game JSON?
It seems pretty low impact to me...
I noticed this happened again in last night's BAL/CIN game. CIN was being credited with an extra INT, 2 extra fumble recoveries, and 1.5 extra sacks in the play-by-play that was available when the last play of the game occurred. At some point between last night and this morning that data was fixed and deleting my local .json and regenerating fixed the discrepancy.
If this is going to happen every week, I'd say it's a fairly significant problem.
Can this be renamed, "Play data doesn't update after game closes?"
Just want to make sure i am clear on the issue here.
Yeah that makes sense.
Storing the timestamp in the game-json would indeed be low impact. What needs to be handled in that case is support for reading both (older) files without the timestamp and files with the timestamp. Easy to fix.
What needs to be decided is when/how long to check for updates of games. For instance, we don't need to check the correctness of a game played in 2016. Games of current week, x weeks back...?
So the proposed logic would be to check Last-Modified with HEAD requests, and if a resource has changed, perform a GET request and replace the game file.
Hmmmm, very good point! I need to look at how this is done to provide a more informed opinion... but perhaps an easy, non-obtrusive way would be to only check games that occurred within the last 2 weeks.
IMO the way to ensure the most accuracy would be to check EVERY game, or every game in the given season... but I'm unsure if this would result in so much traffic that the NFL might crack down.
Actually, we could get away with simply changing the logic: instead of checking/updating only currently playing games, we update all games from the past two weeks. I'm not so sure tracking the last-modified header does much if we need to keep track of how recently the game was played anyway.
If I'm understanding correctly, on Monday nights, every time someone accesses the API we would be dumping and redownloading ~31 games' worth of data instead of just the one game being played? I hit the API roughly every 3 minutes (and I think nfldb is every 30 seconds). That seems like a lot of unnecessary calls for data that is almost certainly static. I think that might be a bit too lazy of a solution here.
In practice, after analyzing how much traffic that goes just by opening NFL.com, seeing how many page visits they have every month, I don't think that we really need to worry about calling them too often (again, in practice). With that said, I think it's great that we are cautious and try to be smart. Not only does that decrease the number of calls to the NFL, it makes for a better application that runs smoother.
I'm not so sure tracking the last-modified header does much if we need to keep track of how recently the game was played anyway.
We have the schedule on file/in memory. Getting entries from a certain relative time range is an easy operation so no worries there.
Come to think of it, we could make use of an extra flag like verified or something like that. That is, if a game is checked at earliest x days (5 days?) after it was played, it's pretty safe to say that we won't need to check it anymore. Hence, we can flag it verified. Maybe check (with HEAD) about once a day up until x.
A user that runs the application pretty much all the time would have calls scattered over time, whereas a user that hasn't run the application in, say, three weeks will have to download everything anyway, and all games older than (today - x) will automatically be verified.
For the historical data, it's not impossible that we have errors so one could of course verify these as well if we wanted to (preferably a single user that commits all changes to the repo).
I wonder if we should make this thing configurable, ya know?
VERIFY_POST_GAME=(False|interval)
Or perhaps a manual call... like some kind of force-update function.
Again, I'm not 100% sure how the player stats are updated, but ASSUMING it's done in a batch process over the games that are active, with games removed from the dictionary of games to update once they are complete:
VERIFY_POST_GAME could be checked whenever we look up whether a game is active. If the game was marked active within the interval, we continue checking it.
I believe this would prevent those extraneous calls @ochawkeye mentioned.
Manually deleting the .json file and redownloading is of course not a good long-term solution. Hence, doing it programmatically (e.g. force-updating via a command) is a good idea (I thought that was possible already, but maybe it isn't).
@derek-adair, I'm not sure I grasp the difference between your (last) suggestion and mine.
@ochawkeye and @brbeaird, did you notice any wrong data this week? Also, is this the first season you have noticed this? My point being that this might be something going on at NFL.com just right now; surely they're aware of the problem, and people will have pointed it out (if it's wrong here, it's wrong on NFL.com). So I'm guessing that for them this is a high-priority bug, probably solved in the near future. Another option, of course, is that they abandon these feeds completely (they are old, and they feed very old-looking, pretty poorly functioning web content on nfl.com) in favor of newer ones, in which case we have bigger problems.
If this is a one-time-in-six-years (?) thing and it's back to normal now, then I think @derek-adair's suggestion of a configurable update function is nice (a force-update command that can be run manually as desired, as well as periodic updating of games if the user wishes to have such a setup).
Yes, this is a data problem that occurred again this week. Just looking at defense's accumulated stats, I saw changes in the following after deleting and regenerating stats:
Baltimore: 4 sacks > 3 sacks
Buffalo: 4 fumble recoveries > 2
Carolina: 2 interceptions > 1
Chicago: 3 fumble recoveries > 2
Cincinnati: 2 sacks > 1
Cleveland: 3 interceptions > 2, 3 sacks > 2
Houston: 5 sacks > 4
Jacksonville: 1 fumble recovery > 0 (this one seems weird), 4 sacks > 3
Kansas City: 5 sacks > 4
Minnesota: 4 sacks > 3
New York Jets: 5 sacks > 4
Seattle: 3 interceptions > 2
This is not a problem that I ever witnessed before this season.
In regards to this being a high-priority bug, I'm not so certain. Just because the underlying JSON data shows the redundant play information doesn't mean that their Game Center is displaying all of it. I haven't watched the games live in Game Center, but it's probable that their display of the information handles the redundancies properly.
Yep, this is the first season I've seen this, and I looked at things pretty closely last year.
It does seem like a bug, but seeing as how the NFL feed eventually gets corrected, that may just be how they've got it working now. I have no idea what's going on behind the scenes there, but it definitely feels like something we'll end up having to deal with to make sure we go back and get the corrected data at some point.
Yep, we can't know if it's high prio at NFL or not, so I guess we'd have to assume that it's not.
Right, so I don't mind coding fixes (that's kinda why I'm around, I don't actually use the project more than for inspiration/as a knowledge source) but the question is how. How do you guys feel about the suggestions given so far?
Regardless of any redundancies, there will be corrections after the game is no longer active. The doubly counted plays may or may not be corrected post-game-closing. I'm all about the VERIFY_POST_GAME config option.
Also note that this project is basically running itself so "high priority" is a relative term.
"High Priority" was in the context of whether or not NFL.com viewed this as a bug in their source data, not a grade of how quickly nflgame should be addressing it.
Storing the last pulled information in the JSON would be fine, but even that might not be necessary. All comes down to how long after the game the data ends up being corrected.
I've found that if a change is going to happen, it happens by the morning after. But I have not compared stats from any other site to see if they match what we are recording, so I can't say with 100% certainty that no changes occur after the morning after. If we could collectively agree that the morning after looks like the cutoff for changes, then we could probably just tap into the metadata of the .json.gz file itself: if the file was last modified on the day of a game that happened in the past, delete it and pull it again; if it is newer than the game date, call it good.
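That mtime check could be sketched like this (the 12-hour grace period is my guess at "the morning after", and the function name is mine):

```python
import os
from datetime import datetime, timedelta

def needs_refresh(json_path, kickoff, grace_hours=12):
    """True if the cached game file should be deleted and re-pulled.

    `kickoff` is the game's start time as a datetime. `grace_hours` past
    kickoff is the (assumed) point after which nfl.com has applied its
    post-game corrections, so a file written before that isn't trusted.
    """
    if not os.path.exists(json_path):
        return True  # nothing cached at all
    cached_at = datetime.fromtimestamp(os.path.getmtime(json_path))
    return cached_at < kickoff + timedelta(hours=grace_hours)
```

A missing file is treated the same as a stale one: download.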
As mentioned, these post-game changes are not something we have seen previously. Each week the NFL has a handful of plays they reclassify based on further analysis. For example, there might be a pass play that, after further tape review, gets classified as a running play instead. In that case, they publish a list of statistical changes for the play (i.e. Drew Brees: -1 completion, -1 pass attempt, -6 passing yards; Mark Ingram: -1 reception, -6 receiving yards, +1 rushing attempt, +6 rushing yards). But those stat changes have never been retroactively applied to the play-by-play data. So these post-game changes are all pretty new behavior for those of us that have been using nflgame for years.
I know with most Fantasy sites, stat corrections usually go through on Wed or Thurs morning for the previous week's games; I'm not sure the longest possible time NFL could make changes after the fact, though.
So after some thinking...
Storing the Last-Modified value in the JSON files isn't really necessary, since we can use the timestamps of the files themselves for the same purpose; from the application's point of view they give pretty much the same info.
Also, each time we check a remote game file on nfl.com (via HEAD) we can update the timestamp of the corresponding local file even if the game has not changed, i.e. even if there's no download, just to indicate that this game was checked at this very timestamp. So after checking games postgame for x number of days, x times per day, this would work like a "verified" flag. E.g. if the local modified-timestamp is past x days from game day or whatever (or x days/hours from the assumed correction point, like Wednesday or Thursday morning), then that would be considered a verified game and we don't need to check anymore.
No tampering with the JSON files; they're all just copies of the nfl.com data. I kinda like that approach.
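As a sketch, "touching" the cached file after a successful HEAD check and treating a late-enough timestamp as the verified flag could look like this (x = 5 days is my assumed cutoff, and both function names are mine):

```python
import os

VERIFIED_AFTER_SECS = 5 * 86400  # assumption: corrections land within 5 days

def mark_checked(json_path):
    """Bump the file's mtime to 'now' without rewriting its contents,
    recording that the remote copy was checked at this moment."""
    os.utime(json_path, None)

def is_verified(json_path, kickoff_ts):
    """A game is 'verified' once it was last checked more than
    VERIFIED_AFTER_SECS past kickoff (both as Unix timestamps); such
    games never need another HEAD request."""
    return os.path.getmtime(json_path) > kickoff_ts + VERIFIED_AFTER_SECS
```

The nice property is exactly what's described above: the on-disk JSON stays a byte-for-byte copy of nfl.com's data, and the bookkeeping lives entirely in the filesystem metadata.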
What we haven't touched on is where this should happen, i.e. where in the code. We've discussed this in prior issues. "Hi-jacking" import nflgame like with the schedule?
EDIT: Making this configurable: I can't really decide if I think this should be configurable or not. I kinda like the idea that the application doesn't just download stuff from the internet without the user really knowing it. On the other hand, if this is not enabled by default, then a high percentage of users will not know about it or how to enable it, will file issues about it, and won't "enjoy the product" as it can be enjoyed.
What we haven't touched on is where this should happen, i.e. where in the code. We've discussed this in prior issues. "Hi-jacking" import nflgame like with the schedule?
Main reason I don't like that is that it is possible to run nflgame without ever accessing any of the games. The following is perfectly valid code.
import nflgame
print len(nflgame.players)
8631
Adding the overhead of checking every single game (thousands) against when it appears in the schedule seems overly aggressive.
Instead of determining if we need to replace the .JSON, what if we just delay caching the .JSON until some point after the game has concluded?
https://github.com/derek-adair/nflgame/blob/master/nflgame/game.py#L307-L309
It would mean pulling the same data from nfl.com over and over again until we hit that magic date/time when we say we're satisfied that changes won't be happening anymore.
Just throwing it out there, not suggesting it is the best solution to the problem.
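For concreteness, the gate around that caching step could be something like the following sketch (24 hours is an arbitrary placeholder for "satisfied that changes won't happen anymore", and the name is mine):

```python
import time

CACHE_DELAY_SECS = 24 * 3600  # assumption: corrections are in within a day

def should_cache(kickoff_ts, now=None):
    """Only persist the .json.gz once the assumed correction window has
    passed; before that, keep fetching fresh data from nfl.com on each
    access instead of writing the file to disk."""
    now = time.time() if now is None else now
    return now >= kickoff_ts + CACHE_DELAY_SECS
```

The trade-off is exactly as stated above: repeated downloads until the cutoff, but no stale file can ever be written.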
Main reason I don't like that is that it is possible to run nflgame without ever accessing any of the games. The following is perfectly valid code.
I don't like it either; in fact, I think it's poor design. I don't think that import nflgame should do anything but import the lib. I think that all external as well as internal calls should be initiated by the user. Using the live module would be such a thing: the user makes an active choice.
I've just gotten the impression that we are OK with (like?) that for instance the schedule is automatically updated with the import so I've just kinda followed that philosophy. Personally I don't like it but I can live with it if it's what the team wishes :slightly_smiling_face:
Adding the overhead of checking every single game (thousands) against when it appears in the schedule seems overly aggressive.
That was unclear of me; I was thinking maybe the last x week(s) but didn't write it down. A user-initiated function call/script could override this to make checks further back in time.
Instead of determining if we need to replace the .JSON, what if we just delay caching the .JSON until some point after the game has concluded?
Best solution so far, I think. This is easy to implement now and solves the problem. Well done! The only thing I'm not sure about is whether all games get synced at the same time, like TNF vs. MNF... so would it be a set time (like Thursdays at 8 AM) or a relative time (like 24 hours after kickoff)?
Marking on hold b/c #46 will address this if we are lucky
Also, yes: updating should not be done on import; it should only happen when you run the live script.
I've come to see the error in my thinking and understanding of the code base.
@brbeaird I've updated nfldb to use python3 and have a compatible database w/ (hopefully) all of the 2019 data.
/feeds-rs/playbyplay/ has data back to 1998. We can use this to verify the accuracy of our play data. I will be writing a script to do this.
I'm not sure of the best way to invoke this in real time to ensure accuracy. I'm considering adding config for stuff like this, where you set your preference as a user:
config.ini example:
[nflgame]
# Period in which to check the game json
verify_play_data = never[default] | hourly | daily | weekly | monthly

# Update the schedule on import
update_schedule_on_import = false[default] | true
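If that config shape sticks, reading it could be as simple as the following sketch (section and option names taken from the example above; the function name, path, and defaults are my assumptions):

```python
import configparser

def load_prefs(path='config.ini'):
    """Read the proposed nflgame config, falling back to safe defaults
    when the file, section, or option is missing."""
    cfg = configparser.ConfigParser()
    cfg.read(path)  # silently ignores a missing file
    return {
        'verify_play_data': cfg.get(
            'nflgame', 'verify_play_data', fallback='never'),
        'update_schedule_on_import': cfg.getboolean(
            'nflgame', 'update_schedule_on_import', fallback=False),
    }
```

Defaulting to never / false keeps the current "no surprise network traffic" behavior for users who don't create a config file at all.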
@brbeaird I've updated nfldb to use python3 and have a compatible database w/ (hopefully) all of the 2019 data.
Ooh nice. I’ll check that out when I get a chance. Eventually want to get that in a docker.
Looking over the data from 2018 week 1, I'm seeing some things that don't quite line up. Check out play 3673 in 2018091001.json. That play was a 2-yard gain for Todd Gurley, yet he is listed further down in one of the sequence nodes with 32 yards.
While this may not be visible in nflgame, that data flows into nfldb as incorrect data, so if you're trying to do any aggregate queries there, they will not be correct.
Any ideas here?