dynastyprocess / data

An open-data fantasy football repository, maintained by DynastyProcess.com
https://dynastyprocess.com
GNU General Public License v3.0

Duplicate gsis_id in load_ff_playerids #34

Open nickpietrantonio opened 2 years ago

nickpietrantonio commented 2 years ago

In load_ff_playerids, there are four duplicated gsis_id values. The one I have researched the most is gsis_id == '00-0016098', which appears on two rows. Both are listed as Fred Taylor, but they differ in birthdate and some other fields.

I can find almost no information on the 2005 Fred Taylor. For example, this table says he was drafted with the 32nd pick of the 7th round, but he was not, according to the 2005 draft results I looked at. The one thing I found using the listed mfl_id was this MFL page, which matches the given information (including the incorrect draft spot): https://www72.myfantasyleague.com/2000/player?L=0&P=8058. It appears I got lucky, though: I randomly tried 2000 as the year while checking whether players can be looked up by id this way on MFL, and it is the only year for which I could get the player to show up, at -4 years of experience.

I don't know how this list is maintained, or whether it is generated automatically so that the existence of the MFL page above means this row will exist, but I thought I'd bring attention to it. At the very least, the 00-0016098 gsis_id probably shouldn't belong to this row.

The other duplicates are for 00-0019641, 00-0020270, and 00-0029435.

Maybe not super useful, but a reproducible list of the duplicates:

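library(dplyr)  # provides %>%, filter(), group_by(), summarize(), n()

# Count rows per non-missing gsis_id and keep the ids that appear more than once
nflreadr::load_ff_playerids() %>%
    filter(!is.na(gsis_id)) %>%
    group_by(gsis_id) %>%
    summarize(n = n()) %>%
    filter(n > 1)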
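To compare the conflicting rows side by side rather than only counting them, something like the sketch below might help. `duplicated_gsis_rows` is a hypothetical helper, not part of nflreadr, and the `name` and `birthdate` column names in the commented usage line are assumptions about the table (hence `any_of()`, which skips columns that do not exist).

```r
library(dplyr)

# Hypothetical helper (not part of nflreadr): return every row whose
# gsis_id appears more than once, so the conflicting rows can be compared.
duplicated_gsis_rows <- function(ids) {
  ids %>%
    filter(!is.na(gsis_id)) %>%   # ignore players without a gsis_id
    group_by(gsis_id) %>%
    filter(n() > 1) %>%           # keep all rows of any id seen 2+ times
    ungroup() %>%
    arrange(gsis_id)
}

# Usage against the real table (needs network access). Column names other
# than gsis_id, mfl_id, pff_id and draft_year are assumptions:
# duplicated_gsis_rows(nflreadr::load_ff_playerids()) %>%
#   select(any_of(c("gsis_id", "mfl_id", "pff_id", "name", "birthdate", "draft_year")))
```

Because the helper takes the data frame as an argument, it can be tried on a small hand-made table before running it on the full download.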

00-0019641: I believe the incorrect one is the 1990 draft year row.
00-0020270: I believe the incorrect one is the 1985 draft year row.

Like the Fred Taylor case, both of the rows I believe are incorrect have less information than their counterparts, and the player doesn't show up in the draft for the year the row says he was drafted.

00-0029435 is the only one of these where both rows appear to be real players. Both have PFR pages, but I think they were incorrectly given the same gsis_id and pff_id. I believe the 2013 draft_year row should have gsis_id '00-0030236', because there is rushing play-by-play data for an RB Dennis Johnson in 2013 with that gsis_id.
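For anyone who wants to repeat the play-by-play check, a rough sketch follows. `rusher_summary` is a hypothetical helper, and it assumes the play-by-play table has the `rusher_player_id`, `rusher_player_name`, and `posteam` columns; the commented usage line also assumes `nflreadr::load_pbp(2013)` returns that season's data.

```r
library(dplyr)

# Hypothetical helper: count the rushing plays credited to one gsis_id,
# broken out by the rusher name and team recorded on those plays.
rusher_summary <- function(pbp, id) {
  pbp %>%
    filter(rusher_player_id == id) %>%   # rows with a missing id are dropped
    count(rusher_player_id, rusher_player_name, posteam, name = "carries")
}

# Usage against the real data (needs network access):
# rusher_summary(nflreadr::load_pbp(2013), "00-0030236")
```

If the id belongs to the 2013 Dennis Johnson, the result should show his name with a non-zero carry count.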