db_playerids.csv missing Sleeper IDs - contributing?

dynastyprocess / data

An open-data fantasy football repository, maintained by DynastyProcess.com

https://dynastyprocess.com

GNU General Public License v3.0


db_playerids.csv missing Sleeper IDs - contributing? #7

Closed: danabrey closed this issue 4 years ago

danabrey commented 4 years ago

I'm not sure how you collate the player IDs, but I've got some contributions to make if that's possible:

Chris Herndon seems to have duplicate IDs in Sleeper's database: 5009 and 5755 - 5755 is a free agent, 5009 looks to be the real Chris Herndon

Anthony Gordon (MFL ID 14787) missing Sleeper ID: 6898

Marquez Callaway (MFL ID 15034) missing Sleeper ID: 6989

Mike Warren (MFL ID 14816) missing Sleeper ID: 6992

Benny Snell (MFL ID 14072) missing Sleeper ID: 6156

Quartney Davis (MFL ID 14856) missing Sleeper ID: 6879

Salvon Ahmed (MFL ID 14811) missing Sleeper ID: 6918

Jeff Thomas (MFL ID 14866) missing Sleeper ID: 7076

JaMycal Hasty (MFL ID 14821) missing Sleeper ID: 6996

Patrick Taylor (MFL ID 14817) missing Sleeper ID: 6963

Thaddeus Moss (MFL ID 14869) missing Sleeper ID: 6919

I have an app that's using the awesome .csv to help analyse some rosters. Depending on how you create the csv, maybe I could contribute these in a more automated way - or maybe not! Have a chat on Twitter DMs if you want?

tanho63 commented 4 years ago

Thanks! It's currently constructed by doing a few joins - MFL and Sleeper share a fantasydata ID and stats_global_ID, so I merge those two IDs sequentially, and have toyed with name-merges for others. This improves as MFL and Sleeper prune their IDs. I could maintain a supplemental ID csv/database that's joined in after those joins 🤔
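A minimal sketch of the sequential merge described above, in plain Python rather than whatever the real script uses. The column names `fantasydata_id`, `stats_global_id`, and `sleeper_id`, the function name, and all the sample values are assumptions made up for illustration; they are not taken from the actual build script.

```python
def attach_sleeper_ids(mfl_players, sleeper_players,
                       keys=("fantasydata_id", "stats_global_id")):
    """Fill in sleeper_id on MFL rows by trying each shared key in order."""
    merged = [dict(row) for row in mfl_players]  # copy so inputs stay untouched
    for key in keys:
        # Index Sleeper rows by the current shared key, skipping blank values.
        lookup = {row[key]: row["sleeper_id"]
                  for row in sleeper_players if row.get(key)}
        for row in merged:
            # Only fill rows that an earlier key did not already match.
            if not row.get("sleeper_id") and row.get(key) in lookup:
                row["sleeper_id"] = lookup[row[key]]
    return merged


# Placeholder rows (hypothetical IDs, not real players).
mfl = [
    {"mfl_id": "m1", "fantasydata_id": "fd1", "stats_global_id": "sg1"},
    {"mfl_id": "m2", "fantasydata_id": "", "stats_global_id": "sg2"},
    {"mfl_id": "m3", "fantasydata_id": "", "stats_global_id": ""},
]
sleeper = [
    {"sleeper_id": "s1", "fantasydata_id": "fd1", "stats_global_id": ""},
    {"sleeper_id": "s2", "fantasydata_id": "", "stats_global_id": "sg2"},
]

result = attach_sleeper_ids(mfl, sleeper)
for row in result:
    print(row["mfl_id"], row.get("sleeper_id", "<missing>"))
```

Rows like `m3`, which share neither key with Sleeper, stay unmatched; those are the cases a name-merge or a hand-maintained supplemental file would have to cover.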

Twitter DMs is good by me too, although I can better pretend I'm working when I have GitHub open ;)

tanho63 commented 4 years ago

I think from a "contributing" standpoint, I could make it easier by maintaining a csv in this repo and then accepting PRs to update it. I can download the csv onto the server as part of the script and then do the supplemental join to fill in missing IDs. Not sure that'll fix Herndon, but that may be a separate issue. DeAndre Hopkins of all people had a similar issue last year (sigh)
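Roughly what that supplemental join could look like, as a sketch in plain Python. It assumes the supplemental csv is keyed on `mfl_id` and that it should only fill blanks, never overwrite values the main joins already found. The function name and the inline CSV text are hypothetical; the ID pairs are the ones reported at the top of this issue.

```python
import csv
import io

# Stand-in for the downloaded supplemental file (hypothetical contents).
SUPPLEMENT_CSV = """mfl_id,sleeper_id
14787,6898
15034,6989
"""


def fill_missing_ids(players, supplement_rows):
    """Fill blank ID columns from the supplement, matching on mfl_id."""
    by_mfl = {row["mfl_id"]: row for row in supplement_rows}
    filled = []
    for player in players:
        player = dict(player)  # copy so the input rows are not mutated
        extra = by_mfl.get(player["mfl_id"], {})
        for column, value in extra.items():
            # Keep whatever the main joins found; only fill empty cells.
            if value and not player.get(column):
                player[column] = value
        filled.append(player)
    return filled


players = [
    {"mfl_id": "14787", "name": "Anthony Gordon", "sleeper_id": ""},
    {"mfl_id": "15034", "name": "Marquez Callaway", "sleeper_id": ""},
    {"mfl_id": "14072", "name": "Benny Snell", "sleeper_id": "6156"},
]
supplement = list(csv.DictReader(io.StringIO(SUPPLEMENT_CSV)))
filled = fill_missing_ids(players, supplement)
for row in filled:
    print(row["mfl_id"], row["name"], row["sleeper_id"])
```

Because it only fills blanks, a duplicate like the Herndon one would not be corrected by this step; that would need an explicit override rule.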

danabrey commented 4 years ago

Chatting here it is, then. I've given up pretending to be working since moving to full remote working due to COVID, haha.

Having a CSV on here that I can make PRs to sounds like a good solution. For my app's purposes, I'll need to maintain a list of missing IDs somewhere anyway, so if you let me know what columns that CSV will have, etc, then I can start putting that together.

tanho63 commented 4 years ago

All the same cols and nomenclature as db_playerids is probably fine for now (just can't be missing mfl_id). I'll need to figure out "new_gsis_ids" at some point because the nflscrapR/nflfastR API changed, but I may be joining those in as a separate csv/table.

MFL ids are serving as my primary key right now (would consider changing that laterish but I find it's the best/most-complete/consistent API so am a little anchored), so the only restriction on the csv is that mfl_id must be there and unique (any other fields can be missing).
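The one restriction above (mfl_id present and unique, everything else optional) is easy to check before opening a PR. A small sketch, assuming the rows come from `csv.DictReader`; the function name and the deliberately broken sample rows are hypothetical.

```python
from collections import Counter


def check_mfl_ids(rows):
    """Return a list of problems: blank mfl_id values and duplicated ones."""
    problems = []
    ids = [(row.get("mfl_id") or "").strip() for row in rows]
    # Data rows start on line 2 because line 1 of the csv is the header.
    for line_no, mfl_id in enumerate(ids, start=2):
        if not mfl_id:
            problems.append(f"line {line_no}: missing mfl_id")
    for mfl_id, count in Counter(i for i in ids if i).items():
        if count > 1:
            problems.append(f"mfl_id {mfl_id} appears {count} times")
    return problems


# Intentionally invalid sample: one duplicate and one blank mfl_id.
rows = [
    {"mfl_id": "14787", "sleeper_id": "6898"},
    {"mfl_id": "14787", "sleeper_id": ""},
    {"mfl_id": "", "sleeper_id": "6989"},
]
problems = check_mfl_ids(rows)
for problem in problems:
    print(problem)
```

An empty list means the file satisfies the constraint; blank values in any other column are allowed and not reported.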

Thanks for being interested in this, really appreciate it :D I'm masked and working from the office rn haha

danabrey commented 4 years ago

Awesome. I think for cleanliness for now, a simple two-column CSV, mfl_id and sleeper_id, would suffice? I can put that in a fork of this repo and submit a PR. Any preference for the name? missing_sleeper_ids.csv?

And my interest is almost entirely self-serving! This data resource is so awesome as a base for so many of my side-projects, I have probably 10 different crazy spreadsheets all using combinations of values, ID merging, etc. and now this larger web app project.

danabrey commented 4 years ago

Also, agree that MFL provides the most consistent player list. My data-mining always goes Step 1: import players from MFL. Step 2: merge everything else in. Any player that's missing from MFL's database isn't a real player :)

tanho63 commented 4 years ago

I've noticed a bunch of PFR mismatches too (that's joined with a name/team/pos merge) so if you just call it "missing_playerids.csv" and leave me a blank col there that'd be helpful :)

trojanguard25 commented 4 years ago

@tanho63 - have you considered creating your own internal id for the db_playerids.csv to use as the primary id for this table? In the baseball world, I've used this project https://github.com/chadwickbureau/register to cross-reference player ids at different websites. They created a new id for each player that is guaranteed to be consistent and unique, and it doesn't rely on 3rd-party ids. Probably overkill for this specific issue.

tanho63 commented 4 years ago

@trojanguard25
is the main reason why I haven't, to be honest. I'd consider it more if I thought there wasn't at least one definitively good/maintained one like MFL or fantasydata or whatever

tanho63 commented 4 years ago

I'll close off this issue when I get the merge script sorted and it works for the first time :)

danabrey commented 4 years ago

I opened a fresh PR with every missing Sleeper ID bar one - #12

tanho63 commented 4 years ago

Decided to run it early while I was still looking at it, looks good to me :)

tanho63 commented 4 years ago

Thanks again for reaching out and contributing, much appreciated!

danabrey commented 4 years ago

That's awesome news, thanks!