BurntSushi / nfldb

A library to manage and update NFL data in a relational database.
The Unlicense
1.07k stars 263 forks source link

Adding custom stats to database #262

Open CMorton737 opened 6 years ago

CMorton737 commented 6 years ago

@ochawkeye will probably be the best guy to help me out with this question

I use nfldb for fantasy analysis. I have my own processes where I compute scores for players, but it can be cumbersome when I have to iterate through every play to calculate 'tiered' statistics, such as distance of each kicking_fgm stat. I'd like to be able to use the .as_aggregate() function, as it is so much faster than calculating it play-by-play, especially when calculating multi-season stats. I figure the best thing to do is to create a new {kicking_fgm_50 : 1} stat and add it to the stats dictionary in the nfldb database.

I see two paths forward to address this: 1.) I can modify nflgame to compute these stats as it parses the raw json and populates nfldb 2.) I can write an independent script (update-tiered-stats.py) that will iterate through every play in the database and insert each tiered stat to its original play statistics set in the database

I like one because it is clean and straightforward, but I don't know how much I like the idea of editing the base code of the nflgame package. Not my usual M.O.

I like two because it is a modular add-on script that can be shared online. However it seems inefficient and a bit of a lazy hack.

So, what are your thoughts on the best way to go about doing this? or maybe you have another idea as to the most efficient way to keep track of FF scoring using the nflx packages?

etsweeney commented 6 years ago

Someone will have a better answer, but in the meantime you can check out what @JessePresnell did with his forked code, as it looks like he added some fantasy scoring.

https://github.com/JessePresnell/nfldb/commits/master

CMorton737 commented 6 years ago

Thanks. This is a bit removed from what I'm trying to do, but interesting nonetheless!

My project is a combination of nfldb and an ESPN FF API. I have one database of player/game information (nfldb), and another database of specific league information for different leagues. Since scoring is specific to a leagues settings, the scoring procedures must be dynamic.

I'll add a bit more information I would ultimately like to have my code operate as such:

league_scoring = {'passing_tds': 4, 'rushing_yds': 0.1,..., kicking_fgm_50: 4}

q.game(season_year = 2017, week = 1).player(player_id = 'xyz')
for pp in q.as_aggregate():
    for stat in pp.fields:
        score[stat] = league_scoring[stat] * getattr(pp, stat)

Where the statistic kicking_fgm_50 was already calculated and inserted into the nfldb database, rather than having to compute kicking_fgm_50 every time I want to compute a score

I'm just looking to discuss the best way to get tiered statistics such as kicking_fgm_50, passing_tds_50, punting_inside_20, etc into the database before querying so that when I compute the score it becomes a simple "stat_score = stat_points * stat_value" operation.

sansbacon commented 6 years ago

You can create a trigger function that does the calculations when new rows are inserted into the applicable table. This will slow down the inserts, though.

You could also create a postgresql function that does the calculations for you based on various parameters. Then query the function instead of the underlying table.

As an aside, why is it a problem to recalculate in python code? I would be surprised if performance was a huge issue for a small number of players and statistics. If you find that you are doing the same calculations over and over, you can cache the function results.

JessePresnell commented 6 years ago

Hey @CMorton737 --

I've updated my query a bit, but I ran into a similar use case where I had multiple scoring systems. The best thing to do is take the structure of the fantasy.sql script and create views that will automatically update named for the scoring system you use. In my case individual_dk represents individual level fantasy stats for Draft Kings. You could do the same for ESPN, Fan Duel, NFL, or whatever scoring system you may use. Creating views seems more optimal than writing additional parsing functionality as the data that's already been parsed is sufficient to build your scoring tables out with. Using views ensures the data stays dynamic.

As for referencing those tables inside of nflgame, I'd have to toy with the code, but I'm sure it's possible.

CMorton737 commented 6 years ago

@sansbacon I like the idea of using a trigger function. Assuming by 'slowing down inserts' you mean during a live game or when populating nfldb, it which case it I don't see that as much of an issue. I'll definitely look into it.

I see it as an issue because my ultimate goal is to use my FF analysis package as a backend for a small fantasy football stats website where a user can enter their league ID and get info relevant to their league/waiver wire. Since NFL player statistics are global, I figure it would be best to calculate the tiered stats one time and insert them to the database, rather than calculate them every time a user requests a score (which could end up being quite frequent)