csthunes / fantasy-football-project

Project space for a web app enabling the user to play around with NFL football statistics to predict the top fantasy players of the next year.

New solution for point tiers #1

Closed csthunes closed 1 year ago

csthunes commented 1 year ago

Improve the point-tier solution used to determine the quality of games. In particular, reduce the discrepancies between positions in where each quality level falls for a given position.

The goal here is to make sure our score variables are well balanced between positions, so we can accurately compare players of different positions to each other.

csthunes commented 1 year ago

Option: Create point tiers dynamically from the data for the top 20 players at each position. Rather than setting a fixed point-tier range and trying to balance the numbers in each range, set the tiers dynamically from normal-distribution percentiles of points for the given year, position, and point type.

For example, to calculate the PPR tiers for RBs in 2022, follow these steps:

  1. Before grouping and applying aggregation functions, sort the 2022 RB data by PPR points. Take the top 20 (or 25 or 30) × the number of games that season (17) = 340 rows and calculate the standard deviation of PPR points for that group. Let's say this value is 4.5.
  2. Use the 20th, 40th, 60th, and 80th percentiles of points and their corresponding z-scores to set the ranges. The z-scores for those percentiles are -0.84162, -0.25335, 0.25335, and 0.84162. Multiplying the z-scores by our standard deviation gives offsets of about -3.79, -1.14, 1.14, and 3.79.
  3. Calculate the mean PPR points for the same top-20 RB group. Add the offsets from step 2 to that mean to form the tier boundaries, then pass them to the aggregation function that uses them.
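
The three steps above can be sketched roughly as follows. This is only an illustration using the Python standard library, not the project's actual code: the function names `top_rows` and `normal_point_tiers` are hypothetical, and the input is assumed to be a flat list of per-game PPR point values for one position and season.

```python
from statistics import NormalDist, mean, stdev

def top_rows(game_points, n_players=20, n_games=17):
    """Keep the highest-scoring n_players * n_games per-game rows (hypothetical helper)."""
    return sorted(game_points, reverse=True)[: n_players * n_games]

def normal_point_tiers(points, percentiles=(0.20, 0.40, 0.60, 0.80)):
    """Return tier boundaries as mean + z * standard deviation.

    Assumes the points are roughly normally distributed, as the steps above do.
    """
    mu = mean(points)
    sigma = stdev(points)        # sample standard deviation of the group
    std_normal = NormalDist()    # standard normal: mean 0, sd 1
    # inv_cdf(p) is the z-score for percentile p
    return [mu + std_normal.inv_cdf(p) * sigma for p in percentiles]

# Offsets for the example standard deviation of 4.5 from step 1
offsets = [round(NormalDist().inv_cdf(p) * 4.5, 2) for p in (0.20, 0.40, 0.60, 0.80)]
```

With real data, the boundaries would come from something like `normal_point_tiers(top_rows(rb_ppr_2022))`, where `rb_ppr_2022` is a placeholder for the season's per-game RB PPR values.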

csthunes commented 1 year ago

Selecting only the top rows of the dataset produces a strongly positively skewed distribution, so it doesn't make sense to assume normality when calculating where the percentiles fall. This solution is better than the old one, but it needs tuning to better fit the skewed data.
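
To illustrate the skew problem, here is a small sketch comparing the normal-approximation cut points with empirical ones. The sample values are made up purely for illustration (they are not project data), and `compare_cuts` is a hypothetical helper.

```python
from statistics import NormalDist, mean, median, quantiles, stdev

def compare_cuts(points):
    """Return (normal_cuts, empirical_cuts) for the 20/40/60/80th percentiles."""
    mu, sigma = mean(points), stdev(points)
    normal_cuts = [mu + NormalDist().inv_cdf(p) * sigma for p in (0.2, 0.4, 0.6, 0.8)]
    empirical_cuts = quantiles(points, n=5)  # four cut points from the data itself
    return normal_cuts, empirical_cuts

# Made-up, right-skewed sample: many modest games and a few huge ones
sample = [10, 10, 11, 11, 12, 12, 13, 14, 16, 20, 28, 45]
normal_cuts, empirical_cuts = compare_cuts(sample)

# The long right tail pulls the mean above the median
skewed_right = mean(sample) > median(sample)
```

For this sample the lowest normal-approximation cut lands below every observed value, so the bottom tier would be empty, while the empirical cut stays inside the data.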

csthunes commented 1 year ago

Using the median instead of the mean, and taking the top 40 RBs and top 60 WRs instead of the standard 20 used for other positions, alleviates much of this issue.

Closing this... good enough for now.