"Number of Runs" vs "Courses Run" in Team Members list

SteveDesmond-ca commented 3 years ago

See https://challenge.fingerlakesrunners.org/Team/Members/3 vs https://challenge.fingerlakesrunners.org/Overall/AgeGrade

@adamengst Can you confirm that we want to do the same thing on the team list as is done for the overall age grade competition, so a team member who has run all 10 courses with a 70% average should be ranked higher than someone who ran 3 courses with an 80% average?

adamengst commented 3 years ago

Hmm, I'm not sure we do want that on the team list, since the number of runs there is overall, not courses run. And it might be useful for the team to see who their fastest people are for future directions (Rich, go run X!) rather than having to figure that out. Was there a particular reason you thought of changing this? Did someone ask?

SteveDesmond-ca commented 3 years ago

I just noticed that someone who had only run once was #2 on the 30-39 list.

I think what it comes down to is "how are the awards for top 10 fastest team members going to be determined?" Our rankings should match that.

e.g. if our #2 doesn't run any more for the rest of the year (admittedly unlikely), but still has one of the best age grade averages based on their one fast run, is that considered "top 10"?

SteveDesmond-ca commented 3 years ago

Here's what the 30-39 list would look like as proposed:

And then you could click the "Age Grade" column to see who's fast but hasn't run certain courses yet:

adamengst commented 3 years ago

> I think what it comes down to is "how are the awards for top 10 fastest team members going to be determined?" Our rankings should match that.

You raise a good point, and one that I don't think has come up so far—I apparently just sort of waved my hands at "top 10 runners on the team" in the prize listing, and it's not in the rules. Since the team as a whole is competing across all 10 courses, the "value" of a team member should be based on three metrics, I suggest:

1. Average age grade, as we have it. No questions here.
2. Number of courses run, as you suggest. This is important because if someone doesn't run many courses, they're not contributing either efforts or age-grading to the team score.
3. Number of runs, as it displays now. This is important because if someone runs a course only once, that's not as valuable to the team as someone who runs it multiple times.

So one solution would be simply to add the Courses Run column as you suggest but keep the Number of Runs column. By default, sort by Courses Run, then Age Grade, then Number of Runs, perhaps.
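The suggested default ordering can be sketched as a multi-key sort. This is only an illustration in Python, not the Leaderboards code; the `Member` type and its field names are hypothetical, and it assumes higher is better for all three columns.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical record for illustration; not the project's actual model.
    name: str
    courses_run: int   # distinct courses completed
    age_grade: float   # average age grade, as a percentage
    total_runs: int    # all runs across all courses


def default_order(members):
    """Sort by Courses Run, then Age Grade, then Number of Runs (all descending)."""
    return sorted(
        members,
        key=lambda m: (-m.courses_run, -m.age_grade, -m.total_runs),
    )
```

With this ordering, someone with one fast run lands below anyone who has run more courses, but re-sorting on the Age Grade column alone would still surface them.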

I can't quite work out in my head if there's a single number that could be derived from those three metrics that would show the value of a team member—it might need to be evaluated in person. For instance, if two runners run all 10 courses, and one has a higher age-grading but the other has more efforts, who is more valuable?

Thoughts?

SteveDesmond-ca commented 3 years ago

Maybe since age grade and num runs count equally for overall team points, the "top 10" should take the top 5 from each? My guess is the overlap will be significant, but I'll see if I can come up with a more formal definition if the code helps clarify anything.

adamengst commented 3 years ago

Good thinking. Or, what about taking the top 10 from each and deduping to eliminate overlap? Perhaps there's a way of determining whether a particular age grade or a particular number of runs is somehow more valuable for ranking the remaining non-overlap runners.
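A rough Python sketch of the "top N from each list, then dedupe" idea, for illustration only. The dictionary keys (`name`, `age_grade`, `runs`) are assumptions, names are assumed to be unique, and the result can hold anywhere from N to 2N people depending on overlap.

```python
def top_n_union(members, n=10):
    """Take the top n by age grade and the top n by run count, without duplicates."""
    by_grade = sorted(members, key=lambda m: m["age_grade"], reverse=True)[:n]
    by_runs = sorted(members, key=lambda m: m["runs"], reverse=True)[:n]

    # Keep first-seen order; a member appearing in both lists is kept once.
    unique = {}
    for m in by_grade + by_runs:
        unique.setdefault(m["name"], m)
    return list(unique.values())
```

How to rank the non-overlapping runners against each other is left open here, as it is in the discussion.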

SteveDesmond-ca commented 3 years ago

The hard part of any calculation or algorithm is implicit bias in one direction or the other around the wide variety of distances for the courses, e.g. the ~6 minutes it takes to run EHRW "counts the same" as the ~90 minutes for Skunk Cabbage, but at the same time, for the individual courses, they are equally important on their own, because each course has its own "team points" competition.

We could simplify the algorithm for transparency:

1. take the person with the fastest age grade for the winning team, and remove them from the "most runs" list
2. take the person with the most runs, remove them from the "fastest" list
3. take the next fastest, remove...
4. repeat until you have 10 people

This essentially works out to "top 5 from each with no double-dipping", and set theory says it doesn't matter which of the 2 you start with, but still has the bias that 11 EHRW miles counts more than 10 Skunk Cabbage.
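The alternating selection described above could be sketched like this in Python. It is an illustrative sketch, not the implementation: the dictionary keys are hypothetical, names are assumed unique, ties are broken by input order, and this version arbitrarily starts with the "fastest" list.

```python
def alternating_pick(members, size=10):
    """Alternate between the fastest and most-runs lists, never picking anyone twice."""
    fastest = sorted(members, key=lambda m: m["age_grade"], reverse=True)
    most_runs = sorted(members, key=lambda m: m["runs"], reverse=True)
    queues = [fastest, most_runs]

    picked = []
    picked_names = set()
    turn = 0
    # Stop at the requested size, or earlier if the team has fewer members.
    while len(picked) < size and len(picked) < len(members):
        for m in queues[turn % 2]:
            if m["name"] not in picked_names:
                picked.append(m)
                picked_names.add(m["name"])
                break
        turn += 1
    return picked
```

Running this against real team data, once starting from each list, would be one way to check how often the starting choice changes the final ten.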

I haven't worked out yet if this is fair in all cases or not, but we could also have a "team member score" which is equal to: 1 + log((total runs) x (total miles) x (average age grade))

Theoretically that accounts for all the potential edge cases, but I'm not sure how it will work out with real-world data when comparing:

someone fast that doesn't run a lot vs someone who runs a lot but slower
someone who runs more of the longer courses vs someone w/the shorter ones
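To try the proposed "team member score" on cases like these, here is a small Python sketch. Several details are assumptions, since the formula above does not specify them: the log is taken as base 10, age grade is passed as a percentage (e.g. 70 for 70%), and a member with no runs scores 0.

```python
import math


def team_member_score(total_runs, total_miles, avg_age_grade):
    """1 + log10(total runs x total miles x average age grade)."""
    product = total_runs * total_miles * avg_age_grade
    if product <= 0:
        # Assumption: no runs (or no data) means no score; log is undefined here.
        return 0.0
    return 1 + math.log10(product)


# Hypothetical comparison: one fast short run vs. many slower, longer runs.
fast_but_rare = team_member_score(total_runs=1, total_miles=3.1, avg_age_grade=80)
slow_but_frequent = team_member_score(total_runs=10, total_miles=31, avg_age_grade=60)
```

Because the three factors are multiplied, runs and miles can outweigh a higher age grade, so the frequent runner scores higher in this made-up comparison. Whether that matches intuition is exactly what real data would need to show.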

Maybe we start with your previous suggestion, keep the rules a little vague, and re-evaluate in the summer once we have more data?

adamengst commented 3 years ago

I think waiting until we have more data is the way to go. Realistically, people won't care about how the results are arrived at as long as they look right. So once we have data and can see how different algorithms change the output, we'll be able to say "Whoa, that approach has a really non-intuitive result." or "Yeah, that seems right."

SteveDesmond-ca commented 3 years ago

The top-level fix for this was pushed a while back in d0087c5, and as the year end approaches we can open new issues to tweak the edge cases.

FingerLakesRunnersClub / Leaderboards

"Number of Runs" vs "Courses Run" in Team Members list #54