TabbycatDebate / tabbycat

Debating tournament tabulation software for British Parliamentary and a variety of two-team parliamentary formats
https://tabbycat.readthedocs.io/
GNU Affero General Public License v3.0

Create view for adjudicator scoring statistics #881

Open · tienne-B opened this issue 6 years ago

tienne-B commented 6 years ago

As an extension to feedback scores, there should be a way to examine adjudicators' scoring patterns. This would help with determining whether there is confusion about the scoring scale, or other considerations in how these adjudicators assign their scores. For ease of use, the judges' scores should be represented graphically. Box plots in particular would work well here, showing each judge's range and other spread statistics; this could reveal whether a judge scores on the low side or exhibits some other abnormality.
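To make the idea concrete, here is a minimal sketch of the five-number summary a box plot would encode for one adjudicator, in plain Python (how the scores are fetched is left aside, and none of this is Tabbycat's existing code):

```python
from statistics import quantiles

def box_plot_stats(scores):
    """Five-number summary of one adjudicator's speaker scores."""
    q1, median, q3 = quantiles(scores, n=4)  # quartile cut points
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
    }

# Example: a judge whose scores cluster on the low side of the scale
print(box_plot_stats([68, 70, 71, 72, 72, 74, 77]))
```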

czlee commented 6 years ago

For what it's worth, I think this is a good idea, but I'd file it under "needs that are best served by creating a good API" (like a REST API), because different CA teams will want to interpret this data in different ways, and the better tool to have is a way to pull and interpret the data yourself 🙂

tienne-B commented 6 years ago

The real issue is that there is no easy way to pull all of an adjudicator's scores. While access to that "view" would be useful in an API, I think visualizations of this data would also be appropriate within the web interface (as with the demographics/motions statistics). Box plots don't interpret the data for you, but they allow CA teams to get a sense of it.
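For reference, the query that is currently awkward to assemble is just "every speaker score this adjudicator has given". A sketch of it with the Django ORM might look like the following, where the import path and the related-name lookup are my assumptions about the schema rather than verified code:

```python
# Sketch only: pull every speaker score one adjudicator has awarded.
# The lookup path (debate_adjudicator__adjudicator) is an assumption.
from results.models import SpeakerScoreByAdj

def scores_for(adjudicator):
    return list(
        SpeakerScoreByAdj.objects.filter(
            debate_adjudicator__adjudicator=adjudicator
        ).values_list("score", flat=True)
    )
```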

czlee commented 6 years ago

I agree that it'd be useful; I just think that there are many useful visualizations, each of which would be useful to some CA teams and counterproductive to others. I therefore caution against diving too deep into this space; the better solution is to make export for analysis in external tools (or third-party systems, but that's probably wishful thinking) easy.

I've never had this particular request from a CA team, and for the BP tournaments I'm involved in, this doesn't seem like an especially useful way to look at the data: you would expect scoring ranges to differ according to which teams a judge is seeing (judges aren't always rotated up and down the tab), so you can't infer much about adjudicators' accuracy from the fact that they have different ranges. That doesn't mean other tournaments won't find it useful, and I can certainly imagine why particular tournaments would like it, but many more would not, much like every other way of summarizing feedback data.

Showing that a visualization would be helpful in some circumstances is the easy part; the hard part is showing that this should be one of the visualizations that we include as standard in Tabbycat.

philipbelesky commented 6 years ago

I largely agree with the above. To my mind the motion and diversity stats are a little different in that they are foremost a public-facing affordance rather than a tool for the adj core.

We will include Django REST Framework once the allocations branch is merged, which will make producing these kinds of endpoints much easier.
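As an illustration only, a read-only endpoint for this data with Django REST Framework could look roughly like the sketch below; the model fields, serializer, and nested route are assumptions, not a worked-out design:

```python
# Illustrative sketch: a read-only DRF endpoint exposing the raw scores
# an adjudicator has given. Model and field names are assumptions.
from rest_framework import serializers, viewsets

from results.models import SpeakerScoreByAdj


class SpeakerScoreByAdjSerializer(serializers.ModelSerializer):
    class Meta:
        model = SpeakerScoreByAdj
        fields = ("speaker", "position", "score")


class AdjudicatorScoresViewSet(viewsets.ReadOnlyModelViewSet):
    serializer_class = SpeakerScoreByAdjSerializer

    def get_queryset(self):
        # Assumes a nested route like /adjudicators/<adjudicator_pk>/scores/
        return SpeakerScoreByAdj.objects.filter(
            debate_adjudicator__adjudicator_id=self.kwargs["adjudicator_pk"]
        )
```

CA teams could then pull the JSON into whatever external analysis tool they prefer.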

tienne-B commented 6 years ago

How about moving away from visualizations and instead showing the average difference, across each speaker and debate, between SpeakerScoreByAdj and SpeakerScore? That would take into account the strength of the judged participants and could show each adjudicator's general skew.
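Roughly, that statistic could be computed as in the sketch below; the model relationships and field names are assumptions about the schema, not tested code:

```python
# Sketch of the proposed skew statistic: average, over every score an
# adjudicator has given, the difference between their score
# (SpeakerScoreByAdj) and the confirmed score (SpeakerScore) for the
# same speaker, position, and ballot. All field names are assumptions.
from results.models import SpeakerScore, SpeakerScoreByAdj

def adjudicator_skew(adjudicator):
    diffs = []
    adj_scores = SpeakerScoreByAdj.objects.filter(
        debate_adjudicator__adjudicator=adjudicator
    )
    for adj_score in adj_scores:
        confirmed = SpeakerScore.objects.filter(
            speaker=adj_score.speaker,
            position=adj_score.position,
            ballot_submission=adj_score.ballot_submission,
        ).first()
        if confirmed is not None:
            diffs.append(adj_score.score - confirmed.score)
    # Positive means the adjudicator tends to score above consensus.
    return sum(diffs) / len(diffs) if diffs else None
```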

czlee commented 6 years ago

For the use case I have in mind, it'd be more useful, but CA teams could still reasonably differ on it, so my meta-level hesitations are still the same 🙂