kyprizel opened this issue 9 years ago
Yay! Love the idea.
Any ideas on how it should look? A poll?
I think so. I would have some set time period after the event during which the teams could submit their suggestions for scores. I'd also make it transparent so that everyone could see each team's vote. That might help minimize "sour-grapes" voting where a team does poorly and thus votes extremely low.
On that same note, might be good to throw out the top and bottom outliers to prevent the overall average from being skewed by extreme votes on either side. Less sure about this though, just an idea.
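That outlier-trimming idea is essentially a trimmed mean; a minimal sketch, assuming we drop one vote from each end (the trim count is an arbitrary choice):

```python
def trimmed_mean(votes, trim=1):
    """Average the votes after dropping the `trim` highest and `trim`
    lowest values, so a single extreme vote can't skew the result.
    Falls back to a plain mean if there are too few votes to trim."""
    if len(votes) <= 2 * trim:
        return sum(votes) / len(votes)
    kept = sorted(votes)[trim:len(votes) - trim]
    return sum(kept) / len(kept)

# A vengeful 1 and an inflated 10 are both discarded:
print(trimmed_mean([1, 7, 8, 8, 9, 10]))  # -> 8.0
```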
Obviously if a team runs the event, they wouldn't be able to submit to the poll.
So only teams that participated in the event can vote? Then we could get a situation where some crappy CTF is played by 10 teams and they rate it 100 :)
No, I'd let any of the top 10 or 20 (or whatever the consensus is) have voting privileges on it. One reason to let them vote for the next week or two: even teams that didn't play can read write-ups and check out the problems after the fact.
Another related topic would be a good write-up from you on the guidelines for how things are scored now. A rubric of sorts. While different teams will always have slightly different perspectives (and indeed, that's part of the point of this discussion), having a better understanding of why you've scored things as you have so far would be good. Is it just difficulty, or is it "quality"? If we are really going nuts, these could be separate axes that are then combined somehow into an overall score (not necessarily evenly weighted):
- Difficulty: 1-10
- Fun: 1-10
- Quality: 1-10
- Breadth: 1-10
Don't know that those are the right metrics, but whatever the community wants to incentivize is what we should measure. Organizers will make design decisions based on those criteria, which is a powerful thing.
Also, maybe "organizer communication" as another category. Again, just brainstorming. Perfectly happy just having the scores we have now too.
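As an illustration, combining those hypothetical axes into one score with uneven weights could look like this (the axis names and weights here are placeholders for whatever the community picks, not anything CTFtime actually uses):

```python
# Hypothetical weights -- the community would pick these.
WEIGHTS = {"difficulty": 0.35, "quality": 0.35, "fun": 0.2, "breadth": 0.1}

def combined_score(axis_scores):
    """Collapse per-axis 1-10 votes into one overall score,
    weighting the axes unevenly as suggested above."""
    return sum(WEIGHTS[axis] * score for axis, score in axis_scores.items())

print(combined_score({"difficulty": 8, "quality": 7, "fun": 9, "breadth": 6}))
```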
To play the devil's advocate here:
I would propose a similar but different solution: YES, let the top 10-20 teams vote on the rating. The voting deadline is 12h after the CTF starts (for 48h CTFs; less if it's a shorter CTF, of course). 12h gives you enough time to see if the tasks are well made and well tested and everything is stable, but it doesn't allow you to predict how well your team will do or who will win -- one bias less.
I would still be happy if a main admin would set the initial rank based on previous years of the given CTF, and the top10-20 teams would be allowed to change it e.g. N% down to N% up (e.g. N=50 or 75).
I think we should somehow formalize the rule. Something like what I do now -- in its first year a CTF can't get a rating of more than X, or even has X by default, and it can be increased depending on factors like:
Should we require the team voting to also be registered for the event? That might also help promote the automatic integration between ctftime and the events.
Midway through the event isn't a bad idea, but I'm not totally sold. I feel like plenty of problems might not have revealed themselves by then, and many events save their best challenges for particular times. I guess this would certainly incentivize people to release their best challenges first which might not be too terrible.
I think people will be a bit more honest in their votes if they are made public. Hopefully people would want to avoid the public shaming of voting an event down just because they placed poorly, since it would be so obvious -- but maybe I'm wrong.
I might actually be ok letting /every/ registered ctftime team (that is also registered for the event) vote, and scaling the power of the vote based on the team's ctftime placement in some way. Hopefully the effort required and the relatively minimal impact overall disincentivize anyone from spamming teams to influence event scores; and again, it would be obvious if someone did it in a meaningful way, so hopefully easy to correct.
Also, I agree some sort of public formula is the way to go, but I wouldn't include the number of teams that played directly. I might let the number of teams that played increase the size of the possible points to allocate based on the votes. Does that make sense? If twice as many good teams play but all rate it poorly, it probably shouldn't inherently be a higher-ranked event. Between two events that players rate the same, the one with more players /should/ then have a higher score.
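One way to combine the two ideas above -- placement-weighted votes that set the fraction of a pot, with participation only scaling the pot's size -- might look like this (the decay function, base constant, and log scaling are all made-up assumptions for illustration):

```python
import math

def vote_weight(rank):
    """Hypothetical decay: a rank-1 team's vote counts several
    times more than a rank-50 team's."""
    return 1.0 / math.log2(rank + 1)

def event_score(votes, num_teams, base=25.0):
    """votes: list of (ctftime_rank, rating_0_to_10) pairs.
    The weighted average rating sets the *fraction* of the pot;
    the number of participating teams only scales the pot's size,
    so a big but badly rated event still scores low."""
    total_w = sum(vote_weight(r) for r, _ in votes)
    avg = sum(vote_weight(r) * v for r, v in votes) / total_w
    pot = base * math.log2(num_teams + 1)  # pot grows with participation
    return avg / 10.0 * pot
```

At equal ratings, more players means a bigger pot and a higher score, but no amount of participation rescues an event that voters rate poorly.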
I think we could do automatic weighting of CTFs based on relative performance. Voting is going to be gamed.
For example, events in which some number of DEFCON finalists did not even compete in would have a lower point value. This would make DEFCON the highest point-value event, and quals a close runner-up.
Additionally, being a qualifying event could carry weight in and of itself.
If we permit voting, I think it could be per-team or restricted to the top N teams per event, and be strictly relative. For example, instead of "Would you rate this on a scale of 1-10?", ask "Was this better than event X?". You end up with a strict hierarchy of "good teams say this is the ordering of events". You can then infer scores at the end of the season, or use the generated data to set the scores for the next season.
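That pairwise data can be turned into an ordering with something as simple as a net-win tally (a Copeland-style count; a real system might prefer Elo or Bradley-Terry). A rough sketch, with made-up event names:

```python
from collections import defaultdict

def rank_events(comparisons):
    """comparisons: list of (winner, loser) pairs, one per
    'was event A better than event B?' answer. Orders events
    by net wins: +1 for each comparison won, -1 for each lost."""
    score = defaultdict(int)
    for winner, loser in comparisons:
        score[winner] += 1
        score[loser] -= 1
    return sorted(score, key=score.get, reverse=True)

print(rank_events([("defcon", "plaidctf"), ("plaidctf", "csaw"),
                   ("defcon", "csaw")]))
# -> ['defcon', 'plaidctf', 'csaw']
```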
Also, I think there should be a clear policy about rating events with special rules. For example, the finals of 0CTF (https://ctftime.org/event/215) are only accessible by chinese teams (by the organizer's decision, not because teams from abroad don't want to travel). I think competitions that are not accessible to everyone and have any kind of restrictions should be rated 0.
> are only accessible by chinese teams
there are such competitions in every country, and most of them have the same weight
Just to clarify, CSAW is a university-only event, but not limited to US teams. They don't cover full costs for teams from outside the U.S., but they are still invited to attend. There have been Canadian teams who have played, for example.
That said, even the university-only restriction is similar in my mind.
That /might/ be a special case where it's worth having a flag for "college/university only" so they could have a separate scoreboard for those teams.
As for ones restricted by nationality, I agree, they have no business being ranked on the global ctftime.
Finals teams are limited to the US and Canada, but this is open to discussion on a case-by-case basis
:)
I'll double check, but I thought that was only referring to covering the full costs of the trip. Could be wrong though. Summoning @coldheat !
We cover travel for teams from the US and Canada, and the scoreboard listing those who qualify only shows US + Canada. Nothing international just yet, but we have had discussions on bringing in specific international teams before, which is why it's case by case. But the ruling this year is nothing international.
Also just my two cents, I think rating relative to performance of the teams would be ideal. Voting would introduce too much gaming of the system.
Say the top ten teams earned only 33% of the total points; you could probably say that the CTF was hard. Also, for restrictions on who can play, why not include a popularity metric that denotes how many teams on CTFtime played the CTF?
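A sketch of both metrics (the formulas and names here are illustrative assumptions, not anything CTFtime actually computes):

```python
def difficulty(avg_top10_score, max_score):
    """If even the average top-10 team earned only 33% of the maximum
    score, the event was probably hard. Returns 0..1, higher = harder."""
    return 1.0 - avg_top10_score / max_score

def popularity(teams_played, active_ctftime_teams):
    """Fraction of active CTFtime teams that played this event."""
    return teams_played / active_ctftime_teams

print(difficulty(330, 1000))   # top teams cleared a third of the board
print(popularity(50, 200))
```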
We added voting for event weight. At first we'll just use it as a reference, but soon the weight for most events will be calculated automatically. Algo suggestions appreciated.
Current ideas:
A lot of people say that rating weight is non-transparent at the moment. What if we allow last year's top 10 (20?) teams to decide event ratings?