2017 Rating - Githubissues

kyprizel commented 7 years ago

Hi, now, when people say that voting-based rating schema does not work - let's discuss how can we fix it.

Should we use ELO-like system? Something else?

Pharisaeus commented 7 years ago

I'm sad my proposal was not included in the summary :( Especially that is significantly simpler and has zero "gaming the system" capability, unlike all the rest here.

akrasuski1 commented 7 years ago

This would make early CTFs receive more points as spammers would still have their votes count as 1 instead of 1/nKudos. But I felt it would not be very significant and stabilize quickly

Just a quick comment on this - instead of giving 12kudos/year, you could have kudos "regenerating" every month or every N CTFs (perhaps capping their number at a reasonable value, like 12). You might also consider giving newly registered teams no kudos at start, so one cannot register multiple accounts just to upvote single CTF (unless they plot it for months...)

Such system would have the effect of stopping point inflation, since it effectively normalizes team vote (sum of team votes over 1 year = 12). Note that this would work with continuous votes too (giving, say, 50pts for each team every month to split for any CTFs they believe are worth it - again, with some cap, probably).

MathisHammel commented 7 years ago

@Pharisaeus I'm so sorry, I did not take the time to scroll all the way to the top... All of you who did ot get their ideas included in the draft, please push them here !

@akrasuski1 That's a very good idea, thanks for the contribution !

pedroysb commented 7 years ago

What is the problem with the idea of Pharisaeus? I really liked it.

1- It is simple and doesnt need interaction between players and organizers. They just need to play the CTFs and that is it. 2- No "gaming". Voting or kudos are subjective. 3- In my point of view, the number of top teams is what defines the difficulty of a CTF. So, it should define the CTF scoring. Even CTFs with few top teams and low scoring will be good incentives for playing, because no top teams means to be easier. 4- The number of top teams is a good incentive for CTF organizers to make their event better.

immerse commented 7 years ago

@Pharisaeus

I'm sad my proposal was not included in the summary

The way I see it, @kara71 was only summarizing one suggestion -- that by him and @pedromigueladao. But that's just one suggestion.

My problem with it is that it seems like an only-small change from an already broken system. Plus it's complicated. And I don't have faith in people voting, nor voting fairly.

I think that Pharisaeus's approach makes a ton more sense. But fresh ideas would also be nice.

MathisHammel commented 7 years ago

As a reminder, @Pharisaeus 's idea counting the number of top teams participating to an event.

The advantages to that are that nobody can game the system and the score is determined and frozen as soon as the CTF ends. However, a major downside (which has recently been illustrated by the duo AlexCTF/BITSCTF) is when a pretty average CTF starts just before a "better" one. This results in most teams focusing on the one which started first. Also, top teams tend to participate to every single CTF, which would make that rating very even like we had last year.

immerse commented 7 years ago

a major downside when a pretty average CTF starts just before a "better" one. This results in most teams focusing on the one which started first

What? Why would top teams stick to a less-fun CTF just because it started first? To me that's nonsense. I think you are misinterpreting something.

top teams tend to participate to every single CTF

I'd claim exactly the opposite -- the skilled teams don't play every single CTF, because many low-tier CTFs aren't challenging/interesting/fun.

v0s commented 7 years ago

@pedroysb @Pharisaeus In "CTF weight = number of top teams" scheme, how to take into account that there are lots of just-for-fun participations by top teams? See 2016 for PPP and LC↯BC

MathisHammel commented 7 years ago

Also, about the "subjective" voting : it seems essential to me that CTFs are rated by humans, not (only) by shady heuristics. There will always be ways to cheat if we include team ratings, the goal is to limit that as much as possible.

Another idea to keep teams from creating multiple accounts to illegitimately boost CTFs, in addition to slowly crediting all teams like 1 every 2 months, would be to give kudos according to the team's individual placement (e.g. top 10% grants 0.5 kudos, top 1% is +1 kudos). Of course, duplicate teams would still have somehow duplicate voting power (however I doubt anyone would go so far to get a few more points). At a point, we can not distinguish between low performing teams (which should have voting power) vs. duplicate team.

MathisHammel commented 7 years ago

@immerse Apparently our visions about the number of top team participations differ. I might definitely be wrong, so I'll probably pull a few stats to see what's really happening

pedromigueladao commented 7 years ago

@Pharisaeus i believe @kara71 was not throwing your suggestion away by any means. He was trying to summarize and put together a tier-like rating system and how it would make sense. I believe your suggestion is orthogonal to this one. Leaving as it is but adapting the way each CTF-rating is computed.

Anyway, it is weird to me to compute scores a posteriori (there might be some exceptions of course). Maybe I am biased as we are an academic team but putting a team together every single weekend just because the CTF might be worth something... and that depending on the fact that the good teams will also play...

dcua commented 7 years ago

The problem seems start from the question itself -- the voting system may actually worked fine, and exactly as was expected. Last years every single event quality and concrete playing team result was carefully analysed and commented (c.f. https://www.reddit.com/r/securityCTF/comments/5l5wig/ctftime_voting_system_is_broken/dbtxy5k/ alone). We all eventually discovered that teams may have different opinions. Although there is temptation to force own decision to others via PR manipulation or changing rating weight algorithm, it should be avoided. Was not that a main idea of changes after all -- to make scoreboard more transparent and independent from conflicting interests of active playing teams (including CTFtime orgs). In first part it worked very good -- CTFtime become more transparent. In second one it worked as prototype, and may be improved future.

Another problem is in approach -- we seems trying to make CTFtime scoreboard a measure of skills of infosec professionals. Not CTFtime, nor any other existing CTF scoreboard is a valid metric for these, good coverage was never achieved and seems never be. Compare for example how many binary exploits you have been developed during last 3 pentests, and how often your usual task of development of AV/BDS-bypassing dropper for phishing attacks was on CTFs of last 3 years. CTFs however are very good as educational tool, and seems the only effective legal way to learn offensive security. That defines team content and structure. In my opinion the main goal of CTF is education, the more students and beginners are in team -- the better. The scoreboard of CTFtime is mostly measure of learning performance (how team progress between years), and quality of team management (are they capable to make team play). There are better ways to show real-world skills -- research, bug bounties, teaching... Do not use CTFtime for job it doesn't fit, or try to fix unfixable -- this will save alot of time for everyone.

The third problem is l33t-isation of CTF teams. We seems trying to find a legal way of being "top ctf team" without playing CTFs. That is not possible, if you called yourself leet/placed yourself on top of your own scoreboard -- prepare to prove it on every single CTF of the world, "not playing" is not an option. I think we should never discriminate teams on any parameter above absolute minimum needed to prove that team is legal. E.g. if some team played 5 ctfs -- it is valid, legal, and have exactly same rights as dcua, Dragon Sector, LCBC or p4. "special" "top teams" with privileges to vote/talk/get special rating should never exist. The only important thing in CTF is CTF itself, it actually doesn't matter what any "top team" think about it -- estimation of "rating weight" by any active team is invalid because of conflict of interests. Voting allowed to smooth that process, but history of world democracy shows that there is no ideal solution possible. I think we all should relax and play CTFs instead of building some weird formulas, validity of which can not be proved.

In summary I propose:

Leave rating weight voting system, and fix the problems discovered:
- Bias to higher votes -- downvoters have advantage, event can loose 100% weight but gain only 50% compared to last year. People are voting max in assumption that there will be jerks in poll, who downvote regardless of quality. Make voting boundaries as previous year rating +/- 50%.
- Team name squatting -- allow to vote for teams who registered on CTFtime and played several CTFs (5? 10?) after registration. That will fix spam like was on 33c3 CTF.
- Fake CTFs -- make decision group, who will remove event in case of bad quality. No matter how group will be choosed (via some election, or just several people from current CTFtime orgs), people should be well respected and really analyse event tasks/make decision on first demand.
Deploy alternative rating in separate sandbox Changing rating weight formula every year delegitimise all previous CTFtime results, and should be made only after testing. In case we now find any new approach(es) for rating formula, it should be deployed in alternative resource (ng.ctftime.org? beta.ctftime.org?) and tested on live results during the year, by all active team (not only 10 people in current discussion, 4 of which are p4 team members).

MathisHammel commented 7 years ago

As I see on the reddit thread posted by @dcua "why 2016 CTFtime scoring is broken", it appears that we (especially I) completely missed the point of this thread.

What I saw there is there are two groups:

One group believes the old ranking (admins determining the event weight) was too obscure and could be manipulated by admins (which are also competitive players)
The others felt like the new (2016) public voting was too easy to cheat (there was one CTF finals where dcua came last and one of their members voted weight 1 which lowered a lot the event's weight).

I definitely understand both points of view, although the dcua low rating is pretty much the only "cheat" example that is used in the thread. Also, performing bad can lower the perception of an event's quality (or a bad quality leads to not give all your efforts).

So I feel all the great ideas everyone has had should be put aside (but not forgotten !) while we try to find a solution that fits everyone

Pharisaeus commented 7 years ago

@dcua

not only 10 people in current discussion, 4 of which are p4 team members

4/15 right now, and as you can see we actually don't really agree even among our team ;)

to make scoreboard more transparent and independent from conflicting interests of active playing teams privileges to vote/talk/get special rating should never exist.

This was exactly the idea behind my proposal. There would be no active party at all, no voting, no setting arbitrary weight by one single person. A totally distributed and unbiased approach. But as you were making such goof points, you end up with trying to keep the voting in place, which contradicts what you just said.

history of world democracy shows that there is no ideal solution possible

In democracy two junkies have more voting power than a professor, and the same goes for CTF voting sadly. It's really visible for example here: https://ctftime.org/event/424/weight/ where a lot of noname teams (and dcua, who won so upvoting is understandable I guess) are voting 25p for a 4h bad quality CTF, while InsomniHack Teaser got less than 30p, being a 36h high quality event. I can't see logic in that at all.

Deploy alternative rating in separate sandbox

I actually like this idea. It would be interesting to put in place more than one global scoreboard, however I'm afraid it might get too complex, especially with those kudos/tier voting etc. My proposal, however, could probably be added with no special overhead since it requires no active participation from anyone.

Changing rating weight formula every year delegitimise all previous CTFtime results

I disagree because the rules were known so teams played accordingly to them.

@pedromigueladao

Maybe I am biased as we are an academic team but putting a team together every single weekend just because the CTF might be worth something

But it's like this already anyway. And always have been because the event rating could be changed, even when they were "fixed" by an admin. This is why I suggested that even should have a "target weight" set by organizers/admins so teams know what they can expect.

@v0s

In "CTF weight = number of top teams" scheme, how to take into account that there are lots of just-for-fun participations by top teams?

As I pointed out at some point - we would have to make some metric to see whether a team has actually played or only started and then decided it's not worth it. Other than that I can't see why playing for fun should not count. If someone from PPP or LCBC decided that certain event is worth spending time on it, then probably it's good but they just didn't have people to play for other reasons.

@kara71

Also, top teams tend to participate to every single CTF, which would make that rating very even like we had last year.

Not really if you look closely at the ranking. At least depending on how many top teams we consider. The difference between number of events played by for example dcua (65), DragonSector (45), LCBC (37) or PPP (31) it clearly visible. And if you check which CTFs those teams "skipped" -> https://ctftime.org/team/762/vs/284 you can see there (apart from Finals, which is a different matter) all those School, Qiwi, BreakIn, Sunshine etc, which happen to be the easy or poor CTFs. Coincidence? I doubt it ;)

while we try to find a solution that fits everyone

Such solution will never exist ;)

pedromigueladao commented 7 years ago

@Pharisaeus

Maybe I am biased as we are an academic team but putting a team together every single weekend just because the CTF might be worth something

But it's like this already anyway. And always have been because the event rating could be changed, even when they were "fixed" by an admin. This is why I suggested that even should have a "target weight" set by organizers/admins so teams know what they can expect.

IIRC the changes were minimal when scores were set by organizers. Not perfect, but IMO much better than now.

I guess I made my point in this discussion. I would like to see a ranking system that

one knows what one is playing for. IIRC at some point in the past someone argued that events registered@CTFTime with less than 2 weeks notice should be rated as 0 (a priori defined rating)
a ranking that you could not brute-force and be ranked higher than those that played well (capped number of events, or capped-decaying number of events)

Now, how we come up with the rating for each individual CTF is orthogonal to these two goals. @Pharisaeus proposal could be one solution for that. This ATP-like was just a proposal (it has been working for tennis)

And @dcua, you are right. The goal is to learn and have fun. But being ranked high in that list is a nice reward for all the sleepless nights one has for playing CTF.

immerse commented 7 years ago

@v0s

how to take into account that there are lots of just-for-fun participations by top teams?

This is indeed a problem, but probably a minor one. As @Pharisaeus argued, if teams are playing it for fun, it was likely good. Another argument is that only a few teams do this for any given CTF, so it will hopefully have little effect.

@dcua

we seems trying to make CTFtime scoreboard a measure of skills of infosec professionals.

No, we are trying to measure skill at playing CTFs. You seem to be confusing this with pentesting skills.

The only important thing in CTF is CTF itself we all should relax and play CTFs

Despite those comments, you do seem to care a lot about the system remaining as it is...

dcua commented 7 years ago

@immerse

No, we are trying to measure skill at playing CTFs.

I thought I lowered CTF value to bare bottom, but no -- you found even better way, scoreboards measure "skills at playing CTFs". Without infosec background thats a useless thing, like skills in building houses of cards. I see very pessimistic future of my students, if the only thing they got from CTFs is "skills at playing CTFs"...

Despite those comments, you do seem to care a lot about the system remaining as it is...

Ofcourse I do care, we got CTFtime #1 in 2016, working on getting same in 2017 -- it is very important for us that CTFtime remain credible source of world-class CTF rating. Some suggestiongs in this topic are easily lead to full loss of credibility.

For example, someone suggested to measure CTF quality based on number of "top teams" played -- isn't that obvious, that currently there is no way to confirm that "top team" really played particular CTF? Any team, who sees that it will win some school-level CTF, will register "PPP", "Dragon Sector", "dcua" near end of CTF -- and voila, we get defcon-level rating. There are no technical way to confirm that team registered under "PPP" name is really PPP under current system -- scoreboards are passed unchecked, rating added only by match of team name. We will have scandal at the end of year, with real proofs -- I doubt that anyone will take CTFtime seriously after it. Our #1 place if we get it will worth nothing, so yes we do care...

Another example -- 10 "special" "top teams" with right to vote -- as you can see, there is developed community in polish CTF scene, with many good world class teams like DS, p4, DIcsHrs, StR, CS16, and other. Imagine several passed to "special top" in same time -- no one will really trust that there was no collaboration between them, and take seriously anything they vote on CTFtime.

Imagine @v0s will get his ELO rating implemented, and LCBC get TOP1 by playing and winning 2 ctfs per year while skipping all other -- are you sure it will not be considered cheating and abuse of status of CTFtime maintainer?

CTFtime is a good resource, with its reputation as main value. Try to not make it laughingstock with your "improvements".

caioluders commented 7 years ago

The 2016 Voting system is easily fixed by forcing every team to vote , if you don't vote you don't gain points on that CTF . The possible manipulation would be far less strong with a higher number of votes .

nSinus-R commented 7 years ago

Hi all.

@dcua

I thought I lowered CTF value to bare bottom, but no -- you found even better way, scoreboards measure "skills at playing CTFs".

In essence, skill at playing ctf is the only thing which should be measured by a ctf scoring system. Same as skill at playing chess is measured by chess rating systems. You want to use the rating to relate yourself to other persons/team in the field, to see whether you perform better or worse.

Without infosec background thats a useless thing, like skills in building houses of cards. we seems trying to make CTFtime scoreboard a measure of skills of infosec professionals

I strongly disagree. In the first place, ctf is an (educative) hobby in my opinion, although ALOT of people confuse it with a training method for "infosec professionals" - I couldn't disagree more. But as you pointed out, the problems encountered in ctf and security audits are vastly different, and just because someone is good at one of those, doesn't mean he/she is good at the other one.

I see very pessimistic future of my students, if the only thing they got from CTFs is "skills at playing CTFs"...

Hobbies, especially in competative environments, are tending to build up secondary skills. Even about a professional house of card builder, you could say that he is likely to have steady hands, patience and endurance, which might make him suitable to perform in some environments better than others.

Besides that, the FIRST and FOREMOST thing your students should gain from ctf is "fun times".

Anyway, enough meta-discussion for now: I also want to throw in an actual other idea for the voting system: Representative voting. In the begin of each season, the scene could elect a jury of players, which stays over the year in contact and assigns together a weight to every ctf. Exact parameters like how many jury members, how many are allowed to be from one team and so one would have to be established. This incorperates in my opinion a combination from the old system (rating is just assigned) and the new one (voting) and even reassembles 'real' democratic system more close than direct voting.

The advantage is, that in the first place only 'trusted' persons of the ctf-scene become the right to vote. As we have seen, scoring high positions in the ranking does not necessarily imply the perception of trustworthiness in certain cases. If every single 'vote' of the jury members is made transparent, this could on the long run filter out dishonest jury members. Additionally, as for one season always the same persons are voting, the actual weight for ctfing could become consistent, as long the jury itself stays consistent. Consistency here is more likely than the one achieved by a direct voting system in my opinion.

The disadvantage, on the other hand, is that there would have to be enough interested persons to form such a jury, and enough motivated ctf-players to vote the jury together. But maybe some polls on ctftime.org could give some insights about the amount of motivated people? The other disadvantage would be that we would have to create an election-system just for ctfing, which could be perceived as 'overcomplicated'.

//edit: Also, in general, if voting continues on, the voting system should have more diversification, i.e. it should be possible to vote on different aspects such as "variety of challenges", "difficulty of challenges", "responseviness of the ops" and "technical realization". At the moment, one has to express all this different aspects in a single vote and might not even realize that all of this aspects matter.

salls commented 7 years ago

Hey guys, Salls from Shellphish here.

I think a possible solution is to only assign points for top 10-15 CTF's and do so ahead of time. Also you could just assign points equally in this case, but that doesn't matter as much. Then teams would only score CTFTime points for playing these ones.

In my opinion this solves a couple problems:

First problem: Needing to play a ton of CTF's to stay high ranked. I don't think teams should be expected to play a ton of CTF's to prove they are a good team. Second problem: Elo rating where teams are hurt for not playing fully. I don't want to have to register as a different team name when there aren't many members playing. Third problem: Guessing at how many points a ctf should be worth. This should be known ahead of time,

I think this is a good solution because it sets a reasonable number of CTF's where teams can earn points. Teams aren't able to get high ranked by playing a large number of low quality ones. Teams don't have to worry about not playing full, but will be incentivized to play the top CTF's.

edit: just saw pedromigueladao's suggestion. I think it's a very good idea.

dcua commented 7 years ago

@nSinus-R

In the first place, ctf is an (educative) hobby in my opinion, although ALOT of people confuse it with a training method for so called "infosec professionals" - I couldn't disagree more.

Why CTFs can not be a professional training method? There are many good examples. OSCP PWK Lab is essentially CTF, although doesnt have scoreboard.

Representative voting. In the begin of each season, the scene could elect a jury of players, which stays over the year in contact and assigns together a weight to every ctf.

IRL representative voting has problems, compare recent USA president elections -- it is not obvious why it has advantage over direct. But your idea is so far the best new proposal in this discussion, and may actually work.

MathisHammel commented 7 years ago

Having an election system could be the best if we have a sufficient number of collaborators and if we're sure every voter has good intentions.

Another of my ideas was to reduce the voting power of teams who consistently vote opposed to the general opinion. The voting power would vary depending on how close the team's vote is to the actual final event weight. I'm not sure this would be the best solution as many events could end up in the 20± ballpark like last year

nSinus-R commented 7 years ago

First of all the ontopic parts: I totally agree with @salls: Knowing the rating weight of a ctf in advance is beneficial. However, this turns into a problem when a ctf which was known to be good turns bad or vice versa. In the majority cases of ctfs though, the quality of the last years event is a good indication for the quality of the upcoming event.

@dcua

IRL representative voting has problems, compare recent USA president elections -- it is not obvious why it has advantage over direct.

For sure it has. However, the main assumption is that the voted representatives are able to spent time to take educated and well considered decisions, while in a direct voting system

Why CTFs can not be a professional training method? There are many good examples. OSCP PWK Lab is essentially CTF, although doesnt have scoreboard.

I haven't tried this, so I can not comment on this particular case. However, the thing is that ctfing and activities carried through in the professional infosec field require two different skillsets. Although these sets are partly overlapping, they are not the same, and the competition in CTF makes it more to a "sport". In my opinion, the "usefulness" for "real" problems shouldn't be tried to be retrofitted here. Likewise, ctfing is supposed to be an open and educative environment. Abusing them as a training method is amplifying the monetization of CTFs, which is a step in the wrong direction in my opinion.

Anyway, we are getting offtopic here, if interested, we can take this discussion offline or in a seperate issue ("what is ctf?" :) )

dcua commented 7 years ago

@nSinus-R

Abusing them as a training method is amplifying the monetization of CTFs, which is a step in the wrong direction in my opinion.

training != monetization, there are alot of free trainings. I was talking about technical part, without any commercial context/background. CTFs, especially listed on CTFtime, must be free.

Pharisaeus commented 7 years ago

@dcua

Without infosec background thats a useless thing, like skills in building houses of cards

I disagree. I'm not working in infosec at all and yet I consider skills gained via CTFs to be useful. And even if they weren't it's still a fun hobby. Some people collect stamps, other reverse binaries.

For example, someone suggested to measure CTF quality based on number of "top teams" played -- isn't that obvious, that currently there is no way to confirm that "top team" really played particular CTF?

If you've read what I have written you would see I proposed to count a team "in" only if they finished somewhere in the top / gained reasonable number of points. This makes sense since if PPP or dcua is really playing, then they will finish close to the top.

Of course I do care, we got CTFtime #1 in 2016, working on getting same in 2017 -- it is very important for us that CTFtime remain credible source of world-class CTF rating.

The whole issue here is "what is this rating measuring". Because right now it's strongly focused on "how many events someone played", and also "how many players a team has" and in many cases a team lower in the rank would win with a team higher in the rank if they were directly competing at a specific event. Even more so if the number of players was limited to some reasonable numbers. This is for example the case for dcua being 1st while PPP, LCBC and Dragon Sector are below, even though all those 3 teams pretty much always win against dcua when playing in a serious event. In fact dcua either doesn't even qualify for a serious events (DEFCON, HITCON, Codegate) or if qualified gets beaten by teams much lower in the standings (TrendMicro Finals, CSAW Finals, DefCamp Finals, C3). So what does this #1 mean exactly? There is also the case of geography - an international team with players all over the world will automatically have an upper hand, even if they are not better than anyone else. dcua is a good example since they not only have a lot of players overall, but also players from many places. Finals open only to Asian countries? No problem, there are asian players. Finals open only to US teams? No problem, there are american players etc. Similar case for OpenToAll I guess, since they can also participate in virtually anything, which is not possible for majority of other teams. So what does the ranking really show? :) I think this is the core of the discussion here - what exactly should the ranking show and what does being in the top mean.

@nSinus-R the problem would be how to pick such jury and how to make them impartial ;)

@salls this would probably discourage some teams from playing on non-rated events, and also discourage people from organizing events at all, and I'm not convinced this is a good thing. There is also a problem with events that are good but new or became good and yet are not-ranked. And what if there is more than 10 or 15 good events? There is already C3, SECCON, HITCON, DEFCON, TUM, CSAW, ASIS, MMA, Plaid, Confidence, InsomniHack, Codegate, 0CTF, BostonKeyParty, Hack.lu and this is already 15, not to mention that 10 out of them have Quals/Teaser + Main event, which would already be 25.

salls commented 7 years ago

@Pharisaeus After some more looking I think the suggestion from @pedromigueladao is the best. It gives people a reason to play lower events. Also, I would have no problem cutting the list you posted down to maybe 7 top events, other ones are still good, but not top. I'm not sure what I'd do about quals/finals style, I would ignore "teasers".

Would it reduce the number of people that play in lower ranked CTF's? Yes, but I think that the goal of a CTFTime rating should be to rank teams, not to encourage playing lower ranked CTF's every week.

My argument is there should not be too many weighted CTF's. I think it's unreasonable to believe we can have a CTF that has rating points every other week and hope to have the teams rated fairly.

dcua commented 7 years ago

@Pharisaeus

I'm not working in infosec at all and yet I consider skills gained via CTFs to be useful.

Are you talking about infosec skills? There are no much other CTF skills worth developing other than these.

This makes sense since if PPP or dcua is really playing, then they will finish close to the top.

Bypass of your solution is trivial as the attack itself -- if cheating team wins, it can submit part of their flags in fake accounts.

dcua being 1st while PPP, LCBC and DragonSector are below, even though all those 3 teams would pretty much always win against dcua when playing in a serious event.

It shows that bunch of students can work hard, compete against world best teams, and win on some events.

@salls

I think that the goal of a CTFTime rating should be to rank teams, not to encourage playing lower ranked CTF's every week.

The goal was popularisation the CTFs, involvement of more people to play, making archive of CTF scoreboards, not making a closed-invite only club of "top teams"...

immerse commented 7 years ago

@Pharisaeus

an international team with players all over the world will automatically have an upper hand

You haven't tried it, have you? It's rather an advantage to be physically present together for most CTFs. Communication and cooperation are huge issues. Not that it's on-topic, of course.

If you've read what I have written you would see I proposed to count a team "in" only if they finished somewhere in the top

I think @dcua have a surprisingly good point here -- what prevents me from registering as several top teams and re-submitting my flags if I'm about to win?

@dcua

Are you talking about infosec skills? There are no much other CTF skills worth developing other than these.

I'm sorry you feel that way. How about problem-solving skills?

dcua commented 7 years ago

@immerse

I'm sorry you feel that way. How about problem-solving skills?

Hmm, whats that?

immerse commented 7 years ago

Here is a new suggestion: what if we had two rankings? One for "participation" and one for "skill".

The "skill" ranking could be based on top 10-15 CTFs as @salls mentioned -- that would be a simple, probably effective system. The "participation" ranking could be what we have now, with votes, or any other system where even small CTFs give points. That way there is some motivation to play even small CTFs, besides the fun and learning experience, but we still get to see who's more skilled.

Pharisaeus commented 7 years ago

@dcua

Are you talking about infosec skills? There are no much other CTF skills worth developing other than these.

Even if I don't do pentesting, such skills are still useful for a software developer because every time I write/see some vulnerable piece of code there is this blinking light in my brain "Oh, I know how to pwn this! So maybe I should actually fix this right now?".

@dcua @immerse

Bypass of your solution is trivial as the attack itself -- if cheating team wins, it can submit part of their flags in fake accounts. I think @dcua have a surprisingly good point here -- what prevents me from registering as several top teams and re-submitting my flags if I'm about to win?

I fail to see the point. Exactly the same thing can be done right now. You just register 100 teams at the end of a CTF you're winning and then upvote it afterwards. Even if real teams try to downvote, it will have no impact on the final score.

@immerse

You haven't tried it, have you? It's rather an advantage to be physically present together for most CTFs. Communication and cooperation are huge issues. Not that it's on-topic, of course.

We never meet in person to play apart from offline finals because our team is highly distributed even though we're all polish. I'm ~1500km away from the closest next player.

immerse commented 7 years ago

I fail to see the point. Exactly the same thing can be done right now.

The point is that if there's no way to fix your idea, then we have two broken systems instead of one. So let's try to fix yours or think of a new approach.

We never meet in person to play apart from offline finals

My bad. What I said has still been my experience, though.

dcua commented 7 years ago

@Pharisaeus

write/see some vulnerable piece of code there is this blinking light in my brain "Oh, I know how to pwn this!

This is pure infosec skills, they are usefull. My point was that CTFs are usefull for infosec education.

pedromigueladao commented 7 years ago

Just collected some data from Qualifiers for DefCon2016. Got the following table (read as: if you scored <1st column> place in EVERY single Defcon16 Qualifiers, you would get <2nd column> points, that corresponds to position <3rd column> of 2016-ranking) (Notice, for those events with on-site finals, I considered instead the Quals, the case of, 0CTF and HitCon)

Position	Score	2016-Place
1	927,2	7
2	661,419	14
3	555,426	18
4	490,935	22
5	431,553	27
6	403,997	28
7	380,672	31
8	358,37	32
9	339,463	34
10	322,707	38
20	224,947	64
30	166,425	81
40	128,787	116
50	102,498	131

immerse commented 7 years ago

@accuser1011 This isn't the right place to have that discussion. Also I think people might listen to you if gave some cold hard proof.

dcua commented 7 years ago

@accuser1011

Yes, please.

pedromigueladao commented 7 years ago

@accuser1011 could I ask you to do it in a different topic/forum? It might be related but definitely is not the topic under discussion here.

accuser1011 commented 7 years ago

@pedromigueladao Sorry, i'm going to delete the posts here and take the discussion to reddit. We cant tolerate cheaters in our scenario.

gynvael commented 7 years ago

Uhm, hi.

This probably was mentioned before, but I'll drop my personal (i.e. speaking as an individual playing CTFs) two cents in as well.

One idea I like is splitting "major" CTFs from everything. In my definition a "major" CTF would be a competition with guaranteed quality and a well established history of being awesome (think Plaid CTF, DEF CON Quals/Finals, CCC, etc). And there shouldn't be too many of them (e.g. 10 per year, maybe 12).

I'm pretty opened about whether: Idea 1) There would be a separate ranking taking into account just major CTFs, and a separate taking into account everything (including majors). Idea 2) There would be a common ranking, but non-major CTFs would be 0-30 weight max, and majors weight 100 (constant).

The reason I like this idea is that while I love playing CTFs, I would prefer to focus on the big ones instead of having to play all of the 50-70 CTFs per year. Quality over quantity.

An interesting side-effect of having a "Major" label is getting another argument when getting sponsorship for these events ("our CTF is a Major, give moar monies for prizes!11one"). While we don't play CTF to get rich (given my knowledge on how CTF teams function, it's fair to say we play CTFs to get poor), bigger prize money makes it easier for teams to pay for travel to offline events.

kyprizel commented 7 years ago

What if we leave all the process same as prev years (voting/weight) but rating will be counted only on X best results of the team? This will prevent teams from playing in every CTF just to prevent points losses.

It's hard to decide if CTF was "Major" or not if you missed it by some reason :(

gynvael commented 7 years ago

Ad "only X best results of the team" That's an interesting idea, I need to think about it.

Ad "hard to decide if CTF was "Major"" Actually I would do it the other way around - i.e. decide in front, based on previous years, and as early as possible at that (e.g. at least a month ahead).

There are a lot of details to solve here, but before going into them I'm wondering if it's something you would consider at all kyprizel?

MathisHammel commented 7 years ago

@gynvael @kyprizel I think one of the proposed ranking methods fits 'Quality over quantity' while still allowing mediocre teams to improve a bit by playing a lot : The best result of the season is counted 100% in the team's global score 2nd best result is 90% 3rd is 81% (0.9*0.9) etc. nth is 0.9^(n-1)

Which means playing a lot still adds some points (and a bad performance can not lower the total score), but having a few very high results means having a better score than a lot of average results. The 0.9 parameter is adjustable to provide the effect we want.

This only solves the problem of quantity vs. quality, but we still have to figure out how to get CTF weights.

kyprizel commented 7 years ago

There are a lot of details to solve here, but before going into them I'm wondering if it's something you would consider at all kyprizel?

There are a lot of pitfails with "Major" CTFs:

CTF organizers opinion - some organizers think their CTFs are good enough, why should they be refused to be "Major"? Now I get letters about "weight", then I'll get letters about "make our CTF Major" :)
I don't like the idea of making all DefCon qualifiers "Major" by default. I like what LegitBS makes but: a) they're not competitive players (same as me at the moment - so I can't set event weights by my own any more) b) CTFs should not be DefCon-centric
I can implement some Event-voting process for last years top teams so they can choose the "majors", but i don't think it'll perform much better than current "democracy" when every team can vote.

Sure, I can consier it too, but I need to be sure it's much better than all the other schemas.

immerse commented 7 years ago

How about using participation of top teams last year to determine which CTFs are "major" this year? That way you don't have to pick anything manually, and it would be somewhat objective.

A problem with the "n best results" approach is that we'd still have to assign a weight to each CTF, and I don't see how to do that fairly. At least I don't think voting will do.

pedromigueladao commented 7 years ago

@kyprizel obviously you'll be criticized no matter what you decide to do 👍

I offered the suggestion major=DefCon quals just because most teams participate in those, but obviously we can decide differently. The idea is to come up AS A COMMUNITY with a list of CTFs that one antecipate will be good CTF experiences for everyone and grade these higher than others. This would make everyone accountable, both participants and organizers. Participants would prepare better for these, and I believe being considered "Major CTF" would be something that organizers would be proud of, and consequently be more accountable and try in all honesty to make their best effort to put up a good game for everyone.

Of course newcomers can also be great CTF experiences, and we should accommodate those also.

I really believe that one can set-up a committee, maybe not including the top-rated teams to avoid a clear conflict of interests, to decide these. Look, Program Committees for Conferences have constantly to decide which papers to accept, and which to reject. Sometimes they get it right, sometimes they get it wrong. Well, that is life.

TheZ3ro commented 7 years ago

Like many people said before, basing the weight only on a voting system will result in a broken weight. Obviously average team will vote higher for easier CTFs were they perform better (and can get more points) and maybe vote less on hard CTFs.

This will lead to a very high vote average in easier CTFs (like AlexCTF2017).

I think that @Pharisaeus idea is good and unbiased but maybe can be tweaked a little:

If more than 75% of "currently top N teams" join a CTF, it will be more likely to be an harder/nicer/better CTF
Look at the final position of those teams, if more then 65% of them ends up less then Nth position, it was a shitty CTF
Implement also a "shitty"/"likely to play" CTF button active for the first 1/10 duration of the CTF available to all CTFTime registered teams.

So you can forecast if the CTF is trusted to be good by the top team, analyze if the forecast was wrong and also have real feedback (that can't be cheated based on the outcome of the CTF)

The 2nd point can be cheated if the majority of "top teams" give-up on a nice CTF, but they will lose points and positions if others "top teams" actively play the CTF.

This is just an idea. Numeric value can be changed and tweaked. Should be nice to see some 2016-data for the first 2 points

tylerni7 commented 7 years ago

[Speaking as an individual, not for any particular team...] First off: thanks @kyprizel (and other folks who help keep CTFtime running). Yes there are some issues, but it's a great resource overall, and I'm thankful it exists! Anyway, against my better judgment I decided to comment...

Personally, I feel like ELO and all that stuff don't make sense. As lots of people have said, not everyone participates in everything (and shouldn't be expected to) so I don't think a system like that is reasonable. I don't think there will be a better system that just takes results of competitions and their results and can aggregate them in a reasonable way.

That means we're stuck with rating CTFs. So first off, right now as far as I know, there isn't really a rating "system". It's more like some people give numbers to things based on how fun it was/how hard it was/how well one's team did/etc. If someone told me FooCTF is a 30 point CTF on CTFtime, I don't really know what that means. If BarCTF is worth 15 points, what does that mean? Is it "half as good" as FooCTF? Is this a linear/logarithmic/exponential/quadratic/whatever scale? What do these points actually mean? So maybe some sort of guideline, or even just a canonical list of example CTFs and what their official ratings are would be useful.

Second, let's talk about how it was back in the beginnings of CTFtime. Basically @kyprizel and others would pick reasonable scores for CTFs and that was that. Some people (like me! :P) complained about this due to some transparency issues, and I also imagine @kyprizel was sick of getting emails saying "you rated this CTF that you didn't play in as a 10, but it should've been 30, make it more points" and having to respond with "yeah sure, whatever", as well as ranking tons of random CTFs that started popping up, etc. Basically, this system worked okay, but people complained, and I imagine it sucked for @kyprizel as much (or more) than it did for everyone else.

Then we went with this voting thing. As far as I can tell, that is almost certainly worse :P I think part of the issue here again is there is no "standard" for ratings, so people just make up numbers. For a lot of these voting things, it ends up being "I did well, 100 points" or "I did poorly, 1 point". Without a "standard" it's hard to say if a CTF is overvalued or undervalued. Realistically though, I think even with a standard, most people are still going to just vote for whatever benefits their team the best.

If we want to stick with voting, there are a few things that seem like options:

Keep things the same. This seems not so great, but whatever
Whitelist/blacklist teams from voting. This might do some good, but it's hard to say. Maybe something weird could happen where the 1st place team can't vote, but the next 10 places can, or something. This way maybe people who are informed enough to decide (because they played enough to get a top spot) can pick, and the team with the most vested interest in the scoring is blocked. This still doesn't seem great.
Force all teams to vote to collect points. This is an interesting idea @caioluders mentioned above. It actually might solve some stuff since the signal to noise ratio might be a bit higher. It doesn't explicitly prevent voting just in your own interests though. I know personally myself/my team have never voted in any of the polls because I'm sure no matter what we picked, someone would say it was just because we did well/poorly, so we didn't bother. I guess if we were forced to we would vote though?
Weight votes based on.. something. We could weight based on how often a team's votes lined up with the final result; how highly that team was ranked; or heck, meta-vote for points for teams to see how much their opinion is weighted!
Decouple voting and immediate scores. If we assume that the CTFtime rating of a CTF in year N should be strongly correlated with its score in year N+1, then we can just make voting only matter for the CTF next year. There are bad cases here (where do new CTFs start, what if organizers change/get lazy/whatever and the quality between two years is very different)
Remove conflicts of interest. If a team votes, prevent them from collecting points from that CTF. This has some advantages, but obviously opens a whole mess of problems if people open fake accounts.
Ranking, not points. Rather than assigning points, just put CTFs on a sorted list, and then use their position on that list to assign points. This requires some weird math, and it's not clear if it solves any problems...

There are a few other non-voting things that can be done:

Top teams decide. The top teams get a special voting ability to decide CTF scores
Auto scoring. Some algorithm decides CTF scores. I guess in theory this would do something like measure the correlation of a CTFs final score to the "true CTF team rankings", and highly correlated CTFs would get higher scores. This definitely has an issue of self-perpetuation going on where the ranking will not favor having things ever change. (Deciding what "true CTF team rankings" are is left as an exercise for the reader)
Dictator. Appoint a benevolent dictator to decide the scoring. For example, have whichever team won CTFtime in year N decide the CTF scores for year N+1 (and remove them from the scoreboard for year N+1). This one actually seems like a pretty decent idea to me, personally.

Okay, apologies for that stream of consciousness spouting off of ideas... not sure if that heps or hurts the discussion :P

dcua commented 7 years ago

Some of these ideas may be combined -- winner of CTF doesn't vote, TOP-11 of analysed CTF scoreboard is forced to vote in following way -- if team doesn't vote it gets no points for this CTF, and vote of next team is taken.

nSinus-R commented 7 years ago

The problem I see with forcing different teams to vote for each ctf is consistency. Currently, with the voting process, we don't collect enough data to have consistent weighting for ctfs over the season with statistical significance. In my opinion this results into a) a lot of jitter and b) weight-point inflation (a lot of persons just vote max-available score).

I really think a committee/jury could be better for deciding the base weight, no matter whether they decide upfront, i.e. at the beginning of the season, or live, during the ongoing season. I mean, in the past @kyprizel was basically a one-man committee and it worked in my opinion better than the current voting system, as weights were assigned under the same evaluation criteria. This is not the case when having a changing voting committee for every ctf (like top10 teams of the ctf), as the the evaluation criteria for 'how good was a ctf' differs vastly from person to person. (This could partwise be resolved by differentiating the aspects of a vote, however, it does not really fix the problem of changing subjectivity.)

However, it was mentioned that committee-based approaches may lack transparency. This could be solved easily: One could require to make the chatlogs of the committee, discussing the weight of a ctf, to be made public. Of course, this all assumes that there are enough people, from different teams, willing to form a committee.

Anyway, I would also opt-in for having some "Quality>Quantity"-heuristings for the scoring system.

TheZ3ro commented 7 years ago

@dcua This will not prevent TOP11 teams to vote based on their score in the current CTF

@tylerni7

We could weight based on how often a team's votes lined up with the final result

And if you register for 2-3 shitty CTF in a year, for this CTF you submit just 1-2 flags and you vote negative (the CTF is really bad), does that mean that I want to lower the weight of the CTF because I've got less points?

Do you force team to vote by not giving them points? Then If a team forgot to vote they will lose points?

ctftime / ctftime.org

2017 Rating #40