Open RogerDodger opened 8 years ago
I feel like this has been suggested, but how about evaluating someone's behavior to make them 'eligible' for a mask? Perhaps the behavior of both authors and guessers.
So, I feel like some authors get guessed inordinately often. Once people guess certain authors more than a certain amount, these authors can get a mask if they avoid detection. For the guessers, some of them are better than others; these are the ones that an author really wants to dodge.
I guess what I'm saying is that... is there some way to evaluate behavior (possibly across multiple rounds...?) to get 'guessability' (how much people want to guess a certain author) and 'accuracy' (how likely someone is to make good guesses)? If an author with high enough guessability avoids a guesser with high enough accuracy, they'd get a mask.
This would have the upside of significantly reducing the number of masks handed out, and also (hopefully) encourage people to guess more accurately.
On the downside, I'm not sure enough people play the author guessing game to give good data for this sort of thing. The mechanics here might be too arcane to be compelling, or there might just be too much noise in the information.
Yeah, the data is very sparse, so it's hard to test whether a given model actually makes sense.
One idea I keep coming back to is to mark the "precision" of a particular guess as being proportional to how many times the guesser guessed that artist. That is, if someone guessed CiG for 5 things, then they weren't being very precise, and so even if one is right it's not particularly meaningful.
This idea is kicked in the butt a bit by people being able to submit multiple entries. Because you don't know exactly how many stories CiG wrote, how many times should you actually guess him?
The way I keep trying to evaluate a model I come up with is to ask how it's affected by random votes. If someone guesses a random artist for each entry, they will on average only get 1 right. In most models I come up with, that 1 lucky guess completely ruins the author's chance of getting a mask.
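To make the "on average only get 1 right" figure concrete, here is a minimal sketch. It assumes the guesser picks uniformly among all artists for every entry, and the event size (21 stories, 20 artists) is made up purely for illustration.

```python
from fractions import Fraction


def expected_random_hits(num_stories: int, num_artists: int) -> Fraction:
    """Expected number of correct guesses under uniformly random guessing.

    Each guess is right with probability 1/num_artists, regardless of how
    many entries the true author submitted, so the expectation is that
    probability summed over all entries, i.e. num_stories / num_artists.
    """
    return sum(Fraction(1, num_artists) for _ in range(num_stories))


# Hypothetical event: 21 stories by 20 artists.
print(expected_random_hits(21, 20))  # 21/20
```

So a purely random theory lands roughly one hit per event, and any model has to keep that one hit from deciding who gets a mask.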
Another thing I try to keep in mind is that wrong guesses are basically meaningless, at least as far as the observed data is concerned. Most people make about 6-8 informed guesses, and then either don't bother with the rest or just throw in a mostly random one. So if I'm scoring guessers by their accuracy, it's usually "total number of correct answers" instead of "ratio of correct to incorrect answers". (This is why the best guesser award is currently given simply to whoever gets the most right answers.)
Some consideration also has to be made that correct guesses in the prelim round are harder than correct guesses in the finals, since the finals have fewer artists to choose from.
So here's my current working model:
The accuracy of a theory is equal to the sum of the precisions of its correct guesses.
The precision of a guess for author A is equal to min(1, e/g)^2, where e is the expected number of stories written by A, and g is the number of times the guesser guessed A. The idea is that guessing someone too many times dramatically decreases the precision of each of those votes. It's squared to make the drop-off steeper (quadratic rather than linear), because I don't want the sum of those votes' precisions to equal 1.
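A minimal sketch of that precision formula, using exact fractions so the sums are easy to check. The function name and the example numbers are assumptions for illustration only.

```python
from fractions import Fraction


def precision(expected, guessed: int) -> Fraction:
    """min(1, e/g)^2 for one guess, where e = expected and g = guessed."""
    ratio = min(Fraction(1), Fraction(expected, guessed))
    return ratio ** 2


# Hypothetical case: e = 1, and the guesser named the same author 5 times.
per_guess = precision(1, 5)
print(per_guess)      # 1/25
print(5 * per_guess)  # 1/5

# Without the square, the five guesses would each score 1/5 and sum to 1.
print(5 * min(Fraction(1), Fraction(1, 5)))  # 1
```

With the square, spreading five guesses over one author is worth less in total than one precise guess, which is the intended penalty.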
Now, the question is what value to use for e. Is it equal to s/a, where s is the number of stories and a the number of artists? This would make it about 1.05 for every artist, which doesn't really work when you consider a perfect theory. In a perfect theory, every guess should have a precision of 1.
So e has to equal either the actual number of stories written by A, or the maximum number of stories written by any author in the event. The pathological case of A submitting a disproportionately large number of entries breaks both of these. The former screws up accuracy since guesses against A's stories are much more likely to be right. The latter screws up precision such that you get too many "free" guesses on the other authors' entries.
I am inclined to have e equal ceil(s/a), which will pretty much always equal 2, and which is imperfect on basically every consideration I've mentioned, simply because there aren't any pathological cases that break it horribly.
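As a rough check of the two candidates for e, the sketch below compares s/a with ceil(s/a) for a guesser who names an author exactly twice and is right both times. The event size (21 stories, 20 artists) is a made-up example chosen to give s/a = 1.05.

```python
from fractions import Fraction


def precision(expected, guessed: int) -> Fraction:
    """min(1, e/g)^2 for one guess, where e = expected and g = guessed."""
    return min(Fraction(1), Fraction(expected, guessed)) ** 2


s, a = 21, 20             # hypothetical: 21 stories, 20 artists
e_mean = Fraction(s, a)   # e = s/a
e_ceil = -(-s // a)       # e = ceil(s/a), computed with integer arithmetic

# Author A really wrote 2 stories and the guesser names A for exactly those 2.
print(precision(e_mean, 2))  # 441/1600
print(precision(e_ceil, 2))  # 1
```

With e = s/a, even a perfect theory is penalised on the two-story author; with e = ceil(s/a) it scores full precision, at the cost of one "free" extra guess on every single-story author.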
Assuming we've figured out the precision of each guess and the accuracy of each theory, we can work on figuring out who gets a mask.
Let the weight of a guess be equal to its precision multiplied by its theory's accuracy.
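Putting precision, accuracy and weight together, a sketch might look like the following. It assumes a theory is a mapping from entry to guessed author, that the same e is used for every author, and that the entries, authors and e = 2 are hypothetical values for illustration.

```python
from collections import Counter
from fractions import Fraction


def guess_precisions(theory: dict, expected: int) -> dict:
    """Precision min(1, e/g)^2 of every guess in one user's theory."""
    counts = Counter(theory.values())  # g: times each author was guessed
    return {
        entry: min(Fraction(1), Fraction(expected, counts[author])) ** 2
        for entry, author in theory.items()
    }


def accuracy(theory: dict, truth: dict, expected: int) -> Fraction:
    """Sum of the precisions of the theory's correct guesses."""
    precisions = guess_precisions(theory, expected)
    return sum(
        (p for entry, p in precisions.items() if theory[entry] == truth[entry]),
        Fraction(0),
    )


def guess_weights(theory: dict, truth: dict, expected: int) -> dict:
    """Weight of each guess: its precision times its theory's accuracy."""
    acc = accuracy(theory, truth, expected)
    precisions = guess_precisions(theory, expected)
    return {entry: p * acc for entry, p in precisions.items()}


# Hypothetical event: A wrote two entries, B and C one each.
truth = {"s1": "A", "s2": "B", "s3": "C", "s4": "A"}
theory = {"s1": "A", "s2": "B", "s3": "A", "s4": "A"}  # A guessed 3 times

print(accuracy(theory, truth, expected=2))             # 17/9
print(guess_weights(theory, truth, expected=2)["s2"])  # 17/9
```

Here the three guesses for A each have precision 4/9, so the two correct ones contribute 8/9 and the single precise guess for B contributes 1.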
For an entry to receive a mask:
I'm not sure what value to use for L, but I think F should most likely be 1.
I think using precision instead of weight for the found criterion makes more sense, since that criterion is not really concerned with how good the particular guesser is, just that the story was guessed at all with strong precision.
Since I'm not sure what value to use for L, maybe instead the stories are ordered by the looking for criteria, and the best that passes the found criteria wins a mask? That sort of works, and makes sure there's only 1 per event, except then the biggest factor to winning is making lots of people guess you, and the way you do that is just by being well-known (and having someone else write a story that looks like it's by you).
I'm at a loss here.
...So I won't claim I totally understand your math. You're likely to get better thoughts on testing algorithms from the writeoff forum on Fimfic, because some very smart people are likely to see it there.
Still, sparsity of data is part of why I thought evaluating someone's past actions might be useful. I do agree it really shouldn't matter, but it was intended to clip out the sort of people who just spam guesses.
Anyways, reading this, I wonder if you might want to consider tweaking your author guessing game a bit. If you made it so it's impossible to guess any one author for more than the largest amount of stories submitted by any author in the round, it might mitigate that pathological use-case somewhat.
So if someone submitted seven stories, any author can be guessed that much but no more; their name would gray out or disappear or something until a guess was changed. That should also maintain anonymity, without revealing too much extra information to the people guessing.
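A sketch of how that cap could be checked when building the guess form. The function and variable names are hypothetical and not taken from the site's code; the only input a guesser's form needs is the cap itself, not the per-author counts.

```python
from collections import Counter


def selectable_authors(theory: dict, authors: list, cap: int) -> list:
    """Authors that can still be picked, given the guesses made so far.

    `cap` is the largest number of stories submitted by any one author in
    the round; an author who has already been guessed `cap` times is
    hidden (or grayed out) until one of those guesses is changed.
    """
    counts = Counter(theory.values())
    return [author for author in authors if counts[author] < cap]


# Hypothetical round: A submitted 2 stories, B and C one each.
stories_per_author = {"A": 2, "B": 1, "C": 1}
cap = max(stories_per_author.values())

theory = {"s1": "A", "s2": "A"}  # the guesser has used both A slots
print(selectable_authors(theory, list(stories_per_author), cap))  # ['B', 'C']
```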
If there's a cutoff where that becomes less useful for calculating (maybe seven's too many to compute accuracy well) perhaps the mask is unwinnable on rounds where one author submits too many stories. So if everyone suddenly starts submitting under Anonymous, this becomes a non-issue anyways. Guessing goes on for Best Guesser, but no mask is awarded.
Or maybe use a hard guessing cap and users who submit too many stories are simply factored out of the running...? Nah, I don't know how that could work, because you'd still need to allow max number of guesses to ensure you could fill all the slots, or... eh. That way lies madness, I think.
I dunno if any of that's useful, but it might be worth thinking about. If your game's not making sense, perhaps you need to change the rules a bit. :P
Re-reading this, I think I got into a pretty wrongheaded approach when thinking about it in terms of accurate statistical modelling, rather than designing it like it's a game.
With that in mind, it's much less important that the rules be "correct" than that they create fun incentives to play the game:
For authors, the mask award is a game to mask their own identity. Problems:
There's not a lot extra authors can do to "earn" this award, other than maybe going out of their way to try a style or genre they're unfamiliar with. Is this something worth encouraging?
For guessers, the sleuth award is a game to be the best guesser. However,
I think the above issues with the sleuth award are fairly easy to solve:
It remains unclear how to solve the issues with the mask award, if they are worth solving at all.
("Theory" is the set of one user's guesses.)
The intention of the award is to reward participants who successfully avoided detection.
Previously, it was simply dealt out to any entry that received no correct guesses against it. This was bad for a number of reasons:
In fact, the mask was dealt out to so many entries that there were more of them given out than ribbons!
With that considered, I'm brainstorming ways to bring the mask back such that it can fulfil its original purpose (encouraging people to mask their identity). Some constraints I am thinking of:
Can these constraints all be met in a non-abusable way?
(If this problem is solved, all old masks will of course be revoked.)