carykh / PrisonersDilemmaTournament

Watch This Place's awesome video about iterated Prisoner's Dilemma for context! https://www.youtube.com/watch?v=BOvAbjfJ0x0
MIT License

Suggestion: add chance of miscommunication #27

Open Devon7925 opened 3 years ago

Devon7925 commented 3 years ago

Preface

Right now a more lenient version of grimTrigger combined with titForTat beats all of the example strategies, even when several other strategies are added (tested with both 44 and 24 different strategies). Due to the existence and success of grim strategies, defecting at all can cause your strategy to be unsuccessful. As such, most rounds will just be cooperation, which reduces the power of strategy: always cooperate will have nearly the same effect, and you can't defect to see whether an opponent is always cooperate without the risk of triggering them if they are grim. This further increases the effectiveness of grim strategies, thereby increasing their prevalence.

Why grim is so good

Grim is good because, unlike tit for tat, it is able to punish random strategies, while being just as effective as tit for tat at dealing with always defect, always cooperate, itself, and tit for tat.

Solution

To address this problem, I suggest adding a small chance for your action to be the opposite of what you wanted, so that you can pass defections off as miscommunications. That would give strategy a bigger role.
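As a rough illustration of the suggestion above, here is a minimal sketch of a miscommunication step. The names `apply_noise` and `count_flips`, the 1 = cooperate / 0 = defect encoding, and the `error_rate` parameter are all assumptions made for this example; nothing here is taken from the tournament's actual code.

```python
import random

# Assumed encoding for this sketch: 1 = cooperate, 0 = defect.
COOPERATE, DEFECT = 1, 0


def apply_noise(intended_move, error_rate, rng):
    """Return the intended move, flipped to its opposite with probability error_rate."""
    if rng.random() < error_rate:
        return 1 - intended_move
    return intended_move


def count_flips(num_moves, error_rate, seed=0):
    """Count how many of num_moves intended cooperations come out as defections."""
    rng = random.Random(seed)  # seeded so repeated runs give the same count
    return sum(
        apply_noise(COOPERATE, error_rate, rng) == DEFECT
        for _ in range(num_moves)
    )


if __name__ == "__main__":
    # With a small error rate, only a few intended cooperations are flipped.
    print(count_flips(num_moves=200, error_rate=0.02))
```

A tournament runner would apply something like `apply_noise` to each player's chosen move before recording it, so neither side could tell a deliberate defection from an accidental one.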

jherndon8 commented 3 years ago

This isn’t a bad idea in hindsight, but entries (including my own) have already been submitted and if there’s not a way to edit attachments on the submission, it’d be unfair to change it at this stage in the game.

hand-burger commented 3 years ago

I can understand where you're coming from with this, and it would decrease the prevalence of grim strategies, but it's just too late for that.

Quadrapod commented 3 years ago

I agree with the others that it's too late to be making changes at this stage, but something Carykh has mentioned from the start about the iterated prisoner's dilemma is that which strategy is best depends largely on the other strategies around you. Testing strategies do poorly when grimTrigger is prevalent, as it punishes any amount of defection maximally. However, they may do excellently overall when grimTrigger is one of a hundred or more strategies in the pool. So the question becomes: what do you think the mix of strategies will be like in the end? That is how the problem has been presented from the beginning.

l4vr0v commented 3 years ago

TL;DR: I have a hard time believing grim is that good. When it was submitted to the second iterated prisoner's dilemma tournament back in the day, as "Friedman," it placed 52nd for a reason.

What's the grim+tit-for-tat strategy that dominates in your meta? grimTrigger, I suppose, will do really well with random (since DDDDDD will on average net you 3 points per move, the same as being in a cordial C/C loop). But it will miss out on points with any deterministic strategy that defects unprovoked (joss, detective in the example strats) because it will push for D/D loops where C/C loops could have happened.

With joss, the best response to a defection is to just ignore it. It will steal 3 points from you every once in a while, but C/D C/C C/C is still 2 points/move, while C/D D/D D/D approaches just 1 point/move as the D/D loop continues, and if you defect and back off (C/D D/C C/D) that's 5/3 points/move.
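The per-move averages above can be checked with a short sketch. It assumes the payoffs implied by this thread (3 for C/C, 5 for D/C, 1 for D/D, 0 for C/D, written as my move/their move); `PAYOFF` and `average_points` are names made up for this example.

```python
from fractions import Fraction

# Points for "my move/their move", as implied by the figures in this thread.
PAYOFF = {"C/C": 3, "C/D": 0, "D/C": 5, "D/D": 1}


def average_points(sequence):
    """Average points per move over a space-separated sequence such as 'C/D C/C C/C'."""
    rounds = sequence.split()
    return Fraction(sum(PAYOFF[r] for r in rounds), len(rounds))


if __name__ == "__main__":
    for label, sequence in [
        ("ignore the defection", "C/D C/C C/C"),
        ("defect and back off", "C/D D/C C/D"),
        ("fall into a D/D loop", "C/D D/D D/D"),
    ]:
        print(f"{label}: {average_points(sequence)} points/move")
```

Within the three-move window itself, the D/D response averages 2/3 of a point per move. It only climbs toward 1 as the loop goes on, which is still well below the 2 points/move from ignoring the defection.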

With the detective, the best response to a defection is to tit-for-tat- take the free D/C on the next move and then go back to C/C.

In general, when you defect against a deterministic strategy- and any good strategy that probes you with an unprovoked defection is probably going to be deterministic- you get defected against in return. This will come in the form of either a D/D (costing you 2 points vs. a C/C) which may possibly loop or worse yet a C/D (costing you 3 points) at some point. You could defect immediately after the unprovoked defection, because an opponent that used an unprovoked defection to probe you will almost certainly back off into cooperation at some point (because they want to know whether you're the grimTrigger) and that'll most likely be the very next turn- and also you don't want to go the route of alwaysCooperate because detective and alwaysDefect will cost you a lot of points. But beyond that, punishment can get inefficient- even if you take a small hit to go from D/D to C/C (i.e., D/D C/D C/C for +4 in 3 moves) that's better than the D/D loop (+1 per move).

grimTrigger is meta-defining but I don't see it dominating. I think you're more or less locked into titForTat as your basic response strategy for dealing with unprovoked defections and that this tournament will be more or less decided by how well our strategies can exit/avoid D/D loops.

You need to be able to:

My money is on the second requirement by far mattering the most in this contest.

Devon7925 commented 3 years ago

> What's the grim+tit-for-tat strategy that dominates in your meta?

Even just defecting on the first defect and going grim after two defects will beat tit for tat in the example strategies. It goes into always cooperate with detective, and that's enough to push it over the edge. A smarter detective would search for FTFT, though, which would trigger the grim (which is why I won't be using that exact strategy).
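One possible reading of "defecting on the first defect and going grim after two defects" is sketched below. The function name `lenient_grim`, the plain-list history argument, and the 1 = cooperate / 0 = defect encoding are assumptions for illustration; this is not the commenter's actual submission, and it does not use the tournament's real strategy interface.

```python
# Assumed encoding for this sketch: 1 = cooperate, 0 = defect.
COOPERATE, DEFECT = 1, 0


def lenient_grim(opponent_history):
    """Pick a move from the opponent's past moves (oldest first).

    Hypothetical reading of the description: retaliate once after the
    opponent's first defection (tit for tat), then defect forever once
    the opponent has defected twice in total.
    """
    if opponent_history.count(DEFECT) >= 2:
        return DEFECT  # grim phase: never forgive again
    if opponent_history and opponent_history[-1] == DEFECT:
        return DEFECT  # single tit-for-tat retaliation
    return COOPERATE


if __name__ == "__main__":
    # One probe defection is punished once and then forgiven.
    print(lenient_grim([COOPERATE, DEFECT]))             # retaliates
    print(lenient_grim([COOPERATE, DEFECT, COOPERATE]))  # back to cooperating
```

Under this reading, an opponent that probes with a single defection and then cooperates gets one retaliation followed by steady cooperation, while a second defection at any point locks in permanent defection.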

I also have a hard time believing a significant number of people will run joss - it placed near the bottom in all of my tests and is not a 0 effort submission. Trying to exit loops may also lead to even worse behavior vs random if you aren't using grim-like behavior, and may also lead to detectives such as better detective taking advantage of you (or trying to, leading to a worse overall result).

For those saying their submissions would be bad if miscommunication was added, resetting the submissions wouldn't be a bad idea (and would probably decrease the number of purely random submissions anyway).

Quadrapod commented 3 years ago

@Devon7925 Wow, I tried adding that script to my pool and our strategies are different but pretty much neck and neck, at least with only the base 9. Every time I run it our scripts basically change places. If I hadn't already submitted I'd make some modifications because you found a slightly better way to exploit FTFT than I did.

Devon7925 commented 3 years ago

@Quadrapod if you are talking about the better detective one, that isn't mine. I primarily tune based on a dataset of various strategies either I came up with or others came up with, as well as the base 9, and in that case my actual strategy consistently beats better detective by a wide margin.

carykh commented 3 years ago

Oooh this is a good idea, like a 2% error-rate or something! My professor actually brought that idea up with me when we were discussing this class project, and we decided, "We can try that for Tournament Round 2, but Round 1 should be the simplest version of the tournament". Which means it's definitely on the table if I ever try this again.

And like @l4vr0v , I'm hoping that unforgiving strategies don't win. But it does depend on what everyone else submits. I think another way I could dis-incentivize them in the future is lowering the defect-cooperate score (because +5 is very high and nearly double the C-C score)

nobody5050 commented 3 years ago

I feel like troll strategies which aren’t good overall but are really good at breaking a specific strategy are going to really define what wins

l4vr0v commented 3 years ago

@Devon7925

> beat tit for tat in example strategies

This isn't the objective. Those grims beat tit-for-tat in head-to-head, but a low-scoring "win" sets you back compared to a high-scoring "loss." In an average/sensible meta, that grim will end up losing so many points (~400/game, so 2pts/move on average) in its tit-for-tat matches (by D/Ding when it could've C/C'd), while the tit-for-tat variants will just have cordial C/C games with one another, typically averaging 3pts/move. The grim will do well in all its head-to-heads with tit-for-tats but at the cost of being nowhere near the top of the leaderboard at the end.

Also ultimateDetective is fairly brittle and definitely not good enough for the actual contest. I think a good "better detective" or unprovoked defector would be rather different from that ultimate detective. I've got a few detective iterations that drastically improve on it, although those are brittle too.

carykh commented 3 years ago

Hmmm, @nobody5050 , I'm starting to wonder the same thing. (Don't worry, I haven't tested any of the user submissions yet.)

By me putting that "random" strategy source code in the video on-screen, how many people was I encouraging to submit random strategies? Maybe not most people, but perhaps a significant number, especially of those who don't know how to code. I am a bit concerned that if, say, 30% of people just submitted random, then pure GrimTrigger or pure AlwaysDefect will win. (Both of which will probably have multiple submitters.) This didn't happen in Axelrod's tournament (mentioned by @l4vr0v ) because every strategy there was unique, and most were nice.

If it's a problem, a simple solution is to consolidate all strategies that are exactly identical. So, "random", "grimTrigger" and "alwaysDefect" would each only take up one spot in the roster, and then they will almost certainly not win. If I do this, though, I'm worried some people will say I've changed the rules to bias one side over the other. Which is a no-no after the competition has started!

(Thinking about this for Tournament Round 2, I love the idea of a 1% miscommunication rate, because that would severely hinder GrimTrigger, as well as TFT-vs-TFT scenarios, which would get retaliation echoes.)
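To make the retaliation-echo effect concrete, here is a small sketch that plays a match in which exactly one move is flipped by a miscommunication. The helpers `play` and `as_text`, the strategy signatures, and the 1 = cooperate / 0 = defect encoding are assumptions for this example, not the tournament's real interface.

```python
# Assumed encoding for this sketch: 1 = cooperate, 0 = defect.
COOPERATE, DEFECT = 1, 0


def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else COOPERATE


def grim_trigger(own_history, opponent_history):
    """Cooperate until the opponent has defected once, then always defect."""
    return DEFECT if DEFECT in opponent_history else COOPERATE


def play(strategy_a, strategy_b, rounds, flipped_round):
    """Play a match where player A's move is flipped once, in flipped_round (0-based)."""
    history_a, history_b = [], []
    for round_index in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        if round_index == flipped_round:
            move_a = 1 - move_a  # the single miscommunication
        history_a.append(move_a)
        history_b.append(move_b)
    return history_a, history_b


def as_text(history):
    """Render a move list as a string of C and D characters."""
    return "".join("C" if move == COOPERATE else "D" for move in history)


if __name__ == "__main__":
    for name, strategy in [("TFT", tit_for_tat), ("grim", grim_trigger)]:
        a, b = play(strategy, strategy, rounds=8, flipped_round=2)
        print(f"{name} vs {name}: A={as_text(a)} B={as_text(b)}")
```

With one flip in the third round, the two tit-for-tat players end up alternating defections for the rest of the match (A plays CCDCDCDC, B plays CCCDCDCD), while the two grim triggers collapse into mutual defection within two rounds (A plays CCDCDDDD, B plays CCCDDDDD).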

Quadrapod commented 3 years ago

If you consolidate like strategies into single submissions then you fundamentally change the premise of the contest. It's no longer "Guess what strategies other people will use" but "Guess what strategies will have the most unique implementations." Everyone right now is operating on the same knowledge. If someone believes that your video will convince a large number of people to submit random, and bases their entire approach on maximally exploiting that group, then I feel like that's just the outcome.

carykh commented 3 years ago

Good point, Quadrapod. It's always safest to just stick with the rules I set at the beginning of the competition. And hey, if I don't get the results I thought I was gonna get, 1) there will be some other participant who will be super happy to win, 2) in my analysis video, I can talk about what influences led to the result that eventually happened, and 3) I can always run a better competition in the future!

nobody5050 commented 3 years ago
> I can always run a better competition in the future!

I, for one, would love to see a version 2 of this contest with a lot of the user suggestions added in, just to see what your fan base can make if it's less meta-related and more "good strategy" related.

redtachyon2098 commented 3 years ago

An analysis video would be awesome.