carykh / PrisonersDilemmaTournament

Watch This Place's awesome video about iterated Prisoner's Dilemma for context! https://www.youtube.com/watch?v=BOvAbjfJ0x0

After the deadline in a few hours, please feel free to USE THIS THREAD TO SHARE YOUR STRATEGIES, since the fear of them being stolen will be gone #77

JoelMahon opened this issue 3 years ago

JoelMahon commented 3 years ago

I'm looking forward to sharing mine. The best strategy doesn't exist until you specify the meta it will be run in, but I'm still pretty proud of how many metas mine can beat in a tournament regardless of what the real meta turns out to be.

zelosos commented 3 years ago

I will upload it to my fork.

dosu0 commented 3 years ago

First, I would like to mention that the competition isn't over yet. Second, I recommend opening a pull request here instead. We have an improved tournament runner, a leaderboard, and a Discord as well.

HassanWahba commented 3 years ago

My strategy has 5 states of memory that describe the opponent and get the best out of them.

Xapakas commented 3 years ago

carykh's video description says "But, I'll give you all a 24-hour grace-period, which moves the deadline to May 27th, 2021 at 10 PM UTC, to allow for software bugs / resubmissions etc."

So be careful not to reveal too much!

JoelMahon commented 3 years ago

@Xapakas cheers, guess I'll leave it another day!

redtachyon2098 commented 3 years ago

Wait, there was a grace period?????

duckboycool commented 3 years ago

> Wait, there was a grace period?????

Yes, it was put in the YouTube video's description. I honestly feel like the grace period is too long: it should be there to allow final fixes for people who tried to submit right at the end and weren't able to, but instead it's a significant portion of the competition time, and I feel like you have to use it or you're at a significant disadvantage. It also wasn't very clearly announced; I believe the video description was the only place.

Oh well, too late to change it at this point.

arielsysdev commented 3 years ago

My brain is too fried to make any more changes anyway. I tried to refactor my code, but the cleaner code performed way worse for some reason, so I just submitted my ugly hacky one.

vlcanesin commented 3 years ago

STRATEGY NAME: revJossFGT6 - reversed Joss + forgiving grimTrigger

My strategy works like this: in the first rounds (I figured out with a lot of testing in different environments that the best number of rounds for this was 6), it behaves like a "reversed Joss" (a titForTat that, when it is continuously being attacked — and, therefore, attacking — tries to cooperate). If the opponent didn't make any unprovoked attack, it would continue like this forever. Otherwise, it would become what I like to call a forgiving grimTrigger. First, it would attack the opponent as many times as I was attacked in the first 6 rounds. After that, it would send a cooperation request. If (and only if) it was accepted, the strategy would cooperate until it was betrayed (with an unprovoked attack). This cycle of attacking and cooperating would then continue forever.

So, as you can see, this is an adaptive approach. If the opponent cooperates for the initial rounds, then I cooperate back with a very forgiving, but responsive, strat. What I mean by that is that it works like a titForTat, so it responds to the enemy's attacks quickly (as opposed to a forgivingTitForTat, for example), but it is still very forgiving, because it likes to cooperate even if the opponent is attacking a lot (because SS (+3 +3) is better than TT (+1 +1)). And one thing is sure: since the opponent always cooperated in the first rounds, it is very unlikely that it would attack a lot in this stage. On the other hand, if the opponent attacked (and that would have to be an unprovoked attack) at least once in the initial rounds, I respond to that attack proportionally to my opponent's aggression. Think about it this way: the more the opponent attacks, the less likely it is to cooperate. But, still, since cooperating is almost always the best option, even if the opponent attacked me every time, I will send a cooperation offer. I forgot to mention that this offer shrinks the more it gets rejected, so if I face an alwaysDefect or even a 50/50 random strategy, I will eventually get defensive and only attack back.

But there's a key element required for this strategy to work at its best that I didn't mention yet: the amount of randomness in the system. If you make a lot of copies of the random.py file in the exampleStrats folder and run the simulation, you will see that the best strategy will probably be the grimTrigger (as opposed to the standard one, titForTat). You can try it yourself, but the idea behind it is that by getting defensive (attacking a lot), the grimTrigger will always get at least 1 point each round and, on average, it will score 3 (~50% are TTs (+1 +1) and ~50% are TSs (+5 +0) -> avg = (+3 +0.5)). And since my strategy is essentially a grimTrigger against 50/50 randoms (63/64 times — the only way for it to stay a revJoss would be if the random strategy's output for the first rounds was SSSSSS, which only happens once in 2^6 (64) times), it works reeeeally well against randoms. In fact, I tested it in the environment created by @l4vr0v (a fantastic IPD lunatic, I'd say — with all respect, you're a beast), modified with a lot of random copies (around 20% of the total number of strategies), and, on average, my strategy got 2nd place! The first place (the priest, I think — by the way, I tried to read its code but couldn't understand anything about it lol) did a lot better in this environment (it basically always won), but, still, I'm very happy with the results. Sooo, I'm really hoping that the youtube comments are right and that there will be a lot of random strategies submitted.
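
(If you want to sanity-check that expected-value claim, here is a throwaway snippet, not part of my submission, that estimates the average per-round payoff of pure defection against a 50/50 random opponent:)

import random

def avg_defection_payoff(rounds=100000):
    # defecting against a 50/50 random: +1 when it also defects (TT), +5 when it cooperates (TS)
    return sum(1 if random.random() < 0.5 else 5 for _ in range(rounds)) / rounds

# avg_defection_payoff() comes out around 3.0, while the random opponent itself averages about 0.5 per round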

(I don't really mind if people take advantage of knowing this strategy, since there were always forks from people like @l4vr0v with a lot of available strategies, which could be straight up copied, for example. Also, the thing that matters the most isn't knowing about a single strategy, but the strategy trends, meaning the most common ones — those that you can take advantage of — and the least common ones — those that you don't have to care about that much).

Wow, that was a lot! It's so cool to see how convoluted a game with only two possible choices for each player can get haha

carykh commented 3 years ago

hey guys, yeah this thread is cool!

To answer @duckboycool's concerns about the grace period being too long: I made the grace period 24 hours because lots of people were submitting re-submission requests in the final 24 hours, between May 25-26th 10 PM UTC. I went through my final "deletion session" about an hour before the deadline hit (I've been doing them roughly every day), and I didn't want to delete anybody's strategy without giving them enough warning before the deadline actually hit. Given that I don't know what timezone people are living in, and whether they'll be free to do some coding at 4 PM, or midnight, or 8 AM, or whatever, I felt like I had to give them around 24 hours so they had a full revolution of the clock to hear that their response had been deleted and could re-submit.

duckboycool commented 3 years ago

I guess because of the final submission deadline in #81 and the video FAQ, if you're really worried about others knowing your strategies, you should wait until May 28th 10 PM UTC. (Only people who have already submitted a resubmission form will be able to submit between 2 hours from now and then, though.)

carykh commented 3 years ago

Yeah, the new deadline (ONLY for those who managed to submit re-submission requests late) is roughly 26 hours from now, May 28th 10 PM UTC. There are only about 40 people in this category. Once that date hits, absolutely no more resubmission requests can be accepted.

If you haven't submitted a resubmission request as of now, you can't anymore, because we're past the original submission deadline!

redtachyon2098 commented 3 years ago

I originally did ML stuff, but it was a bit slow and the result wasn't that good. So I made this. It's only 11 lines of code.

def strategy(history, memory):
    choice = 1  # cooperate by default (and on the first couple of rounds)
    if len(history[0]) > 1:
        choice = history[1,-1]  # otherwise copy the opponent's last move (tit for tat)
        courage = 0  # average of the opponent's moves, mapped to -1 (defect) / +1 (cooperate)
        for x in range(len(history[0])):
            courage += 2 * history[1][x] - 1
        courage /= len(history[0])
        if courage < 0.01 and len(history[0]) > 3:
            choice = 0  # the opponent has defected too much overall: go grim
    return choice, None

It's Tit for Tat, but it checks the ratio between the opponent's defections and cooperations: if the opponent defects too much, it turns grim, but if the opponent starts cooperating enough to offset the ratio, it returns to its normal behavior. This did unnaturally well for its simplicity, and placed 8th in the "prisoners dilemma enjoyers" fork of this repo.

ThatXliner commented 3 years ago

> (The hyphens are there for showing indents.)

can't you just do ```py code ```

redtachyon2098 commented 3 years ago

I'll try that. Thanks. (EDIT: It worked.)

Barigamb738 commented 3 years ago

This is my strategy. It's not the best, but I tried.

import random

#THIS IS OMEGA JOSS!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#Joss
#+grace period where it is just tit-for-tat
#+Aggression detection
#+varying the chance of random defection based on aggression
#+goes grim after "grimThreshold" defections
#+avoids double defections
#+programmable next choice (currently not used)

def strategy(history, memory):
    #vars
    TotalAgression = 0 #How many times it defected
    TotalNiceness = 0 #How many times it cooped
    choice = 1 #The choice of Omega Joss. duh.
    nextChoice = None #This overwrites the next move of Omega Joss

    #parameters (test for best value)
    chance = 0.8 #initial chance of defecting. TEST DONE
    gracePeriod = 10 #the period before any random defection. TEST DONE
    grimThreshold = 40

    if history.shape[1] < 1: #If it's the first round
        return choice, [TotalAgression, nextChoice] #Then Cooperate
    TotalAgression = memory[0] #Reading memory
    nextChoice = memory[1] #Same
    if nextChoice is not None: #Overwriting any decision if nextChoice is not None
        choice = nextChoice #      ^
        nextChoice = None #        ^
    elif TotalAgression >= grimThreshold:
        choice = 0
    elif TotalAgression == history.shape[1]: #If it always defected then we defect
        choice = 0
    elif random.random() <= (chance - TotalAgression/(history.shape[1]) + TotalNiceness/(history.shape[1])) and history[0, -1] != 0 and history.shape[1] >= gracePeriod: 
        #Randomization and making sure it is not the grace period and that we avoid defecting twice in a row
        choice = 0  
    elif history[1,-1] == 0:
        choice = 0  
        TotalAgression += 1
    TotalNiceness = (history.shape[1]-gracePeriod)-TotalAgression
    if history.shape[1] == gracePeriod:
        TotalAgression = 0
        TotalNiceness = 0
    return choice, [TotalAgression, nextChoice] #Return the function. duh

redtachyon2098 commented 3 years ago

The stark contrast between yours and mine....

redtachyon2098 commented 3 years ago

This is another thing I made; it uses ML to predict the opponent's choice and brute-forces the best option. But it's trash, despite being super long. Also, the code is a piece of spaghetti garbage.

thinklength = 5
howahead = 3
LearnRate = 0.05
LearnIterations = 50
LearnThreshold = 1e-3
table = [[1,5],[0,6]]

import random as r
import time as t

alert = 20
alertatall = False

def activation(x):
    if(x > 0):
        return x
    else:
        return x / 1024

def clone(thing):
    if(type(thing) == list):
        a = []
        for x in range(len(thing)):
            a.append(clone(thing[x]))
        return a
    else:
        return thing

def derivative(x):
    if x > 0:
        return 1
    else:
        return 1 / 1024

class network:
    def __init__(self,nodes):
        self.nodes = []
        self.raw = []
        self.weights = []
        self.biases = []
        a = []
        self.CostValue = 1000
        for x in range(nodes[0]):
            a.append(0)
        self.nodes.append(a)
        self.raw.append(a)
        for x in range(len(nodes) - 1):
            self.nodes.append([])
            self.raw.append([])
            self.biases.append([])
            self.weights.append([])
            for y in range(nodes[x + 1]):
                self.raw[x + 1].append(0)
                self.nodes[x + 1].append(0)
                self.biases[x].append(r.random())
                self.weights[x].append([])
                for z in range(nodes[x]):
                    self.weights[x][y].append(r.random())

    def predict(self,input_list):
        self.nodes[0] = input_list
        self.raw[0] = input_list
        for x in range(len(self.biases)):
            a = []
            c = []
            for y in range(len(self.biases[x])):
                b = self.biases[x][y]
                for z in range(len(self.weights[x][y])):
                    b += self.weights[x][y][z] * self.nodes[x][z]
                a.append(activation(b))
                c.append(b)
            self.nodes[x + 1] = a
            self.raw[x + 1] = c

    def output(self):
        return self.nodes[len(self.nodes) - 1]

    def cost(self,input_list,output_list):
        self.predict(input_list)
        a = self.output()
        b = 0
        for x in range(len(a)):
            try:
                b += ((a[x] - output_list[x]) ** 2)
            except OverflowError:
                b += 16e+256
        self.CostValue = b
        return b

    def backprop(self, input_list, output_list):
        self.predict(input_list)
        w = clone(self.weights)
        b = clone(self.biases)
        expectedoutput = output_list
        for p in range(len(self.nodes) - 1):
            x = len(self.nodes) - p - 1
            differences = []
            for y in range(len(self.nodes[x])):
                differences.append(self.nodes[x][y] - expectedoutput[y])
            for y in range(len(self.nodes[x])):
                b[x - 1][y] = 2 * differences[y] * derivative(self.raw[x][y])
                for z in range(len(self.nodes[x - 1])):
                    w[x - 1][y][z] = self.nodes[x - 1][z] * 2 * differences[y] * derivative(self.raw[x][y])
            expectedoutput = []
            for y in range(len(self.nodes[x - 1])):
                a = 0
                for z in range(len(self.nodes[x])):
                    a += self.weights[x - 1][z][y] * 2 * differences[z] * derivative(self.raw[x][z])
                expectedoutput.append(((a / len(self.nodes[x])) / (-2)) + self.nodes[x - 1][y])
        return [w,b]

    def train(self,inputs,outputs,LearnRate,iterations):
        for q in range(iterations):
            total = 0
            c = self.backprop(inputs,outputs)
            avgCost = self.cost(inputs,outputs)
            for x in range(len(self.weights)):
                for y in range(len(self.weights[x])):
                    total += c[1][x][y]
                    for z in range(len(self.weights[x][y])):
                        total += c[0][x][y][z]
            if(total < 0):
                total = -total
            if(total == 0):
                total = 1e-256
            for x in range(len(self.weights)):
                for y in range(len(self.weights[x])):
                    self.biases[x][y] -= c[1][x][y] * LearnRate * (avgCost ** 0.5) / total
                    for z in range(len(self.weights[x][y])):
                        self.weights[x][y][z] -= c[0][x][y][z] * LearnRate * (avgCost ** 0.5) / total
            self.CostValue = avgCost
            if self.CostValue < 1e-10:
                break

def biggest(lis):
    big = [0]
    for x in range(len(lis) - 1):
        y = x + 1
        if(lis[y] > lis[big[0]]):
            big = [y]
        if(lis[y] == lis[big[0]] and big[0] != y):
            big.append(y)
    return big[r.randint(0,len(big) - 1)]

def shift(thing, newnum):
    a = thing
    for x in range(len(a) - 1):
        a[x] = a[x + 1]
    a[len(a) - 1] = newnum
    return a

def testseq(feed, sequence, nn):
    james = nn
    score = 0
    future = feed
    for x in range(len(sequence)):
        james.predict(future)
        adversary = 1 * (james.output()[0] > 0.5)
        future = shift(shift(future, sequence[x]), adversary)
        score += table[sequence[x]][adversary]
    return score

def tobinary(num, digits):
    number = num
    bits = []
    for q in range(digits):
        x = digits - q - 1
        if 2 ** x <= number:
            bits.append(1)
            number -= 2 ** x
        else:
            bits.append(0)
    return bits

inittime = t.time()

def strategy(history, memory):
    choice = 1
    tim = t.time()
    if type(memory) == list:
        player = memory[0]
        previousfeed = memory[1]
        wronged = memory[2]
    else:
        player = network([2 * thinklength, 10, 6, 1])
        wronged = False
    feed = []
    for x in range(2 * (thinklength - min(10, len(history[0])))):
        feed.append(0)
    for x in range(min(thinklength, len(history[0]))):
        feed.append(2 * (history[0, x + max(0, len(history[0]) - thinklength)] - 0.5))
        feed.append(2 * (history[1, x + max(0, len(history[0]) - thinklength)] - 0.5))
    if 0 in history[1]:
        wronged = True
    if type(memory) == list and wronged:
        player.predict(previousfeed)
        #if 1 * (player.output()[0] > 0.5) != history[1][len(history[0]) - 1]:
            #print(len(history[0]), 1 * (player.output()[0] > 0.5), history[1][len(history[0]) - 1], "incorrect!")
        player.train(previousfeed, [history[1][len(history[0]) - 1]], LearnRate, LearnIterations)
        #player.train(previousfeed, [history[1][len(history[0]) - 1]], LearnRate, LearnThreshold)
        options = []
        scores = []
        for x in range(2 ** howahead):
            trythis = tobinary(x, howahead)
            options.append(trythis)
            scores.append(testseq(feed, trythis, player))
        choice = options[biggest(scores)][0]
        if len(history[0]) % alert == 1 and alertatall:
            print(player.CostValue)
    if not wronged:
        choice = 1
        #if player.CostValue > 1:
        #    print(t.time() - inittime, "Crap")
    #print(t.time() - tim)
    return choice, [player, feed, wronged]

Barigamb738 commented 3 years ago

Wow, that is impressive. I wouldn't have been able to make that.

redtachyon2098 commented 3 years ago

Thank you.

SebastianKorytkowski commented 3 years ago

> This is my strategy. It's not the best, but I tried.

    TotalAgression = 0 #How many times it defected
    TotalNiceness = 0 #How many times it cooped
    ...
    TotalAgression = memory[0] #Reading memory
    nextChoice = memory[1] #Same
    ...
    return choice, [TotalAgression, nextChoice]

You only save TotalAgression to the memory. TotalNiceness will always be zero (it's set near the end of the function, but then never used). Not sure if that was what you intended to do. So this line

elif random.random() <= (chance - TotalAgression/(history.shape[1]) + TotalNiceness/(history.shape[1])) and history[0, -1] != 0 and history.shape[1] >= gracePeriod:

is exactly the same as

elif random.random() <= (chance - TotalAgression/(history.shape[1])) and history[0, -1] != 0 and history.shape[1] >= gracePeriod:

Anyway, here's my code:

class Memory:
    def __init__(self):
        self.cooperateCount = 0
        self.defectCount = 0

        self.defectCountInARow = 0
        self.cooperateCountInARow = 0

        self.turnNumber = 0

        self.timesCheated = 0

    def analyseMove(self, history):
        self.turnNumber = history.shape[1]

        myLastMove = history[0, -1]
        enemyLastMove = history[1, -1]

        if enemyLastMove == 0:
            self.defectCount += 1
            self.defectCountInARow += 1
            self.cooperateCountInARow = 0
            if myLastMove == 1:
                self.timesCheated += 1
        else:
            self.cooperateCount += 1
            self.cooperateCountInARow += 1
            self.defectCountInARow = 0

    def decideMove(self, history):

        # this prevents wasting resources on always defecting enemies
        if self.defectCount / float(self.turnNumber) >= 0.44:
            return 0, self

        # enemy defected two times in a row he is not nice I'm going to be rude back!
        if self.defectCountInARow > 2:
            if self.defectCountInARow == 3 and self.cooperateCount != 0:  # maybe we can make up after all! Let's stop fighting!(but only if you were nice to me before)
                return 1, self
            if self.defectCountInARow == 4 and self.cooperateCount != 0:
                return 1, self
            return 0, self

        return self.defaultUpgradedTitForTat(history), self

    def defaultUpgradedTitForTat(self, history):
        myLastMove = history[0, -1]
        enemyLastMove = history[1, -1]

        if enemyLastMove == 0:  # enemy was rude
            if myLastMove == 1:  # i've been nice and enemy has been rude! I will be rude now as well!
                if self.turnNumber >= 2 and history[1, -2] and self.timesCheated < 3:  # he was nice before? let's be nice! (but only 2 chances!)
                    return 1
                return 0
            else:  # i've been rude and enemy has been rude! I guess we should try to make up
                return 1
        else:
            return 1

def strategy(history, memory):
    if memory is None:  # cooperate on the first turn
        return 1, Memory()
    else:
        memory.analyseMove(history)
        return memory.decideMove(history)

It's basically a quite forgiving titForTat that gives the enemy multiple chances to make up, but it can also act as grim if the enemy defects too many times.

vlcanesin commented 3 years ago

It seems like titForTat and grimTrigger variants are very common (which makes sense). Anyway, since you are posting the source code of your strategies, I figured I might as well post mine (for testing purposes or whatever).

def strategy(history, memory):         #---STRATEGY NAME: revJossFGT6 -> reversed Joss + forgiving grimTrigger
    current_round = history.shape[1]
    choice = None

    #---MEMORY ARRAY FORMAT: [current mode, opponent T's counter, variable, betrayed_counter allower, forgiveness factor]---#
    if memory == None:
        memory = ['', -1, 0, True, 2]

    if current_round < 6: #---BUILDING TRUST + GETTING DATA (for an optimal number of rounds based on testing)
        memory[0] = 'cooperation'
    elif memory[3]:
        if list(history[1]).count(0) == 0:
            memory[0] = 'cooperation'
            memory[1] = 0
        else:
            memory[0] = 'forgiving-grimTrigger'    #---Randomness/aggressiveness identified
            memory[1] = list(history[1]).count(0)

        memory[3] = False

    #---COOPERATION MODE (revJoss - reversed Joss)---#
    if memory[0] == 'cooperation':
        choice = 1
        if (current_round >= 1 and history[1,-1] == 0) and (list(history[1,-3:]) != [0,0,0]):
            choice = 0

    #---ATTACKING MODE (FGT - forgiving grimTrigger)---#
    else:
        if type(memory[2]) == int:
            memory[2] = memory[2] + 1
            if memory[2] <= memory[1]:
                choice = 0 #---ATTACK! (proportional to opponent's aggression)
            elif memory[2] <= memory[1] + memory[4]:
                choice = 1 #---INVITATION (for memory[4] rounds)
            elif (history[1, -1] == 1 or history[1, -2] == 1) and memory[4] > 0: #---INVITATION ACCEPTED
                memory[2] = 'play_1_until_betrayal'
                memory[4] = min(memory[4] + 1, 2) #---COOPERATE MORE
            else:
                memory[2] = 0 #---INVITATION DENIED
                memory[4] = memory[4] - 1 #---COOPERATE LESS
                choice = 0

        if memory[2] == 'play_1_until_betrayal':
            choice = 1
            if history[1, -1] == 0: #---BETRAYED!
                memory[2] = 0
                choice = 0

    return choice, memory

redtachyon2098 commented 3 years ago

A lot of people are posting their strategies! I really hope there will be another one of these soon.

maxkl commented 3 years ago

That's a great idea! Here's my strategy in just 9 LOC, called "weightedMonotonic" (also here):

# Weights for opponents' past moves, starting with the most recent one
weights = [1, -0.8, 0.8, -0.8, 0.8, -0.8, 0.8, -0.8, 0.8]

def strategy(history, memory):
    round_index = history.shape[1]
    opponent_history = history[1]

    acc = 0
    for (i, w) in enumerate(weights):
        if i < round_index:
            # Weigh opponents' i-to-last move (-1 if they defected, +1 if they cooperated)
            acc += w * (opponent_history[-i - 1] * 2 - 1)

    # Cooperate if weighted sum is at least 0. It's important to cooperate in the first round (acc=0) as to not anger
    # the Grim Trigger
    return 1 if acc >= 0 else 0, None

I admit that it doesn't really conform to the "pure" prisoner's dilemma tournament challenge as it doesn't even try to figure out the opponent strategy and relies neither on smart conditionals nor on memory. But it works exceptionally well in my test runs against a pool of various strategies by l4vr0v, valadaptive and redtachyon2098 (huge respect and thanks to you!). It's consistently in first place :) I'm currently running it against the Prisoners-Dilemma-Enjoyers pool and I'll update you when it's done.

It's basically just a weighted sum over the opponent's past 9 moves and then a threshold for deciding whether to cooperate or defect. The idea came from FIR filters but I didn't apply any signal processing or FIR theory except for the way the calculation is done. I planned to optimize the weights using a genetic algorithm but sadly didn't have the time. The results I got using just my intuition and a bit of experimentation were already way better than I expected.
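
If I do get around to tuning the weights, it would probably look something like this rough sketch (plain hill climbing rather than a proper genetic algorithm). run_pool here is just a placeholder you'd have to write yourself: it should run the tournament with weightedMonotonic using the given weights and return its average score.

import random

def tune_weights(run_pool, initial_weights, generations=200, step=0.1):
    best = list(initial_weights)
    best_score = run_pool(best)
    for _ in range(generations):
        # nudge every weight a little and keep the change only if the pool score improves
        candidate = [w + random.uniform(-step, step) for w in best]
        score = run_pool(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best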

I noticed that it gets a lot worse if there are many random strategies in the pool, so I hope not too many people were funny and entered one... It does way better against titForTat and grimTrigger variants which I expect there'll be a lot of.

redtachyon2098 commented 3 years ago

Wow! I never thought of that! The optimal weights might change depending on what types of strategies there are, so it won't be entirely consistent, but if you already know which strategies you are playing against, it would perform well.

ThatXliner commented 3 years ago

Man, this is like ML... but not ML...

redtachyon2098 commented 3 years ago

Hey! I've got an idea! You should make an evolution simulator for refining the weights of your algorithm to make the ultimate strategy!

maxkl commented 3 years ago

> Wow! I never thought of that! The optimal weights might change depending on what types of strategies there are, so it won't be entirely consistent, but if you already know which strategies you are playing against, it would perform well.

You're probably right. It ends up in 20th place in the Prisoners-Dilemma-Enjoyers pool, even behind some that were in my original test pool. Oh man, this challenge isn't easy (d'oh!)

redtachyon2098 commented 3 years ago

Uh-oh.

Xapakas commented 3 years ago

Yay, I can finally share my strategy! This is diplomat.py, and it places 4th among the strategies currently in the "prisoner's dilemma enjoyers" repo. (But I'm sure there are quite a few more that could give it a run for its money!)

# OTFT with a bunch of tweaks
#
# memory[0]: deadlock counter
# memory[1]: exploiter counter
# memory[2]: # of backlogged Cooperations
# memory[3]: opponent non-blockade Coops
# memory[4]: non-blockade rounds
# memory[5]: opponent blockade Coops
# memory[6]: blockade rounds
# memory[7]: "isBlockade" boolean

import random

def strategy(history, memory):
    deadlockThreshold = 3
    exploiterThreshold = 8

    if memory == None:
        memory = [0,0,0,0,0,0,0,False]
        choice = "cooperate"
        return choice, memory

    # update occurrence counters
    if not memory[7]: # non-blockade round
        memory[4] += 1
        if history[1,-1] == 1:
            memory[3] += 1
    else: # blockade round
        memory[6] += 1
        if history[1,-1] == 1:
            memory[5] += 1
        if memory[6] == 10: # after 10 turns, reevaluate blockade
            if memory[5] == 0 and memory[3] > 0:
                memory[1] = 0 # end blockade
                memory[7] = False
                memory[2] = 2 # double Coop to end blockade

    if memory[2] != 0: # consider backlogged Coops
        choice = "cooperate"
        memory[2] -= 1

    # check if a deadlock is detected.
    elif memory[0] >= deadlockThreshold:
        # stay silent twice as a truce attempt
        choice = "cooperate"

        if memory[0] == deadlockThreshold:
            memory[0] += 1
        else:
            memory[0] = 0
    else:
        if history.shape[1] >= 2 and history.shape[1] <= 20:
            # modify the exploiter counter
            if history[1,-1] == 1 and history[1,-2] == 1:
                # opponent cooperates twice; decrement
                memory[1] -= 1
            if history[1,-1] != history[1,-2]:
                # opponent changes move; increment
                memory[1] += 1
            if history[1,-1] != history[0,-2]:
                # opponent moves in a non-TFT fashion; increment
                memory[1] += 1

        if memory[1] >= exploiterThreshold:
            # tell truth indefinitely
            # (exploiter counter will never fall below threshold)
            choice = "defect"
            memory[7] = True
        else:
            if history[1,-1] == 0: # tit-for-tat
                choice = "defect"
            else:
                choice = "cooperate"

            # update deadlock counter
            if history.shape[1] >= 2 and history[1,-1] != history[1,-2]:
                memory[0] += 1
            else:
                memory[0] = 0

    if history.shape[1] >= 1 and history[0,-1] == 0 and history[1,-1] == 0 \
       and choice == "defect" and memory[3] > 0:
        # attempt to break double defection with double coop
        if memory[1] < exploiterThreshold and random.random() <= 0.1:
            choice = "cooperate"
            memory[2] = 1

    return choice, memory

Diplomat builds off of an extremely strong strategy called "Omega Tit-for-Tat". OTFT makes a number of improvements to regular tit-for-tat, and with diplomat I've tried to make even more. Here's a list of what diplomat adds to tit-for-tat (henceforth TFT):

  1. (from Omega) Diplomat implements random detection (which, given carykh's shoutout to that strategy in the video, I'm guessing is very important) with a randomness (or "exploiter") counter. Rather than trying to detect statistical randomness, it detects illogical movements, which also helps to lump alwaysDefect into the "random" category. After exceeding the randomness threshold, Diplomat defects indefinitely. (Sort of...)
  2. (from me) 10 turns after lumping an opponent into the "random" category and defecting indefinitely (henceforth a "blockade"), Diplomat reevaluates its decision. While I experimented with opponent backstab and non-forgiveness ratios, at the end of the day, the most effective way to determine that an opponent is not either random or alwaysDefect is to see if it made any cooperative moves during the blockade. If it made cooperative moves before the blockade, but not during the blockade, Diplomat lifts the blockade and offers a "truce" by cooperating twice in a row.
  3. (from me) Diplomat stops attempting to detect random or alwaysDefect after turn 20. I have yet to see it fail to detect random before turn 15, and this helps it to not misidentify opponent strategies that are exploitative but nevertheless worth cooperating with.
  4. (from Omega) Diplomat implements deadlock detection. A deadlock is when both opponents alternate between Cooperating and Defecting. You may notice that if two TFT-based algorithms are facing each other, and one Defects for whatever reason, both will continue cycling between Defect and Cooperate indefinitely. Diplomat detects deadlocks with a deadlock counter, which increments every time the opponent changes their move and resets to zero every time the opponent makes the same move twice. When a deadlock is detected, Diplomat cooperates twice in a row to attempt to restore a chain of Cooperations.
  5. (from me) Another fault of TFT is that if both strategies ever Defect in a single turn, both strategies will Defect until the end of the game, even though they could earn more points by restoring the peace and reestablishing a Cooperation chain. If both Diplomat and the opponent strategy defected last round, and the opponent is known not to be random or alwaysDefect, there is a 10% chance that Diplomat will attempt to break the Defection chain by Cooperating twice (this percentage was chosen by lots of tweaking in the prisoner's dilemma enjoyers repo; there's no mathematical significance to it).

If all of the above has been considered, and Diplomat still does not have a move to make, it will act as pure tit-for-tat. I experimented with a couple other techniques: Josh recommended starting the randomness counter much higher if the opponent starts by Defecting, since most logical opponents will start by Cooperating. I was unable to find success with this, perhaps because it gives strategies that start out mean but are able to play nice later on less of a chance to redeem themselves. Also, I considered disabling the "forgiveness" feature from step 5 during the first X turns so that no detective strategies got the wrong idea. This likely ran into the same problem as before. Additionally, it did not account for "late detectives", which are smart and likely to take up some portion of the final meta.

I had so much fun with this! I don't expect this strategy to win, but hopefully it can crack the top 10. We should do something again like this soon. Perhaps the Noisy Iterated Prisoner's Dilemma next??

Edit: github formatting, smh

redtachyon2098 commented 3 years ago

Wow, you must be very passionate about your work!

maxkl commented 3 years ago

@Xapakas FYI, yours got to 2nd place with my strategy in the pool (if that's the latest version in the repo)

BenWilop commented 3 years ago

A friend of mine coded this bot (we call him unanim) with me. We sent in the wrong version; the acceptable deviation should have been 0.45, because now the LLN check won't trigger on randoms well.

import numpy as np

''' We use a variation of the TitForTat strategy to force the enemy into cooperating if he wants to maximize his overall score. Moreover, forgiving elements can be found, in addition to countermeasures against detective and random strategies.

We start with the state "detective_phase", which represents TitForTat play to convince the enemy into cooperating. Furthermore, this state is re-established after every friendly_sequence of 100% cooperating moves, to block any new unsought investigations.

If we are not in such a phase, we play as forgivingly as humanly possible, i.e. after a cycle of switching moves [DC][CD] we cooperate again to break escalation spirals asap. This, however, only happens a maximum of max_forgiving times in a row, and only if the distance between these cycles is lower than or equal to min_distance_forgiving. In that case, we continue to play a non-forgiving TitForTat and won't activate a new detective phase afterwards (because such a pattern does not come from a deeply planned strategy). The counter resets after another friendly_sequence.

Over the course of the whole game we track the "negativeness" of all moves after the first detective phase. This is done by measuring the deviation between the opponent and TitForTat. If we have examined more than random_sequence moves and the ratio of "unintuitive" moves is greater than or equal to the acceptable_deviation, we start defecting for as long as this ratio stays that bad. '''

# memory = ["modus" (str), distanceforgiving (int), numberforgiving (int), detectivephase (int), titfortatarray (list)]

def strategy(history, memory):

    # initializing variables and memory
    detective_phase = 5
    friendly_sequence = 7
    random_sequence = 10
    max_forgiving = 3
    min_distance_forgiving = 4
    acceptable_deviation = 0.5
    choice = "cooperate"
    round = history.shape[1]
    if memory == None:
        memory = ["TitForTat", 0, 0, detective_phase, []]

    #checking the modus to determine our strategy
    if memory[0] == "TitForTat" and history.shape[1] >= 1:
        choice = history[1, -1]
    elif memory[0] == "ForgivingTitForTat":
        choice = history[1, -1]
        if history[0, -2] == 0 and history[0, -1] == 1 and history[1, -2] == 1 and history[1, -1] == 0:
            choice = "cooperate"
            if round - memory[1] <= min_distance_forgiving:
                memory[2] += 1
            else:
                memory[2] = 0
            memory[1] = round
    elif memory[0] == "Defect":
        choice = "defect"

    #activating the modus
    #detection of a friendly_sequence (friendly_sequence consecutive cooperations)
    if round >= friendly_sequence:
        reset = True
        for i in range(1, 1 + friendly_sequence):
            if not (history[0, -i] == 1):
                reset = False
        if reset:
            memory = ["TitForTat", 0, 0, detective_phase, memory[4]]
    #end of the detective_phase
    if memory[0] == "TitForTat":
        memory[3] -= 1
        if memory[3] == 0:
            memory[0] = "ForgivingTitForTat"
    #end of the forgiving phase
    if memory[2] >= max_forgiving:
        memory[0] = "TitForTat"
    #detection of a random or hyperaggressive player
    if round >= random_sequence + detective_phase:
        deviation = memory[4] - history[1, detective_phase:]
        deviation_rate = 1 - np.count_nonzero(deviation == 0) / len(deviation)
        if deviation_rate >= acceptable_deviation:
            memory[0] = "Defect"
    if round >= detective_phase:
        memory[4].append(history[0, -1])

    return choice, memory

Angel-IG commented 3 years ago

Here is my submission, called PacificAntiRandomTFT. It ranks pretty high against the strategies in the @l4vr0v fork (which, btw, is great!) and in other similar pools of agents I tried (varying, for example, the distribution of random and TFT agents), even though it's really simple and heavily improvable. Code:

import numpy as np

# PacificAntiRandomTFT
#
# Just like TFT except:
# - Detects random and goes DDDDDDDDD... against it.
# - Against D/D loops it tries to cooperate twice in a row
#   to try to get into a C/C loop (which is better for both).
#   If the opponent declines, it doesn't ever try again.
#
# In this implementation, memory stores the state of the
# strategy: "defect" (always defect), "declined" (opponent
# declined to exit the D/D loop), "coop next" (have to try
# to cooperate next to exit the D/D loop), "did they declined?"
# (we were in "coop next" in the previous round, and now we have
# to check if the opponent accepted our D/D-loop-exit or not) or
# None (none of the others).

# Parameters:
RANDOM_CHECK_ROUND = 16  # In which round (starting from 0) do I have to check for randomness?
MAX_UNPROVOKED_DEFECTIONS = 4  # Too many unprovoked defections means the opponent is random

def strategy(history, memory):
    choice = 1  # Default: cooperate
    new_state = memory  # Default: don't change state

    if history.shape[1] == RANDOM_CHECK_ROUND:
        # Checking for random

        unprovoked_defections = 0
        previous_move = None

        for move in history.T:
            if previous_move is None:
                unprovoked_defections += 1 if move[1] == 0 else 0
            elif move[1] == 0 and previous_move[0] == 1:
                unprovoked_defections += 1
            previous_move = move

        if unprovoked_defections > MAX_UNPROVOKED_DEFECTIONS:
            # Playing against random, let's always defect
            new_state = "defect"
            return 0, new_state

    if memory == "defect":
        choice = 0
    elif memory == "coop next":
        # Choice is already 1 by default
        new_state = "did they declined?"
    elif memory == "did they declined?":
        if history[1, -1] == 0:
            # They declined
            new_state = "declined"
            choice = 0
        else:
            # They accepted!
            new_state = None
    elif memory == "declined":
        # Normal TFT, with a check for changing state (see else statement)
        if history.shape[1] >= 1 and history[1, -1] == 0:  # Choose to defect if and only if the opponent just defected
            choice = 0
        else:
            # We somehow went out from the D/D loop. Let's forget the opponent declined an offer...
            new_state = None
    else:
        if history.shape[1] >= 2 and np.count_nonzero(history[:, -1]) == 0 and np.count_nonzero(history[:, -2]) == 0:
            # Last two turns were D/D. Let's try to end with this loop
            return 1, "coop next"

        # Not in a loop: normal TFT
        if history.shape[1] >= 1 and history[1, -1] == 0:  # Choose to defect if and only if the opponent just defected
            choice = 0

    return choice, new_state

Also (in case you are wondering), I tried to make an UltraPacific version (meaning that it also tries to convert the C/D D/C loops into C/Cs) and it didn't seem to work as well as the final version I've shown...

ghost commented 3 years ago

pointsArray = [[1,5],[0,3]]  # pointsArray[myMove][theirMove] = my payoff that round

def strategy(history, memory):
    if memory is None:
        return 1, 1  # first round: cooperate, and start counting rounds in memory
    # average of my own payoffs over all completed rounds
    score = sum([pointsArray[history[0][i]][history[1][i]] for i in range(memory)]) / memory
    # keep cooperating (True == 1) while that average is at least 2.25, otherwise defect
    return score >= 2.25, memory + 1

zelosos commented 3 years ago

My solution was created in 10 minutes. Here it is:

import random

def strategy(history, memory):
  """
  Author: zelosos

  Let's write down some assumptions:
    - In a group most of the players will not be exploited.
    - Players will test if they can exploit others.
    - Both cooperating for the whole time is best for both.

  With this said:
    - target should be: both cooperate for the whole time

  Improvements:
    - Maybe this could be improved by the weights of the score matrix, but I have to go to bed now.
  """
  # start with cooperate
  if len(history[1]) == 0:
    return 1, None

  played_rounds = len(history[1])
  got_wasted = len(history[1])-sum(history[1]) # how often we got cheated
  if random.uniform(0, 1) < got_wasted/played_rounds: # how often we got cheated determines how likely we are to cheat back
    return 0, None
  return 1, None

jon-edward commented 3 years ago

I first implemented the Derived Belief Strategy, which was originally designed by Tsz-Chiu Au and referenced in this paper; however, I found that it only barely scored above Tit for Tat in the typical Iterated Prisoner's Dilemma. That is by design, though, because it's basically Tit for Tat with a layer that evaluates, with a level of confidence, whether an action is an "accident" (in the context of the Noisy Iterated Prisoner's Dilemma) or whether it conforms to the opponent's previous strategy.

I wasn't satisfied with my ability to place well in the tournament with a strategy that only barely placed above an example strategy.

So, the next strategy I implemented was a Moran-process-trained looker-up model, which scored far enough above Tit for Tat that I was satisfied submitting it over the Derived Belief Strategy.

Not only did this perform better, but DBS' computations were pretty heavy and the looker-up model is much faster.
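
For anyone who hasn't seen one, a looker-up strategy is essentially a table keyed on recent history. A minimal sketch of the general shape (the entries below are just illustrative, not the Moran-process-trained table I actually submitted):

# key: ((my move, their move) two rounds ago, (my move, their move) last round) -> next move
LOOKUP = {
    ((1, 1), (1, 1)): 1,  # mutual cooperation has been going well, keep cooperating
    ((1, 1), (1, 0)): 0,  # they just defected on me, retaliate
    ((1, 0), (0, 0)): 0,  # we're in a defection spiral, keep defecting
    ((0, 0), (0, 1)): 1,  # they offered peace, accept it
}

def strategy(history, memory):
    if history.shape[1] < 2:
        return 1, None  # not enough history yet, just cooperate
    key = ((history[0, -2], history[1, -2]), (history[0, -1], history[1, -1]))
    # fall back to plain tit-for-tat for any pattern the table doesn't cover
    return LOOKUP.get(key, history[1, -1]), None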

EFHIII commented 3 years ago

I initially started by messing around making a lookup table manually, which performed relatively well in the tests I did. I tried looker-up, which performed similarly well, but OmegaTFT combined with some added heuristics and cases seemed to be the most successful, so that's where I ultimately focused my efforts near the end, and that's the kind of strategy I submitted. I still used a few manually made lookups, but just for a couple of special cases.

Rasilu commented 3 years ago

First I made some assumptions, which I explained in more detail here, and created my strategy accordingly. This is what I submitted:

# The War Criminal
# Tests how much an opponent can be exploited by looking at the profit we get compared to always doing (C,C).
# This works great against overly forgiving strategies such as ftft where this strategy alternates between C and D.
# This strategy also has some damage control mechanisms to prevent getting into a negative profit loop with the
# opponent and it uses statistical analysis to detect if the opponent makes random choices, in which case it is better
# to just defect. peko

def strategy(history, memory):
    history_range = 5
    turn = history.shape[1]
    if memory is None:
        memory = {
            "try_exploit": True,
            "exploit_turns": [],
            "exploit_max_turns": None,
            "exploit_max_profit": 0,
            "damage_control_turns": [40, 100]
        }

    # check if opponent is random
    if turn >= 20 and likely_random(history):
        memory["exploit_max_turns"] = None
        memory["exploit_max_profit"] = 0
        memory["exploit_turns"] = []
        return 0, memory

    # try to exploit opponent
    if memory["try_exploit"]:
        return try_exploit(history, memory)

    # damage control
    if turn >= history_range:
        opp_history = history[1, -history_range:]

        # prevent two Tit-for-Tat like strategies to alternate between (C,D) and (D,C)
        if are_histories_alternating(history, 3):
            return 1, memory

        # if opponent keeps defecting we try getting them to cooperate again
        if sum(opp_history[2:]) == 0:
            if sum(opp_history) == 0:
                if turn in memory["damage_control_turns"]:
                    return 1, memory
                return 0, memory
            else:
                return 1, memory

    return tit_for_tat(history, memory)

# Helper Functions
def are_histories_alternating(history, length):
    our_history = history[0, :-1][-length:]
    opp_history = history[1, 1:][-length:]
    prev_choice = our_history[0]
    for i in range(1, len(our_history)):
        our_choice = our_history[i]
        opp_choice = opp_history[i]
        if our_choice != opp_choice or opp_choice == prev_choice:
            return False
        prev_choice = our_history[i]
    return True

def calculate_profit(history):
    our_history = [x for (x, y) in history]
    opp_history = [y for (x, y) in history]
    profit = 0
    for i in range(len(our_history)):
        if our_history[i] == 1:
            profit += 0 if opp_history[i] == 1 else -3
        else:
            profit += 2 if opp_history[i] == 1 else -2
    return profit

def likely_random(history):
    # Use statistical analysis to determine if opponent acts randomly
    random_limit = 0.28
    our_history = history[0, :-1]
    opp_history = history[1, 1:]
    our_coops = sum(our_history)
    our_defects = len(our_history) - our_coops

    c, d = (0, 0)
    for i in range(len(our_history)):
        if our_history[i] == 1:
            c += 1 if opp_history[i] == 1 else -1
        if our_history[i] == 0:
            d += 1 if opp_history[i] == 1 else -1

    consistency = (abs(c) + abs(d)) / len(our_history)
    if consistency < random_limit:
        return True
    return False

# Sub Strats
def tit_for_tat(history, memory):
    choice = 1 if history.shape[1] < 1 else history[1, -1]
    return choice, memory

def try_exploit(history, memory):
    exploit_buffer = 1
    turn = history.shape[1]

    # calculate profit
    exploit_history = [history[:, i] for i in memory["exploit_turns"]]
    exploit_profit = calculate_profit(exploit_history)

    # test how much opponent can be exploited
    if memory["exploit_max_turns"] is None:
        if exploit_profit >= memory["exploit_max_profit"]:
            memory["exploit_max_profit"] = exploit_profit
            memory["exploit_turns"].append(turn)
            return 0, memory
        else:
            memory["exploit_max_turns"] = len(exploit_history) - 2
            return 1, memory

    # check if exploit is still profitable
    if memory["exploit_max_turns"] <= 0:
        memory["try_exploit"] = False
        memory["exploit_max_turns"] = None
        memory["exploit_max_profit"] = 0
        memory["exploit_turns"] = []
        return 1, memory
    elif exploit_profit < 0:
        memory["exploit_max_turns"] = None
        memory["exploit_max_profit"] = 0
        memory["exploit_turns"] = []
        return 1, memory

    # repeat exploit
    elif memory["exploit_max_turns"] > len(exploit_history):
        memory["exploit_turns"].append(turn)
        return 0, memory

    # prepare for next exploit when situation has stabilized
    elif sum(history[0, -exploit_buffer:]) == sum(history[1, -exploit_buffer:]) == exploit_buffer:
        if exploit_profit >= 0:
            memory["exploit_turns"] = [turn]
            return 0, memory
        else:
            memory["try_exploit"] = False
            return tit_for_tat(history, memory)

    return tit_for_tat(history, memory)

redtachyon2098 commented 3 years ago

Elaborate.

Rasilu commented 3 years ago

code go brrr

redtachyon2098 commented 3 years ago

Yours crashed? Unfortunate.