Nikoleta-v3 / meta-analysis-of-prisoners-dilemma-tournaments

A repo to generate Axelrod tournaments
MIT License

Make copy edits and add some TODO tags. #28

Closed drvinceknight closed 4 years ago

drvinceknight commented 4 years ago

One of the TODO tags is about a particular paragraph that I really struggle with. If I'm alone in not liking the paragraph then I'm happy for it to be left as it is (I'm not entirely sure I could offer my own version of it as I'm not sure I understand it).

marcharper commented 4 years ago

Since the footnote was removed we could split that paragraph in the middle where it starts to focus on TFT

drvinceknight commented 4 years ago

Since the footnote was removed we could split that paragraph in the middle where it starts to focus on TFT

Yeah that's a good idea :+1:

marcharper commented 4 years ago

What about TFT's C_r / C_median? Maybe C_mean is skewed?

Maybe it doesn't match the overall tournament C_r but it should match the C_r of its opponents per match for standard tournaments, no? It can only be ~2 off for any given (non-noisy) match.
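To make the "~2 off" remark concrete, here is a minimal sketch (plain Python, not the repo's tournament code): TFT opens with C and then echoes the opponent's previous move, so whatever sequence the opponent ends up playing, TFT's cooperation count in a non-noisy match can differ from its opponent's by at most one, which bounds how far apart the two scores can drift.

```python
# Hypothetical sketch: TFT's cooperation count vs. its opponent's.
# TFT plays C first, then copies the opponent's previous move, so
# its cooperation count differs from the opponent's by at most 1.
import random

C, D = "C", "D"

def play_match(opponent_moves):
    """Play TFT against a fixed opponent move sequence; return both histories."""
    tft_moves = []
    for i, _ in enumerate(opponent_moves):
        tft_moves.append(C if i == 0 else opponent_moves[i - 1])
    return tft_moves, list(opponent_moves)

random.seed(0)
for _ in range(100):
    opp = [random.choice([C, D]) for _ in range(200)]
    tft, opp = play_match(opp)
    # The per-match cooperation gap is at most 1, hence scores
    # can only be a couple of points apart in a non-noisy match.
    assert abs(tft.count(C) - opp.count(C)) <= 1
```

The bound holds regardless of how the opponent's sequence was generated (in a real match the opponent reacts to TFT, but TFT still mirrors whatever sequence results).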

drvinceknight commented 4 years ago

Ended up going the other way with 32f5da2.

Spoke with @Nikoleta-v3 and we both agreed that that paragraph seemed to be making claims that don't immediately follow from the results. I've made some notes in the commit message and @Nikoleta-v3 is going to paste some plots here.

I've added a TODO and @Nikoleta-v3 is going to merge this and address the TODOs.

Nikoleta-v3 commented 4 years ago

I am attaching the distributions of (C_r / C_mean) and (C_r / C_median) for Tit For Tat, Tit For 2 Tats and Random for only standard tournaments.

Tit For Tat

[plots: tft_to_mean, tft_to_median]

Tit For 2 Tats

[plots: tf2t_to_mean, tf2t_to_median]

Random

[plots: random_to_mean, random_to_median]
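For reference, here is a self-contained sketch of how ratios like C_r / C_mean and C_r / C_median could be produced from a small round-robin. The strategies and the four-player pool are hypothetical stand-ins, not the repo's code or the actual tournament data.

```python
# Illustrative only: hand-rolled strategies and a tiny round-robin,
# then each player's cooperation rate C_r divided by the population
# mean and median cooperation rates.
import random
import statistics

C, D = "C", "D"
TURNS = 200

def tit_for_tat(my_hist, opp_hist):
    return C if not opp_hist else opp_hist[-1]

def tit_for_2_tats(my_hist, opp_hist):
    return D if opp_hist[-2:] == [D, D] else C

def rand_half(my_hist, opp_hist):
    return random.choice([C, D])

def defector(my_hist, opp_hist):
    return D

players = {"TFT": tit_for_tat, "TF2T": tit_for_2_tats,
           "Random": rand_half, "Defector": defector}

def match(s1, s2):
    h1, h2 = [], []
    for _ in range(TURNS):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        h1.append(m1)
        h2.append(m2)
    return h1, h2

random.seed(0)
coops = {name: 0 for name in players}
plays = {name: 0 for name in players}
names = list(players)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        h1, h2 = match(players[a], players[b])
        coops[a] += h1.count(C); plays[a] += TURNS
        coops[b] += h2.count(C); plays[b] += TURNS

rates = {n: coops[n] / plays[n] for n in names}   # per-player C_r
mean_r = statistics.mean(rates.values())
median_r = statistics.median(rates.values())
for n in names:
    print(n, round(rates[n] / mean_r, 3), round(rates[n] / median_r, 3))
```

Collecting these ratios over many sampled tournaments would give distributions of the kind plotted above.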

I think we should discuss why we think (based on our analysis) Tit For Tat performed well in Axelrod's tournament. Personally, I would add a paragraph in the Discussion like:

"""In standard tournaments being the mean/median cooperator pays off. Tit For Tat was that in the original tournament because of the strategy's properties and that could explain the performance of the strategy. There are other properties discussed here however that the strategy doesn't have. Thus, it could indicate why even in standard tournaments the strategy does not perform well"""

Something along these lines; I would also include a similar brief discussion for memory-one strategies (based on what Marc has already written).

drvinceknight commented 4 years ago

Something along those lines sounds really good to me @Nikoleta-v3 although I suggest you go ahead and write it and we can reflect/adjust once it's written (we can always revert commits and come back if need be).

marcharper commented 4 years ago

I guess I'm a bit confused -- don't these plots support the existing text? TFT is closer to the population median than TF2T and Random, but this is Random(0.5) right? So we wouldn't expect it to get close to C_r, which is the hypothetical Random mentioned in the first paragraph. Or did you compute new values for Random(C_median) or Random(C_mean)? By definition those have to get close to C_median or C_mean, with some variance due to random draws.

The points I was trying to make are:

There's some variation here since TFT doesn't know the outcomes of third party games, so if other strategies play very differently against non-TFT strategies then it won't perfectly match the overall tournament C_r, but it matches the median C_r of its opponents. In the revisiting paper data for Axelrod's second tournament, the payoff heatmap shows that most strategies are generally cooperative, whereas in our strategy pool there are many more aggressive / defect-heavy players -- I suspect TFT matches the full tournament C_r more closely there (a point that could be revisited in the revisiting paper, pending the outcome of this one)

In the second paragraph, I tried to carry some of these ideas forward to memory-one players more generally. It's odd that they've done well in past tournaments since intuitively there's information to be gained from looking at longer histories. However, memory-one strategies "have access" to five important features in the model, C_r and the four memory-one conditional probabilities. So our model explains somewhat why you can get a lot of performance out of a memory-one strategy, especially in small tournaments lacking sophisticated opponents.
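As an illustration of the "five features" point: a memory-one strategy is fully specified by an opening move plus the four conditional cooperation probabilities (P(C|CC), P(C|CD), P(C|DC), P(C|DD)), with TFT as the deterministic corner case (1, 0, 1, 0). A minimal sketch, not tied to any library:

```python
# Hypothetical sketch of a generic memory-one player: its behaviour is
# determined entirely by an opening move and a four-vector of
# conditional cooperation probabilities keyed on the last round's
# (my move, opponent move) pair.
import random

C, D = "C", "D"

class MemoryOne:
    def __init__(self, four_vector, initial=C):
        # four_vector gives P(C | CC), P(C | CD), P(C | DC), P(C | DD)
        self.p = dict(zip([(C, C), (C, D), (D, C), (D, D)], four_vector))
        self.initial = initial

    def move(self, my_last, opp_last):
        if my_last is None:          # opening move of the match
            return self.initial
        return C if random.random() < self.p[(my_last, opp_last)] else D

tft = MemoryOne((1, 0, 1, 0))           # deterministic: mirrors the opponent
rand = MemoryOne((0.5, 0.5, 0.5, 0.5))  # Random(0.5): ignores history

# TFT's extreme (0/1) four-vector makes it cooperate iff the opponent just did.
assert tft.move(C, C) == C and tft.move(C, D) == D
```

TFT and Random(0.5) can end up with similar cooperation rates against some pools, yet their four-vectors, and hence how they achieve that C_r, are completely different.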

marcharper commented 4 years ago

And perhaps the reason that the C_r for TFT > C_median in our data is that it doesn't try to exploit weaker opponents, and does try to force opponents to cooperate, whereas in third party games in our tournament there's more exploitation occurring.

drvinceknight commented 4 years ago

I guess I'm a bit confused -- don't these plots support the existing text? TFT is closer to the population median than TF2T and Random, but this is Random(0.5) right? So we wouldn't expect it to get close to C_r, which is the hypothetical Random mentioned in the first paragraph. Or did you compute new values for Random(C_median) or Random(C_mean)? By definition those have to get close to C_median or C_mean, with some variance due to random draws.

I don't think it does. I think those plots show that, a large proportion of the time (though perhaps not the majority), TFT is in fact not near the population median.

The points I was trying to make are:

  • TFT by definition must cooperate almost exactly as much as its opponents, so it should naturally land close to some central measure of C_r (in a standard tournament).

It won't always land close (as shown in the plots). I do see the point you're trying to make, I'm just struggling with the language and precision.

There's some variation here since TFT doesn't know the outcomes of third party games, so if other strategies play very differently against non-TFT strategies then it won't perfectly match the overall tournament C_r, but it matches the median C_r of its opponents. In the revisiting paper data for Axelrod's second tournament, the payoff heatmap shows that most strategies are generally cooperative, whereas in our strategy pool there are many more aggressive / defect-heavy players -- I suspect TFT matches the full tournament C_r more closely there (a point that could be revisited in the revisiting paper, pending the outcome of this one)

  • For contrast, NTitsForMTats won't get as close to the optimal C_r as TFT, for N != M surely (but even then it's exploitable for M > N and maybe for N=M>1)
  • The point of describing a strategy that is Random(C_median) is that merely achieving the "optimal" C_r isn't sufficient -- the probabilities like P(C|CC) and the other three are also important features in the model. TFT minimizes being exploited (and being the exploiter) by picking 0 or 1 values for these probabilities. Other ways of achieving optimal C_r, such as Random(C_optimal) with different conditional probabilities, would potentially get exploited more, trigger Grim-like strategies, or otherwise have a different score distribution than TFT. So how a strategy achieves optimal C_r is important.

This is worth including. When we say "how a strategy achieves" the optimal C_r we mean the distribution, yes? I don't think this was clear to me before.

  • In our tournaments, there are exploitable strategies and TFT doesn't exploit any other strategies. Hence it may have done well in tournaments without many exploitable opponents, but in our more diverse collection of strategies, it doesn't. This helps explain why TFT is no longer a high performer in newer tournaments.

I feel that this conclusion comes from more than the data. I don't disagree at all but I don't think it follows from the data in this particular paper.

In the second paragraph, I tried to carry some of these ideas forward to memory-one players more generally. It's odd that they've done well in past tournaments since intuitively there's information to be gained from looking at longer histories. However, memory-one strategies "have access" to five important features in the model, C_r and the four memory-one conditional probabilities. So our model explains somewhat why you can get a lot of performance out of a memory-one strategy, especially in small tournaments lacking sophisticated opponents.

I think my main problem with the paragraph(s) is that I feel it was making conclusions that I didn't necessarily agree with and I also felt there were some conclusions that seemed to pull from things that are not in the data for this paper. When we chatted about this @Nikoleta-v3 suggested putting the points you're making in the discussion where I think it fits better.

It might be helpful for the 3 of us to have a video call sometime to chat about this perhaps? Possibly after Nik makes a suggested edit so we can discuss pros and cons (maybe Nik just PRs to this branch).

marcharper commented 4 years ago

Yes, let's chat in person. Ultimately we have to all agree if it's to be included of course. I agree with some of your points above, but I also think we're arguing somewhat different things.

At a high level, I'm trying to argue that the data suggests a model -- that optimal C_r, P(C|CD) and friends, etc. are important features of a strategy for it to win tournaments -- and that in turn can explain why TFT was successful in the past. The claim isn't that TFT achieves optimal C_r in this data set (it demonstrably does not, and it's not top ranked, so it shouldn't be expected to), but that it can, and can do so better than e.g. a Random strategy that happens to hit the right C_r. So I'm not trying to make an argument re: TFT directly from this data (that's why these paragraphs occur in the discussion section rather than in the results), rather from the modeling outcomes, and certainly it could be more clear on that front. (And of course it isn't necessarily 100% correct -- it's a model after all and the correlations aren't perfect -- in the revisiting paper we can confirm or reject parts of it.)

Taking a further step back, the point was to extrapolate insights from the model, to discuss potential implications. Did we learn how to design a good strategy? Did we learn why some strategies are successful, or have been successful? I think yes, to some extent. The model sheds some light:

  • on how TFT works and why it used to win
  • on why "simple" strategies (e.g. memory-one strategies) were successful early on -- they have access to some of the important features necessary to win
  • on why complex strategies (if well-designed / well-trained) can be successful (in the presence of exploitable opponents, noise, etc.), and possibly why they weren't previously successful (lack of diverse opponents to train against, for example)

drvinceknight commented 4 years ago

At a high level, I'm trying to argue that the data suggests a model -- that optimal C_r, P(C| CD) and friends, etc. are important features of a strategy for it to win tournaments -- and that in turn can explain why TFT was successful in the past. The claim isn't that TFT achieves optimal C_r in this data set (it demonstrably does not, and it's not top ranked, so it shouldn't be expected to), but that it can, and can do so better than e.g. a Random strategy that happens to hit the right C_r. So I'm not trying to make an argument re: TFT directly from this data (that's why these paragraphs occur in the discussion section rather than in the results),

(I believe this is perhaps one of the points of discussion/misunderstanding: this paragraph is currently not in the discussion but in the early results, if I'm reading it correctly. One of the things I did with 32f5da2 was add a %TODO to the discussion section to essentially move the paragraphs there.)

rather from the modeling outcomes, and certainly it could be more clear on that front. (And of course it isn't necessarily 100% correct -- it's a model after all and the correlations aren't perfect -- in the revisiting paper we can confirm or reject parts of it.)

Taking a further step back, the point was to extrapolate insights from the model, to discuss potential implications. Did we learn how to design a good strategy? Did we learn why some strategies are successful, or have been successful? I think yes, to some extent. The model sheds some light:

  • on how TFT works and why it used to win
  • on why "simple" strategies (e.g. memory-one strategies) were successful early on -- they have access to some of the important features necessary to win
  • on why complex strategies (if well-designed / well-trained) can be successful (in the presence of exploitable opponents, noise, etc.), and possibly why they weren't previously successful (lack of diverse opponents to train against, for example)

I think I agree with all of what you've said @marcharper. Probably just a case of getting the location of the paper and language right. I've sent out an email to arrange a time for a talk :)

marcharper commented 4 years ago

this paragraph is currently not in the discussion but in the early results

Ah, yes, that's my bad then. I think I renamed Conclusion to Discussion at some point and then my memory was scrambled a bit. Looking forward to chatting!