carykh / PrisonersDilemmaTournament

Watch This Place's awesome video about iterated Prisoner's Dilemma for context! https://www.youtube.com/watch?v=BOvAbjfJ0x0
MIT License

For everyone who's not subscribed to carykh on YouTube #87

Closed donno2048 closed 2 years ago

donno2048 commented 3 years ago

Prisoner's Dilemma Tournament update: So, I have results that I could share!

In the end, 1,468 of the total 1,615 strategies passed all four requirements: 1) was a .py file, 2) ran against the default 9 strats in under 10 seconds, 3) didn't do anything nefarious, and 4) never crashed when paired up against any other strategy. Including the original 9 strats means the entire working roster has 1,477 contenders.

However, due to a 1477*1477 matrix having over 2 million entries (1 million bc of symmetry), and some of those entries taking >1 second to run (most are faster), each single "full-pass" takes 8-10 hours to run. I've done 2 "full-passes" so far, and the top strategies on the leaderboard move around n spots, where n is maybe 20% of their rank (so if they're 30th, they might end up 24th the next round.) #2 moves, but #1 does not. This is also because hundreds of strategies scored very close together. Yes, I'm aware there are many ways to optimize, including multi-threading, caching, and more. However, since I'm already confident the top spot won't ever move, I'm wondering how necessary it is to run hundreds of full-passes.
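The matrix arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes one match per unordered pair of distinct strategies per full-pass (self-play excluded), which the post does not spell out.

```python
from math import comb

n = 1477  # working roster size quoted in the post

ordered_entries = n * n       # every cell of the 1477 x 1477 matrix
unordered_pairs = comb(n, 2)  # distinct pairings (assumes no self-play)

print(ordered_entries)  # 2181529
print(unordered_pairs)  # 1090026

# Average time per pairing implied by an 8-10 hour full-pass.
for hours in (8, 10):
    avg_seconds = hours * 3600 / unordered_pairs
    print(f"{hours} h per pass -> about {avg_seconds * 1000:.0f} ms per pairing")
```

Under that assumption, the average works out to roughly 26 to 33 ms per pairing, which fits "most are faster" than 1 second.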

My main dilemma (haha) is this: If I tried to get the results out as fast as possible, I could get the video and the prize money out in a day or two! It would have to be an unedited, OBS-livestream style discussion. On the other hand, if I were to make a more polished video with more polished data, it might take months. (I envisioned displaying the data in a 3D environment with fancy transitions and animations to make the dataset feel like a new continent being explored. Also, I planned on calling every participant who did anything interesting, to include that in the video, but that would be a big logistical task!) I know many people would side with "quality over quantity" (meaning wait for a polished video), but I do think the quick-n-easy argument is valid. It would suck to enter a tournament and not know the results for months, just so Cary could make his pretty scatter plot pulsate... (like, does anybody even care if a graph pulsates, versus being a still image?) So, what do you guys think?

Also, if you're wondering why I posted this as a YouTube community post instead of a GitHub issue, my last GitHub issue only got 2 comments. So, in the interest of this post reaching as many participants' eyes as possible, I'm posting it here! (If I ever do future tournaments, I'll agree on a singular announcement location beforehand lol)

User670 commented 3 years ago

YouTube doesn't push all community posts to me, so I didn't see it despite being subscribed :-(

ThatXliner commented 3 years ago

Yeah, post GH Issues or make a new video, please!

Barigamb738 commented 3 years ago

Yeah, post GH Issues or make a new video, please!

Also, if you're wondering why I posted this as a YouTube community post instead of a GitHub issue, my last GitHub issue only got 2 comments. So, in the interest of this post reaching as many participants' eyes as possible, I'm posting it here!

nkrasner commented 3 years ago

Could you not run these "full-passes" in parallel? At any one time you only need 2 .py files open to run against each other, so it shouldn't cause any RAM issues except (maybe) for storing the results matrix, which might be streamable from a file so you don't have to hold the whole thing in RAM at one time. You could probably run at least 10 instances at a time and be done within a week.
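One possible way to spread the work across instances, sketched under assumptions: instead of running whole passes side by side, the pairings of a single pass are split round-robin, and each instance streams its results to its own CSV file so the full matrix never sits in memory. The names `shard_pairs`, `run_shard`, and `play_match` are hypothetical and not part of the tournament code.

```python
import csv
from itertools import combinations, islice


def shard_pairs(n_strategies, n_instances, instance_id):
    """Return the (i, j) pairings assigned to one instance (round-robin split)."""
    all_pairs = combinations(range(n_strategies), 2)
    # Instance k takes pairs k, k + n_instances, k + 2 * n_instances, ...
    return islice(all_pairs, instance_id, None, n_instances)


def run_shard(n_strategies, n_instances, instance_id, play_match, out_file):
    """Play one instance's pairings and stream each result to out_file as CSV."""
    writer = csv.writer(out_file)
    for i, j in shard_pairs(n_strategies, n_instances, instance_id):
        # play_match is a hypothetical stand-in for the real match runner;
        # it is assumed to return the two strategies' scores.
        score_i, score_j = play_match(i, j)
        writer.writerow([i, j, score_i, score_j])
```

Each instance would be started as its own process with a different `instance_id`, and the per-instance files could be concatenated afterwards. Because no pairing appears in more than one shard, the instances never need to coordinate while running.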

carykh commented 3 years ago

Could you not run these "full-passes" in parallel? At any one time you only need 2 .py files open to run against each other, so it shouldn't cause any RAM issues except (maybe) for storing the results matrix, which might be streamable from a file so you don't have to hold the whole thing in RAM at one time. You could probably run at least 10 instances at a time and be done within a week.

Oh yeah, I probably could very well do that! I was worried that each full-pass was already taking up a ton of the CPU, but upon looking at Performance Manager, it doesn't seem to be taking up even half of it. (I also have 30 browser tabs open so I could close those.) So perhaps each night, I'll try to run 10 full-passes and see how it can handle it. If it's doing just fine, then yeah I'd have 100 passes done in 1.5 weeks! (Currently, I've only been doing 1 full-pass a night, so I have 5 full-passes done.)

image

I also saw somebody recommend that I do a quick-n-dirty results reveal on a livestream on a secondary channel, and then a higher-quality actual analysis video on my main channel. I suppose I could do that! I could also call all the people who had interesting strats in-between the two videos. (Although, it might be fun to call them before any reveal, because then it's more of a blind reaction!)

nkrasner commented 3 years ago

I don't know that I would recommend that second idea because you only have 5% of the data. It's possible that the winner of this 5% is not the winner overall. While I have you here though, I wanted to ask, will you release the code when it's all over? You could possibly remove all comments programmatically in case anyone added their name or other info. I had a couple variations of mine but only sent the one that performed best against the sample strategies. I'd be curious to see if the others do better or worse against the full set.
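For what it's worth, stripping comments programmatically could look roughly like the sketch below, which uses the standard library's `tokenize` module. The function name `strip_comments` is made up for illustration. Note the limits: it only drops `#` comments, so a name written in a docstring or a string literal would survive.

```python
import io
import tokenize


def strip_comments(source: str) -> str:
    """Return Python source with '#' comments removed (strings are untouched)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # Keep every token except comments; '#' inside string literals is part of
    # a STRING token, so it is preserved.
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    return tokenize.untokenize(kept)


example = 'x = 1  # written by Alice\ns = "# not a comment"\n'
print(strip_comments(example))
```

The output may keep some trailing whitespace where a comment used to be, but the code should still run the same way.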

carykh commented 3 years ago

I'm not sure if I'm allowed to release everybody's source code, because I never asked "Are you okay with your submission becoming open-source?" in the questionnaire. I should've added that question, but I didn't. I think 90% of people would probably be cool with having their code out in the open, but with 1,600 submissions I would be worried that I'd violate somebody's IP!

The one big upside of doing that is that the general public can verify my results, and prove that I'm not lying. Also, if I ever run a tournament like this again, people would have such a large pool of strats to compare against, the "arms race" will get more fierce! (I considered that a down-side, but some might see that as an upside.)

So, I'm still on the fence about whether I'll do a full release. (Also, don't worry, I wouldn't even make the quick-n-easy results-reveal until the leaderboard is finalized, whether that's 10 or 100 full-passes.)

donno2048 commented 3 years ago

I also saw somebody recommend that I do a quick-n-dirty results reveal on a livestream on a secondary channel, and then a higher-quality actual analysis video on my main channel.

That sounds like the best option. I think most of the contestants would rather see a simpler video early than wait for a polished version, while the regular viewers would prefer a polished version even if they must wait for it.

With that said, I'm fairly sure both the contestants and the viewers could watch both videos.

donno2048 commented 3 years ago

I'm not sure if I'm allowed to release everybody's source code

Why not just create a new GitHub repo for this purpose, so everyone who is cool with that can add his/her code?

ThatXliner commented 3 years ago

I'm not sure if I'm allowed to release everybody's source code

Why not just create a new GitHub repo for this purpose, so everyone who is cool with that can add his/her code?

There's always PrisonersDilemnaEnjoyers

donno2048 commented 3 years ago

There's always PrisonersDilemnaEnjoyers

Wait what?

nekiwo commented 3 years ago

There's always PrisonersDilemnaEnjoyers

Wait what?

we have a huge community fork and a discord server

EFHIII commented 3 years ago

Oh yeah, I probably could very well do that! I was worried that each full-pass was already taking up a ton of the CPU, but upon looking at Performance Manager, it doesn't seem to be taking up even half of it.

When it's single-threaded, only 1 CPU core is being used, and you have 4 cores on that CPU.

donno2048 commented 2 years ago

Closing as the event is over