Update from Cary (2021-06-15) (Long and not urgent, so you don't have to read it)

carykh commented 3 years ago

Hi everyone! I'm still running as many full 1477x1477 matrix runs per night as possible. Here's some quick facts:

The original plan was to do 1 full-matrix run, but people were concerned about the effect of randomness, so we agreed I should do 100 runs and then average the results.
1,468 of the 1,615 entries passed all sanity checks, plus the 9 defaults. That gets us to the competitor count of 1477.
Each game is ~250 moves on average, but I'm also not going to double-count (e.g. A+B and B+A). So, that's roughly 1477x1477x250x2/2 = 504 million function calls per full-matrix run.

Although I did implement multiprocessing, running 10 full-matrix runs actually uses up the CPU for 24 hrs 14 min, which means I can't use my computer to do other stuff. So lately, I've been limiting it to 3 full-matrix runs a night! With that, I've finished 25 full-matrix runs total. I told people I'd do 100 runs and average them, but recently, I've been thinking 30 should just be enough. Here's why:

the effect of randomness is proportional to the inverse square root of N, so going from 30 to 100 doesn't even halve it
I don't want to make you guys wait for another month.
The top 1 is clearly separable from the pack, but 2 and 3 are still close enough to swap, unfortunately
people generally say the Central Limit Theorem kicks in at n=30 (making margins of error easier to reason about)

I'm aware people have proposed other methods to speed up this process, such as caching repetitive games, cloud computing, and similar-strategy-detection. I tried implementing some of these, but none of them worked as well as I'd hoped. I could discuss why they didn't work if people want! But TL;DR if it takes 2 days to implement, but only saves you 1 day of runtime, it's not worth it lol

TL;DR: Are you guys cool with me averaging the results of 30 runs, and posting the results ~2 days from now, versus averaging 100 results about 25 days from now?

nekiwo commented 3 years ago

Yeah I think 30 is enough, I don't think the results at 100 runs will be groundbreaking

aaaa-trsh commented 3 years ago

yup!

Xapakas commented 3 years ago

Thanks for the update, and for all the effort you've put into this. I imagine 30 runs is enough to smooth out randomness and provide useful results, especially with such a large competitor pool. If you're concerned about crowning the 2nd and 3rd place winners correctly, maybe you could make the "quick-n-easy" video after 30 runs, and then run some more before the polished video comes out and the prize money is distributed?

arielsysdev commented 3 years ago

Yeah 30 runs sounds about right. No need to rush though if you need more time

Kinkelin commented 3 years ago

30 runs sounds good! You could maybe publish the min, max and median scores as well, so people could see how close it actually is.

donno2048 commented 3 years ago

30 runs will be fine

CathGorm commented 3 years ago

If you're concerned about randomness determining the top 3 spots: you could take, say, the top 15 that are currently in the running to win and run only them against all other entries. This would allow you to do 50 times more runs than the full 1468*1468 matrix.

Barigamb738 commented 3 years ago

I don't really have any interest in having more rounds since my strategy wouldn't win even if the effect of randomness was out of the picture. So i am fine with 30.

(P. S. I just realized how selfish that sounded, like i only care about my own strategy. But i didn't mean it in that way. Sorry to anyone who fealt offended by this.)

carykh commented 3 years ago

Thanks for the feedback, everybody! I'll go with 30 rounds then, so the calculations should be done some time tomorrow (2021-06-17).

Also, someone notified me of a Discord screenshot claiming that the results have already been revealed, and I just want to clarify that it is fake. As of right now (2021-06-16 12:14 PM PST), no results have been revealed yet!

If you're concerned about randomness determining the top 3 spots: you could take, say, the top 15 that are currently in the running to win and run only them against all other entries. This would allow you to do 50 times more runs than the full 1468*1468 matrix.

Also, CathGorm, that's a smart idea! In fact, there's probably only 10 or so contenders that have a possibility of being in the top 3 anyway, so I could run them many extra times at a much faster speed.

redtachyon2098 commented 3 years ago

Wait, so the results are almost out??? Excited noises

EFHIII commented 3 years ago

Also, CathGorm, that's a smart idea! In fact, there's probably only 10 or so contenders that have a possibility of being in the top 3 anyway, so I could run them many extra times at a much faster speed.

I bet you that if you take the top 10 contenders and run them against each other, it'd be a 10 way tie, with every match being always cooperate on both sides.

Somewhat counter-intuitively, the bad strategies are what determine the winner in this kind of competition, since chances are very high that the top several strategies (and with the pool size here, the number might be in the hundreds, but it's hard to speculate) won't defect first, which means that putting those "nice" strategies against other "nice" strategies always gives that same cooperate scenario.

The winner is determined by who does the best on average against all the not-nice strategies, which might be a small part of the actual pool of strategies. In a pool of only Nice strategies and random strategies, Grim Trigger would win, since Grim Trigger is the optimal Nice strategy against random. In a pool of only Nice strategies and Joss, Always Cooperate would win. In a mix, it just comes down to the strategy that's best optimized for that particular mix.

By removing the bad strategies, you remove the thing that determines the winner.

WillemPaternotte commented 3 years ago

By removing the bad strategies, you remove the thing that determines the winner.

I think the plan is to run the top strategies against all other strategies, but having the non top strats not compete against each other. You'll end up with a 10x1400 matrix instead of a 1400x1400 matrix.

EFHIII commented 3 years ago

You'll end up with a 10x1400 matrix instead of a 1400x1400 matrix.

That would work; I misread it at first which made me concerned; my bad.

carykh commented 3 years ago

Hey guys, I've finished calculating all the results! So I'll make the quick-n-easy results reveal in the next 6 hours or so.

Also, WillemPaternotte is correct. I needed to fine-tune the results of the top performers, because 2nd and 3rd scored close enough that they could flip.

So, I had the 1400x1400 matrix run 30 times, and the 3x1400 matrix (the top 3 vs. everyone else) run 200 times. It turns out 4th place was already >20 sigma below 3rd place, and took a while to run. So only the top 3 were necessary "fine-tune", and now that's done too!

arielsysdev commented 3 years ago

Can't wait! :D

donno2048 commented 3 years ago

Could you post here a link to the video when it's up?

duckboycool commented 3 years ago

Update: The video has been recorded now.

carykh commented 3 years ago

The video's out now! Unforutnately the results still can't be finalized yet, though: https://www.youtube.com/watch?v=yz7Db_yugfc

redtachyon2098 commented 3 years ago

So, when will the definitive results be announced?

karapapas commented 3 years ago

@carykh Hey Cary! Have you considered making a webapp of this project? I know it was part of a course, so you might don't want to dedicate more time to this, but I was thinking that it would be interesting.

People could improve their strategies and test against the others,
Everyone could have an account with a single or multiple strategies,
With options such as to reveal their code or not,
With automated cheating scanning methods for the submitted strategies that would improve overtime,
Dashboards with statistics, that would also be enriched by the time.

Of course this would need some kind of donation of computing resources from some cloud service, and some maintainers, but you never know, someone might be interested in helping with that.

Just a thought.

duckboycool commented 3 years ago

@carykh Hey Cary! Have you considered making a webapp of this project? I know it was part of a course, so you might don't want to dedicate more time to this, but I was thinking that it would be interesting.

People could improve their strategies and test against the others,

Everyone could have an account with a single or multiple strategies,

With options such as to reveal their code or not,

With automated cheating scanning methods for the submitted strategies that would improve overtime,

Dashboards with statistics, that would also be enriched by the time.

Of course this would need some kind of donation of computing resources from some cloud service, and some maintainers, but you never know, someone might be interested in helping with that.

Just a thought.

I think that that would be a lot of effort for a simple single time tournament, but Enjoyer's repo hosts its results on a GitHub pages with a GUI to help compare strats. It's definitely not on quite that scale, but it might be closer to what you're looking for.

carykh / PrisonersDilemmaTournament

Update from Cary (2021-06-15) (Long and not urgent, so you don't have to read it) #91