SanchoGGP / ggp-base

The General Game Playing Base Package

Take another look at dynamic sample rate #80

Open arr28 opened 10 years ago

arr28 commented 10 years ago

We seem to use a sample rate that's a bit too high, at least some of the time, and perhaps specifically we jump from 1 to 2 when we shouldn't.

Consider taking into account the amount of time that the search thread spends blocked trying to get a free pipeline slot.
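
Something along these lines might work (a sketch only - the class, field names and thresholds are all invented for illustration, not taken from the current code):

```java
// Sketch: track how long the tree thread spends blocked waiting for a free
// pipeline slot each period, and use that fraction as the signal for moving
// the sample size. All names and thresholds here are invented.
public class BlockedTimeSampleTuner
{
  private long blockedNanos = 0;
  private double sampleSize = 1.0;

  /** Called by the tree thread around its blocking wait for a slot. */
  public void recordBlocked(long nanos)
  {
    blockedNanos += nanos;
  }

  /** Called once per sample period. */
  public void endPeriod(long periodNanos)
  {
    double blockedFraction = (double)blockedNanos / periodNanos;
    blockedNanos = 0;

    if (blockedFraction > 0.2)
    {
      // Rollout threads are saturated (the tree thread is waiting on them),
      // so we can afford fewer samples per expansion.
      sampleSize = Math.max(1.0, sampleSize / 1.1);
    }
    else if (blockedFraction < 0.05)
    {
      // The tree thread almost never blocks, so there is spare rollout
      // capacity and the sample size can grow.
      sampleSize *= 1.1;
    }
  }

  public int getSampleSize()
  {
    return (int)Math.round(sampleSize);
  }
}
```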

arr28 commented 10 years ago

This causes jaggy performance and potentially excessively high sample size in Four-Player Tic-Tac-Chess and Three-Player Chinese Checkers.

Also need to investigate why the gentle ramp-up in sample size suppresses expansion rate in Qyshinsu matches as part of this.

SteveDraper commented 10 years ago

Example from Tiltyard of 6-player CC that displayed really pathological sample size issues: http://www.ggp.org/view/tiltyard/matches/1d2d46d490b46365ea6f62b3794b993fecdcf1df/

arr28 commented 10 years ago

This is interesting.

I think there are some games which just have wildly varying performance, even for constant sample size. If you look at the various examples above, you can find places where the sample size is steady but the number of iterations per second varies wildly.

One common pattern seems to be that, when we are "surprised" by the chosen move (evidenced by greater than usual drop in node pool usage), that causes a very significant drop in node expansion rate for a while. The latest linked game is a great example of this. I have no hypothesis for why "surprises" have such a dramatic effect on the ensuing node expansion rate. Ooo, actually, perhaps I do. Let's imagine that the branching factor is fairly low and that we've expanded many levels of the tree down the expected branches. This makes selection expensive (lots of nodes to evaluate) and rollouts cheap (not many moves from the newly expanded node to the end of the game). This should be met with a large sample size. However, on a mis-predict, we throw virtually the whole tree away. Suddenly, selection is cheap (few nodes to examine) and rollout is expensive. But we're stuck with the old sample size and it takes some time to react.
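A toy cost model makes the effect concrete (all numbers invented; it also treats an iteration as serial, whereas the real engine runs rollouts on separate threads, so this only shows the shape of the effect):

```java
// Toy serial cost model: one tree iteration costs
// selectUs + sampleSize * rolloutUs, so expansions/sec = 1e6 / that.
public class MisPredictModel
{
  public static void main(String[] args)
  {
    // Deep, well-predicted tree: selection expensive, rollouts cheap.
    report("deep tree, sample size 50", 1000, 20, 50);
    // Just after a mis-predict the tree is mostly gone: selection is cheap,
    // rollouts long -- but we're still using the old sample size.
    report("post mis-predict, stale 50", 50, 400, 50);
    // What the sample size should converge to once it reacts.
    report("post mis-predict, adapted 1", 50, 400, 1);
  }

  static void report(String label, int selectUs, int rolloutUs, int samples)
  {
    double iterationUs = selectUs + (double)samples * rolloutUs;
    System.out.printf("%-28s %6.0f expansions/s%n", label, 1e6 / iterationUs);
  }
}
```

With these (invented) costs, the stale sample size cuts the expansion rate by roughly a factor of ten until the controller catches up.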

This isn't the only pattern. Sometimes node expansion rate is just naturally volatile. I wonder if these games have higher than usual variance in rollout length (or even extreme variation in number of legal moves) and that, in particular, some areas of the tree have much less depth than others. That would lead to lowering the node expansion rate when we moved from an area with short rollouts to an area with larger rollouts (because it takes time to adjust the sample rate to compensate).

All the above is highly speculative. Next step is probably to add more stats to try to confirm the ideas above.

Steve - I'm definitely interested in any observations or speculations you might care to make.

SteveDraper commented 10 years ago

I'll have a think about it but, as you say, adding appropriate stats certainly seems like the best response for now. Also, is the rate at which the sample rate can rise symmetrical with the rate at which it can fall (in percentage terms per sample period)? If it can rise more quickly than it can fall back afterwards, that might be a problem in coping with high-variance situations, as it would tend to introduce a bias.
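
To illustrate the kind of bias I mean, here's a toy simulation (step factors and clamp invented, input is pure noise) comparing a symmetric controller with one that rises faster than it falls:

```java
import java.util.Random;

// Feed both controllers pure 50/50 noise. The symmetric one wanders within
// its clamp; the one that rises faster than it falls drifts to the cap,
// i.e. it is biased toward large sample sizes.
public class AsymmetricDrift
{
  public static void main(String[] args)
  {
    Random random = new Random(1);
    double symmetric = 10, asymmetric = 10;
    double symmetricSum = 0, asymmetricSum = 0;
    int steps = 10000;

    for (int i = 0; i < steps; i++)
    {
      boolean up = random.nextBoolean();       // no real trend in the signal
      symmetric  = clamp(symmetric  * (up ? 1.2 : 1 / 1.2));
      asymmetric = clamp(asymmetric * (up ? 1.2 : 1 / 1.1)); // falls slower
      symmetricSum  += symmetric;
      asymmetricSum += asymmetric;
    }
    System.out.printf("mean symmetric sample size:  %.0f%n", symmetricSum / steps);
    System.out.printf("mean asymmetric sample size: %.0f%n", asymmetricSum / steps);
  }

  static double clamp(double size)
  {
    return Math.min(200, Math.max(1, size));
  }
}
```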

On another matter - I have made a real breakthrough with ELB. Essentially a 100% win rate of new vs old, and a massive qualitative difference in play when you watch (I'd say we now play ELB as well as, say, Greenshell plays regular Breakthrough). The trick was to incorporate the use of goal latches into the greedy rollouts (essentially preventing rollouts where the king saunters past pawns that should capture it). Because of the nature of the pre-analysed latches, this is very cheap to do and amazingly effective. I have a bunch of regression testing to do before I push the changes, but if I haven't made a mistake it should only have a functional effect in games with negative goal latches, so there shouldn't be much scope for it to modify play in anything but ELB.
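
For the record, the shape of the trick is roughly this - a sketch with invented names (State, Move, Role and the helper methods stand in for whatever the propnet/latch analysis provides), not the actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// A "negative goal latch" is a proposition which, once true, permanently
// caps some role's achievable goal -- e.g. the king being captured in ELB.
abstract class GreedyRolloutSketch<State, Move, Role>
{
  private final Random random = new Random();

  abstract State apply(State state, Move move);
  abstract boolean latchesNegativeGoal(State state, Role role);
  abstract List<Role> opponents(Role us);

  Move pickGreedyRolloutMove(State state, List<Move> legal, Role us)
  {
    List<Move> safe = new ArrayList<>();
    for (Move move : legal)
    {
      State next = apply(state, move);
      if (latchesNegativeGoal(next, us))
      {
        continue;                 // would permanently cap our own goal
      }
      for (Role opponent : opponents(us))
      {
        if (latchesNegativeGoal(next, opponent))
        {
          return move;            // e.g. the pawn captures the king
        }
      }
      safe.add(move);
    }
    // Otherwise pick randomly, avoiding self-latching moves where possible.
    List<Move> pool = safe.isEmpty() ? legal : safe;
    return pool.get(random.nextInt(pool.size()));
  }
}
```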

Steve


arr28 commented 10 years ago

The new graphs are perplexing. I record the time taken to do the following things...

  • Select
  • Expand
  • Get a free pipeline slot
  • Sit in the pipeline
  • Rollout
  • Sit in the pipeline again
  • Back-prop

...then produce a stacked % graph.

I played some matches of 4PTTC with Sancho and 3 x Random. Here are the results...

[image] https://cloud.githubusercontent.com/assets/3194913/3591823/c13faafe-0c68-11e4-9061-07848b68321b.png

The amount of time spent in the pipeline ("Queue") bothered me. I thought there was something up with the handover from tree thread to rollout thread, so I poked around in the code a bit. (I no longer think this is a significant factor, but here are the results anyway.)

[image] https://cloud.githubusercontent.com/assets/3194913/3591838/e7d29898-0c68-11e4-83f2-4064cb492889.png

[image] https://cloud.githubusercontent.com/assets/3194913/3591849/1d4319c6-0c69-11e4-8a0c-caafe8a39b2c.png

[image] https://cloud.githubusercontent.com/assets/3194913/3591854/3e4c6866-0c69-11e4-80bd-1efdc9668fe5.png

[image] https://cloud.githubusercontent.com/assets/3194913/3591870/82ba9964-0c69-11e4-8413-c7674b42e43b.png

The two matches with sample size fixed at 200 (which it really is, despite being graphed as 1) clearly show what the asymptotic behaviour is. But it isn't what I was really expecting. I need to sit down with a clear head and figure out what's going on.

With the regular pipeline size, the raw (unstacked) values are 3-4% for "Get Slot", 84% for time spent on the queue and 11-12% for rollout.

With the short pipeline, the corresponding values are 14%, 43% & 42%. (A pipeline size of 2 per thread means 1 waiting and 1 being processed - i.e. the rollout that is being performed counts against the pipeline limit.)

From these two data points, it looks like...

Get_Slot_Time = 100% / Total_Pipeline_Size

Remaining_Time = 100% - Get_Slot_Time
Rollout_Time = Remaining_Time x (1 / Per_Thread_Pipeline_Size)
Queue_Time = Remaining_Time x ((Per_Thread_Pipeline_Size - 1) / Per_Thread_Pipeline_Size)

The latter 3 are self-evident. I haven't convinced myself why the first one is true yet.
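
As a sanity check, plugging pipeline sizes into those formulas roughly reproduces both observations. (The thread count and the regular per-thread pipeline size below are guesses for illustration; the short-pipeline size of 2 per thread is as described above.)

```java
// Sanity check of the formulas above against the two observed data points.
// 4 rollout threads and a regular per-thread pipeline size of 7 are GUESSES.
public class PipelineBreakdown
{
  public static void main(String[] args)
  {
    breakdown("regular pipeline", 4, 7);   // observed:  3-4% / 84% / 11-12%
    breakdown("short pipeline",   4, 2);   // observed: 14%   / 43% / 42%
  }

  static void breakdown(String label, int threads, int perThreadSize)
  {
    double total     = threads * perThreadSize;
    double getSlot   = 100.0 / total;
    double remaining = 100.0 - getSlot;
    double rollout   = remaining / perThreadSize;
    double queue     = remaining * (perThreadSize - 1) / perThreadSize;
    System.out.printf("%-18s getSlot=%4.1f%%  queue=%4.1f%%  rollout=%4.1f%%%n",
                      label, getSlot, queue, rollout);
  }
}
```

This prints roughly 3.6% / 82.7% / 13.8% for the regular case and 12.5% / 43.8% / 43.8% for the short case, which is close to both sets of measurements.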

Given all that, what are the better numbers to graph for understanding sample size variation?

SteveDraper commented 10 years ago

Well, the first thing to point out is that this is all (?) elapsed time, which is very different from CPU time. Time spent sitting in the queue can be large without impacting overall throughput (high latency here has other detrimental impacts, but not on throughput). Consequently, I'm not sure that the time-in-queues part has a direct bearing on the sample sizing, so while it is potentially an interesting stat, I don't think it's directly relevant.

Obviously the MCTS expansion rate with the fixed sample sizes is more consistent (but fixing it that high only shows you one asymptote - it might be interesting to see what fixing it at 1 looks like too). It may be that we just need a bit of hysteresis applied to the way the sample rate is moved.
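
Something like the following is what I have in mind by hysteresis - a sketch, with the dead-band width and step factor invented for illustration:

```java
// Dead-band hysteresis on sample-size adjustment: only move the sample size
// when the measured signal leaves a tolerance band around the point where we
// last moved, and step geometrically so rise and fall are symmetric.
public class HystereticSampleSize
{
  private static final double STEP = 1.1;       // symmetric up/down factor
  private static final double DEAD_BAND = 0.25; // ignore +/-25% noise

  private double sampleSize = 1.0;
  private double anchorSignal = -1;             // signal when we last moved

  /** signal: e.g. ratio of rollout-thread demand to tree-thread supply. */
  public void update(double signal)
  {
    if (anchorSignal < 0)
    {
      anchorSignal = signal;
      return;
    }
    // Do nothing while the signal stays inside the dead band; this is the
    // hysteresis that stops jitter from whipsawing the sample size.
    if (signal > anchorSignal * (1 + DEAD_BAND))
    {
      sampleSize *= STEP;
      anchorSignal = signal;
    }
    else if (signal < anchorSignal * (1 - DEAD_BAND))
    {
      sampleSize = Math.max(1.0, sampleSize / STEP);
      anchorSignal = signal;
    }
  }

  public int get() { return (int)Math.round(sampleSize); }
}
```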


arr28 commented 10 years ago

I realised that time spent on the queue isn't an issue, but hadn't thought about how it was skewing the graphs. I'll take it out.

With regards to asymptotes, here's an interesting graph from the qualifiers today which possibly shows the other side of things. This was the game of C4. On move 12 (for which we were given an unusually large amount of time), we became turbo-charged (63K iterations/s) and the graph was qualitatively different.

[image]

Oddly, it doesn't seem related to sample rate, which merrily increases throughout the turbo period without causing an issue.

All graphs and logs available locally.

arr28 commented 10 years ago

Untagging 2014-candidate. (Too late.)

SteveDraper commented 10 years ago

Here's another loss that the charts suggest might be influenced by this issue: http://www.ggp.org/view/tiltyard/matches/0e83513c6c1e8ebf1e1d12f97aff5b4a1ffcb391/

This might make a decent regression test at about move 6.

arr28 commented 9 years ago

Consider #225 as an example when fixing this.

arr28 commented 9 years ago

Less important since we nearly always use RAVE and therefore fix the sample rate at 1. Reducing to P3 and untagging for IGGPC15.