hoyleb closed 6 years ago
You are right, it is a look-ahead algorithm: it builds a tree of a given depth to choose the actions, as any other Monte Carlo approach does. How this tree grows is where the magic happens.
The right term was "Planning algorithm"... we have added it here and there to avoid confusion.
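For readers following the thread, here is a minimal sketch of what "builds a tree of a given depth to choose the actions" means in the generic case. It is an exhaustive depth-limited search on a made-up toy problem, not the way Fractal AI grows its tree; `toy_step`, `best_return` and `plan` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Sequence, Tuple

# A "simulator" here is any function (state, action) -> (next_state, reward).
StepFn = Callable[[int, int], Tuple[int, float]]


def toy_step(state: int, action: int) -> Tuple[int, float]:
    """Hypothetical deterministic simulator: walk on a line, reward on landing at 3."""
    next_state = state + action
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward


def best_return(state: int, step_fn: StepFn, actions: Sequence[int], depth: int) -> float:
    """Best cumulative reward reachable from `state` within `depth` simulated steps."""
    if depth == 0:
        return 0.0
    values = []
    for action in actions:
        next_state, reward = step_fn(state, action)
        values.append(reward + best_return(next_state, step_fn, actions, depth - 1))
    return max(values)


def plan(state: int, step_fn: StepFn, actions: Sequence[int], depth: int) -> int:
    """Return the first action of the best path found in a tree of the given depth."""
    def score(action: int) -> float:
        next_state, reward = step_fn(state, action)
        return reward + best_return(next_state, step_fn, actions, depth - 1)

    # Ties are broken in favour of the action listed first.
    return max(actions, key=score)
```

Calling `plan(0, toy_step, (-1, 1), 3)` picks `+1`, because only that branch can reach the rewarded state within three steps. The exhaustive tree grows as `len(actions) ** depth`, which is why practical planners differ mainly in how they decide which branches to expand.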
Isn't that cheating? You are exploiting the emulator. Current deep learning approaches use the emulator for convenience, but you rely on it entirely.
It seems to me to just be MCTS with a simple heuristic.
Well, any planning algorithm (including the MCTS inside AlphaZero) will do exactly that: assume you can predict the next state of your system (somehow, with some confidence and some accuracy) and then use that prediction to build paths, exactly as we do.
In fact, when MCTS or IW(1) are tested against Atari games, a simulator is used to predict the next states, so you are free to think all planning algorithms cheat if you wish, but then this is no longer only about RL: even old chess programs would be cheaters for you!
In the general case, the perfect "simulator" is replaced by an NN that learns to approximately predict the next state (from an initial state plus an action). But to measure how good or bad a planning algorithm is, you need to stick to an environment you can fully simulate; otherwise you couldn't tell whether bad results came from the imperfect predictions of your NN or from the planning algorithm itself.
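The point about separating planner quality from model quality can be sketched on a toy problem. Below, the same depth-limited planner is run once with the exact simulator and once with a deliberately imperfect model. Everything here is an illustrative assumption: the line-walk environment, the names `true_step`, `make_noisy_model`, `plan` and `run_episode`, and the noisy model, which only stands in for a learned NN predictor.

```python
import random
from typing import Callable, Tuple

Model = Callable[[int, int], Tuple[int, float]]
ACTIONS = (-1, 1)
GOAL = 3


def true_step(state: int, action: int) -> Tuple[int, float]:
    """The real (toy) environment: reward 1.0 each time the agent lands on GOAL."""
    next_state = state + action
    return next_state, (1.0 if next_state == GOAL else 0.0)


def make_noisy_model(error_rate: float, seed: int) -> Model:
    """Hypothetical stand-in for a learned model: sometimes predicts the wrong direction."""
    rng = random.Random(seed)

    def model(state: int, action: int) -> Tuple[int, float]:
        if rng.random() < error_rate:
            action = -action  # the prediction error
        return true_step(state, action)

    return model


def lookahead_value(state: int, model: Model, depth: int) -> float:
    """Best predicted return within `depth` steps, according to `model`."""
    if depth == 0:
        return 0.0
    values = []
    for action in ACTIONS:
        next_state, reward = model(state, action)
        values.append(reward + lookahead_value(next_state, model, depth - 1))
    return max(values)


def plan(state: int, model: Model, depth: int) -> int:
    """Choose the action whose predicted subtree has the highest return."""
    def score(action: int) -> float:
        next_state, reward = model(state, action)
        return reward + lookahead_value(next_state, model, depth - 1)

    return max(ACTIONS, key=score)


def run_episode(model: Model, depth: int = 3, steps: int = 10) -> float:
    """Plan with `model`, but always act in the true environment."""
    state, total = 0, 0.0
    for _ in range(steps):
        action = plan(state, model, depth)
        state, reward = true_step(state, action)
        total += reward
    return total
```

With the exact simulator, `run_episode(true_step)` collects 4.0, which is the most this toy problem allows in 10 steps, so that score reflects the planner alone. With `make_noisy_model(0.3, seed=0)` a lower score could come from the model errors rather than from the planner, and nothing in the result tells the two apart.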
Dear team, thanks for making this public. I have had fun testing it. I would like to ask a question whose answer I am having trouble finding in the code and documentation.
Please correct my understanding if this statement is wrong. It appears to me that Fractal AI is a look-ahead algorithm that performs many tree-like searches for a few "time" steps and then takes the best step, based on the results from each of the walkers. Of course there is lots of fancy stuff going on under the hood that I am glossing over, but is this the basic gist of the algorithm?
Thanks a lot.
Ben
FYI, I'm getting amazing scores on OpenAI Gym Pacman, which I'm trying to understand.