FragileTech / FractalAI

Cellular automaton-based calculus for the masses
https://github.com/FragileTech
GNU Affero General Public License v3.0
68 stars, 14 forks

Question about nature of Fractal AI #82

Closed: hoyleb closed this issue 6 years ago

hoyleb commented 6 years ago

Dear team, thanks for making this public. I have had fun testing it. I would like to ask a question that I am having trouble finding the answer to in the code and documentation.

Please correct my understanding if this statement is wrong. It appears to me that Fractal AI is a look-ahead algorithm that performs many tree-like searches for a few "time" steps and then takes the best step, based on the results from each of the walkers. Of course there is lots of fancy stuff going on under the hood that I am glossing over, but is this the basic gist of the algorithm?

Thanks a lot.

Ben

FYI, I'm getting amazing scores on OpenAI Gym Pacman, which I'm trying to understand.

sergio-hcsoft commented 6 years ago

You are right, it is a look-ahead algorithm: it builds a tree of a given depth to choose the actions, as in any other Monte Carlo method. The way this tree grows is where the magic happens.
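For readers following the thread, here is a minimal sketch of the generic look-ahead idea only: run many random rollouts of a fixed depth from the current state and take the first action whose rollouts scored best on average. This is plain Monte Carlo rollout planning, not Fractal AI's own tree-growing rule (the "magic" mentioned above is deliberately left out). The toy environment and the names `GOAL`, `step`, `plan`, `n_rollouts` and `depth` are all hypothetical, chosen for illustration.

```python
import random

# Hypothetical toy environment: an agent on a line of integers.
# The state is the position; reward is 1.0 whenever the agent lands on GOAL.
GOAL = 3
ACTIONS = (-1, +1)

def step(state, action):
    """Perfect simulator: returns (next_state, reward)."""
    next_state = state + action
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def plan(state, n_rollouts=200, depth=5, rng=None):
    """Return the first action whose random rollouts score best on average."""
    rng = rng or random.Random(0)
    totals = {a: 0.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    for _ in range(n_rollouts):
        first = rng.choice(ACTIONS)          # first action of this rollout
        s, r = step(state, first)
        score = r
        for _ in range(depth - 1):           # continue with random actions
            s, r = step(s, rng.choice(ACTIONS))
            score += r
        totals[first] += score
        counts[first] += 1
    # Average score per first action; unvisited actions count as 0.
    return max(ACTIONS, key=lambda a: totals[a] / max(counts[a], 1))

if __name__ == "__main__":
    print(plan(2))  # the goal is one step to the right of state 2
```

Only the first action of the best-scoring group is executed; the planner is then called again from the new state, which is what makes this a look-ahead rather than a fixed open-loop plan.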

sergio-hcsoft commented 6 years ago

The right term was "Planning algorithm"... we have added it here and there to avoid confusion.

ghost commented 5 years ago

Isn't that cheating? You are exploiting the emulator. Current deep learning approaches use the emulator for convenience, but you rely on it entirely.

It seems to me to just be MCTS with a simple heuristic.

sergio-hcsoft commented 5 years ago

Well, any planning algorithm (including the MCTS inside AlphaZero) will do exactly that: assume you can predict the next state of your system (somehow, with some confidence and some accuracy) and then use it to build paths, exactly as we do.

In fact, when MCTS or IW(1) are tested on Atari games, a simulator is used to predict the next states. You are free to think that all planning algorithms cheat if you wish, but then you are out of RL, and even old chess programs will be cheaters for you!

In the general case, the perfect "simulator" is replaced by an NN that learns to approximately predict the next state (from an initial state plus an action). But to measure how good or bad a planning algorithm is, you need to stick to an environment you can fully simulate; otherwise you couldn't tell whether bad results came from the imperfect predictions of your NN or from the planning algorithm itself.
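The point about separating the two error sources can be sketched in code: if the planner only ever talks to a `model(state, action)` function, you can hand it either the true dynamics or an imperfect predictor and compare the results. Everything below is a hypothetical toy setup, not code from this repository: `true_step` plays the perfect simulator, `make_noisy_model` is a crude stand-in for a learned NN (it randomly flips the predicted direction), and `plan` is a plain random-rollout planner rather than Fractal AI.

```python
import random

GOAL = 3
ACTIONS = (-1, +1)

def true_step(state, action):
    """Ground-truth dynamics (the 'perfect simulator')."""
    nxt = state + action
    return nxt, (1.0 if nxt == GOAL else 0.0)

def make_noisy_model(error_rate, rng):
    """Stand-in for a learned model: sometimes predicts the wrong move."""
    def model(state, action):
        if rng.random() < error_rate:
            action = -action  # prediction error: direction flipped
        return true_step(state, action)
    return model

def plan(state, model, rng, n_rollouts=100, depth=4):
    """Random-rollout planner; it only ever sees `model`, never the truth."""
    totals = {a: 0.0 for a in ACTIONS}
    for a in ACTIONS:
        for _ in range(n_rollouts):
            s, r = model(state, a)
            totals[a] += r
            for _ in range(depth - 1):
                s, r = model(s, rng.choice(ACTIONS))
                totals[a] += r
    return max(ACTIONS, key=totals.get)

def run_episode(model, seed=0, steps=10, start=0):
    """Plan with `model`, but always act in the true environment."""
    rng = random.Random(seed)
    state, total = start, 0.0
    for _ in range(steps):
        state, reward = true_step(state, plan(state, model, rng))
        total += reward
    return total

if __name__ == "__main__":
    print("perfect simulator:", run_episode(true_step))
    print("noisy model:      ", run_episode(make_noisy_model(0.4, random.Random(1))))
```

With the true dynamics plugged in, any shortfall in return is attributable to the planner alone; with the noisy model, planner and model errors are mixed together, which is the confound described above.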