stochastic transitions for tree search agents

saArbabi commented 4 years ago

Eleurent, thanks for developing this great project and sharing it.

To my understanding, the MCTS agent currently transitions deterministically to new states during the planning phase. Which class would you modify to account for stochastic transitions, for instance if Gaussian noise were added to the actions executed by other agents?

Thank you in advance

eleurent commented 4 years ago

Hello @saArbabi, this was the case until very recently (see #43). The UCT algorithm (the "MCTS agent") now uses a different random seed for each rollout during the planning phase (see this line).

Now, note that there are two ways in which stochastic transitions can be handled:

Open-loop: policies are described by sequences of actions, independently of where the (random) state ends up. The tree consists of max nodes only. Example: UCT (with closed_loop config = false) and OLOP algorithms.
Closed-loop: policies are conditioned on the successor state reached after each action, so the tree consists of both max and average nodes. Example: UCT (with closed_loop = true) and MDPGapE algorithms.
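
To make the distinction concrete, here is a minimal sketch of how a tree node could be identified in each case. The function names, action labels and state tuples are hypothetical and are not taken from rl-agents; the only point is that open-loop nodes ignore the states reached, while closed-loop nodes do not.

```python
def open_loop_key(actions, states):
    """Open-loop: a node is identified by the action sequence alone."""
    return tuple(actions)


def closed_loop_key(actions, states):
    """Closed-loop: a node is identified by the actions and the states they led to."""
    return tuple(zip(actions, states))


# Two rollouts that apply the same actions but land in different random states.
# Action labels and state values are arbitrary illustrations.
rollout_a = (["FASTER", "LANE_LEFT"], [(12.3, 0.0), (25.1, 4.0)])
rollout_b = (["FASTER", "LANE_LEFT"], [(11.8, 0.0), (24.6, 4.0)])

same_open_loop_node = open_loop_key(*rollout_a) == open_loop_key(*rollout_b)
same_closed_loop_node = closed_loop_key(*rollout_a) == closed_loop_key(*rollout_b)
print(same_open_loop_node, same_closed_loop_node)
```

Both rollouts update the same open-loop node, but they end up in two different closed-loop nodes.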

Open-loop algorithms are generally sub-optimal compared to closed-loop algorithms. However, the closed-loop planning algorithms implemented here only work when the support of the transition distribution is finite. Indeed, a new node is created for every random state encountered, and serves as the root of the resulting planning subtree. But with e.g. Gaussian noise, the same next state will almost surely never be encountered twice, which means that the algorithm keeps creating new nodes with a single visit and cannot explore or exploit.
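
A small experiment illustrates the problem. This is only a sketch, assuming a hypothetical `child_visits` helper that counts how often each successor state is drawn from a single (state, action) pair; it is not part of the planners in this repository.

```python
import random
from collections import Counter


def child_visits(sample_next_state, n_rollouts=1000, seed=0):
    """Count visits to each successor state of one (state, action) pair."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_rollouts):
        visits[sample_next_state(rng)] += 1
    return visits


# Finite support: the next state takes one of three values.
finite = child_visits(lambda rng: rng.choice([-1, 0, 1]))
# Infinite support: the next state is perturbed by Gaussian noise.
gaussian = child_visits(lambda rng: rng.gauss(0.0, 1.0))

print("finite:  ", len(finite), "children, max visits", max(finite.values()))
print("gaussian:", len(gaussian), "children, max visits", max(gaussian.values()))
```

With finite support the visits concentrate on a few children, whereas with Gaussian noise nearly every rollout creates its own child with a single visit.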

I am not familiar with any tree-based planning algorithm that handles stochastic transitions with infinite support (e.g. Gaussian). This would require the ability to aggregate similar next states according to some good criterion. I think you can consider two approaches:

using an open-loop planner (then all next states of a transition are aggregated together).
discretising the state space (then you get a finite state space and thus a finite transition support, so closed-loop planning algorithms can be used).
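
As a rough sketch of the second option, a continuous state can be mapped to a tuple of bin indices before it is used as a node key. The `discretise` function, the bin width and the two-dimensional state below are assumptions chosen for illustration, not part of rl-agents.

```python
import math
import random
from collections import Counter


def discretise(state, bin_width=0.5):
    """Map a continuous state (sequence of floats) to a hashable tuple of bin indices."""
    return tuple(math.floor(x / bin_width) for x in state)


rng = random.Random(0)
visits = Counter()
for _ in range(1000):
    # Hypothetical 2D next state: a nominal value plus Gaussian noise.
    next_state = (10.0 + rng.gauss(0.0, 1.0), 5.0 + rng.gauss(0.0, 1.0))
    visits[discretise(next_state)] += 1

print(len(visits), "distinct cells for 1000 samples")
```

The bin width trades off resolution against the number of visits each cell receives, so it would have to be tuned to the noise scale.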

Does that help?

saArbabi commented 4 years ago

Thanks @eleurent for the quick response!

I need to spend some time digging deeper into the code to make sure I fully understand your suggestion. Having done some research, I know that for infinite support (e.g. actions perturbed by Gaussian noise), UCT with progressive widening (PW) is used. PW is also used for handling continuous observations in POMDPs. If I make any useful progress I will certainly share it with you or open a pull request. For now I will close the issue.
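
For reference, the idea behind progressive widening on successor states can be sketched as follows. This is my own simplified illustration, not code from rl-agents: the `ChanceNode` class, the parameters `k` and `alpha`, and the rule of reusing an existing child with probability proportional to its visit count are assumptions, and published variants differ in the details.

```python
import random


class ChanceNode:
    """Successor states of one (state, action) pair, with progressive widening."""

    def __init__(self, k=1.0, alpha=0.5):
        self.k = k
        self.alpha = alpha
        self.visits = 0
        self.children = {}  # next_state -> visit count

    def sample_child(self, sample_next_state, rng):
        self.visits += 1
        if len(self.children) < self.k * self.visits ** self.alpha:
            # Widen: draw a fresh next state from the (noisy) transition model.
            state = sample_next_state(rng)
            self.children.setdefault(state, 0)
        else:
            # Reuse: pick an existing child, weighted by its visit count.
            states = list(self.children)
            counts = [self.children[s] for s in states]
            state = rng.choices(states, weights=counts)[0]
        self.children[state] += 1
        return state


rng = random.Random(0)
node = ChanceNode(k=1.0, alpha=0.5)
for _ in range(1000):
    node.sample_child(lambda r: r.gauss(0.0, 1.0), rng)

print(len(node.children), "children after", node.visits, "visits")
```

The number of children grows roughly like `k * visits ** alpha` instead of linearly, so each child can accumulate enough visits for its subtree to be explored.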

eleurent / rl-agents

stochastic transitions for tree search agents #44