The following definition is present in the MDP class:
def T(self,state, action):
"""Transition model. From a state and an action, return a list
of (result-state, probability) pairs."""
the value iteration procedure, though, expects the transition model to return a
list of (probability,result-state) pairs. I haven't looked at what other
algorithms might be expecting.
Original issue reported on code.google.com by Gustavo....@gmail.com on 21 Apr 2011 at 6:26
Original issue reported on code.google.com by
Gustavo....@gmail.com
on 21 Apr 2011 at 6:26