Closed vsraptor closed 4 years ago
Hi @vsraptor
The map is AxSxS. It is updated the same way an SxS map would be, except that the SxS slice to update is selected by the action the agent chose.
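A minimal sketch of that layout (the sizes here are hypothetical, not from the repo), showing that indexing the AxSxS tensor by the chosen action reduces it to an ordinary SxS map:

```python
import numpy as np

n_actions, n_states = 4, 6  # hypothetical sizes

# Successor matrix: one SxS map per action, stacked into A x S x S
M = np.zeros((n_actions, n_states, n_states))

a = 2        # action chosen by the agent
M_a = M[a]   # the S x S map that gets updated for this transition
print(M_a.shape)  # (6, 6)
```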
Thanks. Hmm, I got confused because S is the first argument but the second index ;\
experiences.append([state, action, state_next, reward, done])
s = current_exp[0]    # current state
s_a = current_exp[1]  # action taken
s_1 = current_exp[2]  # next state
# note: s_a_1 (the next action) and I (the one-hot of s) come from code not quoted here
td_error = (I + self.gamma * self.M[s_a_1, s_1, :] - self.M[s_a, s, :])
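For context, here is a runnable sketch of that TD update, with assumed stand-ins for the pieces the snippet elides: `I` as a one-hot vector of the current state, `s_a_1` as the next action, and `onehot` as a hypothetical helper (the sizes and hyperparameters are made up):

```python
import numpy as np

def onehot(i, size):
    """Hypothetical helper: one-hot row vector for state i."""
    v = np.zeros(size)
    v[i] = 1.0
    return v

n_actions, n_states = 4, 6  # hypothetical sizes
gamma, lr = 0.95, 0.1       # assumed hyperparameters
M = np.zeros((n_actions, n_states, n_states))  # A x S x S successor map

# one transition: (state, action, next_state, next_action)
s, s_a, s_1, s_a_1 = 0, 1, 2, 3

I = onehot(s, n_states)
# the TD error is a whole row (a vector over successor states), not a single cell
td_error = I + gamma * M[s_a_1, s_1, :] - M[s_a, s, :]
M[s_a, s, :] += lr * td_error
```

Each row `M[a, s, :]` is the expected discounted future occupancy of every state, which is why the update touches the whole last axis at once.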
Why do you multiply the whole Z axis? Shouldn't it be a single cell? Or does it have something to do with the observation being a vector?
From what I understand you use a StateAction x State map..? But you also have a 3rd dimension! Is it SxAxS or SxSxA? What is the representation?
Can you elaborate on how you manage and update the SAxS map? How does the SAS scenario work?
PS> From the article it seems it was about an SxS map.