Following your example, the code keeps throwing an error:
1 agent = Q_Learner(env)
----> 2 learned_policy = train(agent, env)
<string> in train(agent, env)
<string> in learn(self, obs, action, reward, next_obs)
IndexError: too many indices for array
The error is raised on this line:
td_target = reward + self.gamma * np.max(self.Q[discretized_next_obs])
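
For anyone trying to reproduce this, here is a minimal sketch of how NumPy raises this exact error. It assumes the Q table has one axis per observation dimension plus one axis for actions; the shapes, the values, and the helper name `td_target` are made up for illustration and are not taken from the example code. The error appears whenever the index tuple has more entries than the array has axes, so printing `self.Q.shape` and `discretized_next_obs` just before the failing line may show the mismatch.

```python
import numpy as np

# Hypothetical shapes for illustration only:
# 2 observation dimensions with 4 bins each, and 3 actions.
Q = np.zeros((4, 4, 3))
gamma = 0.98
reward = -1.0

def td_target(Q, reward, gamma, discretized_next_obs):
    """Same expression as the failing line, with a shape check first."""
    idx = tuple(discretized_next_obs)
    if len(idx) > Q.ndim - 1:
        raise ValueError(
            f"index has {len(idx)} entries, but Q has only "
            f"{Q.ndim - 1} observation axes"
        )
    return reward + gamma * np.max(Q[idx])

# Two indices select one row of 3 action values, so this works.
ok = td_target(Q, reward, gamma, (1, 2))

# Four indices into a 3-D array reproduce the IndexError.
try:
    Q[(1, 2, 0, 1)]
    raised = False
except IndexError:
    raised = True
```

If the check fails in the real code, the value passed as `next_obs` probably does not have the shape that the discretization step expects.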