Coach version : 1.0.0
Linux version : 18.04
Environment : Custom
Algorithm : Clipped PPO
I'm using a custom environment to train a basic_rl_graph with the Clipped PPO algorithm. During the heatup phase, I get an error in the __init__ of a Transition object. Here is the full traceback:
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/sebastien/.local/lib/python3.6/site-packages/rl_agent/local_training_worker.py", line 57, in <module>
main()
File "/home/sebastien/.local/lib/python3.6/site-packages/rl_agent/local_training_worker.py", line 51, in main
start_graph(graph_manager=graph_manager, task_parameters=task_parameters)
File "/home/sebastien/.local/lib/python3.6/site-packages/rl_agent/local_training_worker.py", line 28, in start_graph
graph_manager.improve()
File "/home/sebastien/.local/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py", line 539, in improve
self.heatup(self.heatup_steps)
File "/home/sebastien/.local/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py", line 389, in heatup
self.act(EnvironmentEpisodes(1))
File "/home/sebastien/.local/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py", line 449, in act
result = self.top_level_manager.step(None)
File "/home/sebastien/.local/lib/python3.6/site-packages/rl_coach/level_manager.py", line 239, in step
done = acting_agent.observe(env_response)
File "/home/sebastien/.local/lib/python3.6/site-packages/rl_coach/agents/agent.py", line 927, in observe
game_over=filtered_env_response.game_over, info=filtered_env_response.info)
File "/home/sebastien/.local/lib/python3.6/site-packages/rl_coach/core_types.py", line 214, in __init__
if not next_state:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
And here is the terminal output of print(self.last_env_response), called in the _update_state() method of the environment just before the error occurs:
As I understand it, next_state is supposed to be the state array, so the line if not next_state: in the Transition constructor is problematic. Did I misunderstand the definition of states?
Feel free to ask for more information. Thanks a lot.
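The failure can be reproduced outside of Coach: truth-testing a NumPy array with more than one element raises exactly this ValueError. A minimal sketch of the problem and of an explicit None check (assuming the intent of if not next_state: was to detect a missing state; this is an illustration, not Coach's actual code):

```python
import numpy as np

state = np.array([0.1, 0.2, 0.3])  # a typical observation vector

# Truth-testing a multi-element array is ambiguous and raises ValueError,
# which is what happens inside Transition.__init__ here.
try:
    if not state:
        pass
except ValueError as e:
    print(e)

# An explicit identity check avoids the ambiguity: it is True only when
# no state was provided at all, and never inspects the array's contents.
if state is None:
    print("no state provided")
else:
    print("state received with shape", state.shape)
```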
Here are a few code snippets for further information. My preset file:
The custom environment (just the high-level code):