Closed dyedye closed 4 years ago
You can check the learn_from_batch method in agents/policy_gradients_agent.py; there we feed the actions to the policy head through the input dictionary by assigning them to the key output_0_0.
It can be a little confusing, but when you create a placeholder inside a head, Coach generates a name for it so that you can feed values from the agent side. The name follows the pattern output_{index_of_the_head}_{index_of_the_placeholder_in_the_head}. In the case of the policy gradient agent there is only one head, with a single placeholder inside it, so both indices are 0.
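As a minimal sketch of that naming convention (plain Python, no Coach imports; the input-dictionary keys besides output_0_0 are illustrative, not Coach's actual keys):

```python
def placeholder_feed_key(head_index, placeholder_index):
    """Build the key Coach assigns to a placeholder created inside a head.

    Pattern: output_{index_of_the_head}_{index_of_the_placeholder_in_the_head}
    """
    return "output_{}_{}".format(head_index, placeholder_index)

# Policy gradient agent: one head, one placeholder, so both indices are 0.
key = placeholder_feed_key(0, 0)  # -> "output_0_0"

# In learn_from_batch the actions would then be fed under that key, e.g.
# (hypothetical dictionary shape, shown only to illustrate the idea):
# inputs = {"observation": batch_states, key: batch_actions}
```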
Let me know if you have any more questions.
Shadi
Thanks a lot! It worked perfectly.
Thank you for releasing this very cool RL framework!
I'm currently trying to implement UNREAL, with a pixel control head that receives the actions taken by the agent.
I defined a placeholder for the actions in the _build_module method of the head as follows
However, I can't figure out how to pass a tensor to this placeholder. How can I feed values for it in the agent's get_prediction or learn_from_batch method?
Any help would be appreciated.