Closed redknightlois closed 5 years ago
Disregard the reason; I didn't see the discounted reward being applied correctly. Still, the change is good for reducing the CPU consumption of the BasicRLGraphManager. In the scenario I am experimenting with, I can barely keep GPU utilization above 5%.
I'm sorry, but I'm not following what exactly the issue at hand is. Could you please provide a reproducing example?
The line that was referenced asks for a single EnvironmentStep at the BasicRLGraphManager level, in order to allow agents at different hierarchy levels (and even on the same level) to expand on this up to their desired num_consecutive_playing_steps. So, for instance, we might have one agent wanting to play 4 steps between each training period (e.g. DQN), and another agent asking to play just a single step before the next training phase. This line allows both, by letting each agent decide its own number of playing steps.
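To make the mechanism above concrete, here is a minimal sketch (hypothetical class and attribute names, not the actual Coach API) of how a graph manager issuing single environment steps lets each agent train at its own cadence via its own num_consecutive_playing_steps:

```python
# Hypothetical sketch, NOT rl_coach's implementation: a driver issues
# single EnvironmentSteps, and each agent counts steps and trains only
# when its own num_consecutive_playing_steps has elapsed.

class Agent:
    def __init__(self, name, num_consecutive_playing_steps):
        self.name = name
        self.num_consecutive_playing_steps = num_consecutive_playing_steps
        self.steps_since_train = 0
        self.train_calls = 0

    def observe_step(self):
        # Called once per single EnvironmentStep issued by the manager.
        self.steps_since_train += 1
        if self.steps_since_train >= self.num_consecutive_playing_steps:
            self.train()
            self.steps_since_train = 0

    def train(self):
        self.train_calls += 1

# One agent trains every 4 steps (DQN-like), another after every step.
agents = [Agent("dqn", 4), Agent("online", 1)]
for _ in range(8):  # the manager only ever asks for one step at a time
    for agent in agents:
        agent.observe_step()

print([a.train_calls for a in agents])  # → [2, 8]
```

Because the manager's unit of work is a single step, heterogeneous training schedules compose without the manager needing to know about any of them.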
If you suspect an issue here, please provide a reproducing example.
Will the graph manager be renamed/transitioned to the so-called Block Factory?
Closing this due to no reproduction. Please reopen if needed.
In cases with sparse rewards, the transitions always get a reward of 0, because the information is essentially unknown: the graph manager does not accumulate transitions for the entire episode.
https://github.com/NervanaSystems/coach/blob/master/rl_coach/graph_managers/graph_manager.py#L483
This method should be modified to read instead:
It would also be useful to be able to configure the first parameter.
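To illustrate the sparse-reward concern (toy code, not rl_coach's implementation): discounted returns can only flow backwards from a reward that arrives at the end of the episode if the whole episode's transitions are accumulated first. If returns were computed per single step instead, every transition but the last would see a return of 0.

```python
# Toy illustration of why episode-level accumulation matters for
# sparse rewards. This is NOT the proposed patch to graph_manager.py;
# it only demonstrates the backward pass over a full episode.

def discounted_returns(rewards, gamma=0.99):
    """Discounted return for every step of one completed episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# Sparse-reward episode: the only reward is on the final transition.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(episode_rewards, gamma=0.5))
# → [0.125, 0.25, 0.5, 1.0]
```

With episode accumulation, every transition receives a non-zero discounted return; without it, the first three transitions would be stored with reward 0 and the signal would be lost.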