IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

BasicRLGraphManager should encourage finishing an episode #259

Closed redknightlois closed 5 years ago

redknightlois commented 5 years ago

In cases with sparse rewards, the transitions always get 0 reward, because the graph manager does not accumulate over the entire episode and the reward information is therefore essentially not known.

https://github.com/NervanaSystems/coach/blob/master/rl_coach/graph_managers/graph_manager.py#L483

This method should be modified to read instead:

self.act(EnvironmentSteps(1), wait_for_full_episodes=self.agent_params.algorithm.act_for_full_episodes)

It would also be useful to be able to configure the first parameter.
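To illustrate what the proposed flag would change, here is a minimal, self-contained sketch. It does not use Coach: `ToyEnv` and this `act` function are hypothetical stand-ins, and the assumption is only that `wait_for_full_episodes` means "keep acting until the current episode ends".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyEnv:
    """Hypothetical sparse-reward environment: reward 1.0 only on the last step."""
    episode_length: int = 5
    t: int = 0

    def step(self) -> Tuple[float, bool]:
        self.t += 1
        done = self.t >= self.episode_length
        reward = 1.0 if done else 0.0
        if done:
            self.t = 0  # reset for the next episode
        return reward, done


def act(env: ToyEnv, steps: int,
        wait_for_full_episodes: bool = False) -> List[Tuple[float, bool]]:
    """Collect at least `steps` transitions (hypothetical helper, not Coach's API).

    With wait_for_full_episodes=True, keep stepping until the episode ends.
    """
    transitions: List[Tuple[float, bool]] = []
    done = False
    while len(transitions) < steps or (wait_for_full_episodes and not done):
        reward, done = env.step()
        transitions.append((reward, done))
    return transitions


one_step = act(ToyEnv(), steps=1)
full_episode = act(ToyEnv(), steps=1, wait_for_full_episodes=True)
```

In this toy setup, `one_step` holds a single zero-reward transition, while `full_episode` runs through to the terminal transition that carries the sparse reward.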

redknightlois commented 5 years ago

Disregard the reason above; I hadn't seen that the discounted reward is being applied correctly. Still, the change would be good for reducing the CPU consumption of the BasicRLGraphManager. In the scenario I am experimenting with, I can barely keep GPU utilization above 5%.

gal-leibovich commented 5 years ago

I'm sorry, but I'm not following what exactly the issue at hand is. Could you please provide a reproducing example?

The referenced line asks for a single EnvironmentStep at the BasicRLGraphManager level so that agents at different hierarchy levels (and even at the same level) can each expand on it to their desired num_consecutive_playing_steps. For instance, one agent might want to play 4 steps between training periods (e.g. DQN), while another agent asks to play just a single step before its next training phase. This line supports both by letting each agent decide on its own number of playing steps.
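The idea can be sketched roughly as follows. This is a simplified toy model, not Coach's implementation: `ToyAgent` and `run` are hypothetical names, and the only thing borrowed from Coach is the parameter name `num_consecutive_playing_steps`. The manager advances one environment step at a time, and each agent decides for itself when it has played enough steps to train.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyAgent:
    """Hypothetical agent that trains after its own number of playing steps."""
    name: str
    num_consecutive_playing_steps: int
    steps_since_training: int = 0
    train_calls: int = 0

    def observe_step(self) -> None:
        # Count one environment step played since the last training phase.
        self.steps_since_training += 1

    def maybe_train(self) -> None:
        # Train only once this agent's own step budget has been reached.
        if self.steps_since_training >= self.num_consecutive_playing_steps:
            self.train_calls += 1
            self.steps_since_training = 0


def run(agents: List[ToyAgent], total_env_steps: int) -> None:
    """Manager loop: request a single environment step per iteration."""
    for _ in range(total_env_steps):
        for agent in agents:
            agent.observe_step()
            agent.maybe_train()


dqn_like = ToyAgent("dqn_like", num_consecutive_playing_steps=4)
single_step = ToyAgent("single_step", num_consecutive_playing_steps=1)
run([dqn_like, single_step], total_env_steps=8)
```

Because the manager only ever asks for one step, neither agent's schedule constrains the other: over 8 environment steps the 4-step agent trains twice and the single-step agent trains 8 times.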

If you suspect an issue here, please provide a reproducing example.

jamartinh commented 5 years ago

Will the graph manager be renamed or transitioned to the so-called Block Factory?

galnov commented 5 years ago

Closing this due to no reproduction. Please reopen if needed.