memory leak in train loop?

I've noticed memory use grows with time in a coach learning process. This matters because RL training can take a long time and lower memory usage can reduce learning costs.

I instrumented the agent train loop with tracemalloc and looked at the top memory allocators:

[<StatisticDiff traceback=<Traceback (<Frame filename='.../lib/python3.6/tracemalloc.py' lineno=123>,)> size=10454240 (+5227308) count=130690 (+65355)>,                                                                                                                                       
 <StatisticDiff traceback=<Traceback (<Frame filename='.../lib/python3.6/tracemalloc.py' lineno=462>,)> size=6771608 (+3628608) count=141067 (+75596)>,                                                                                                                                        
 <StatisticDiff traceback=<Traceback (<Frame filename='.../coach/rl_coach/core_types.py' lineno=250>,)> size=8879952 (+2959984) count=29997 (+9999)>,                                                                                                                                                          
 <StatisticDiff traceback=<Traceback (<Frame filename='.../lib/python3.6/site-packages/tensorflow/python/client/session.py' lineno=346>,)> size=7193760 (+2397864) count=59932 (+19977)>,                                                                                                      
 <StatisticDiff traceback=<Traceback (<Frame filename='.../lib/python3.6/copy.py' lineno=161>,)> size=5522440 (+1840120) count=90051 (+30005)>,                                                                                                                                                
 <StatisticDiff traceback=<Traceback (<Frame filename='.../lib/python3.6/site-packages/tensorflow/python/framework/ops.py' lineno=5228>,)> size=5080408 (+1680000) count=30365 (+10000)>,                                                                                                      
 <StatisticDiff traceback=<Traceback (<Frame filename='.../lib/python3.6/site-packages/tensorflow/python/client/session.py' lineno=296>,)> size=4311216 (+1436904) count=59878 (+19957)>,

Ignoring the tracemalloc ones we see the top allocator is in Transition.add_info, traceback

  File ".../coach/rl_coach/core_types.py", line 250
    self.info.update(new_info)
  File ".../coach/rl_coach/agents/agent.py", line 870
    transition.add_info(self.last_action_info.__dict__)
  File ".../coach/rl_coach/level_manager.py", line 219
    done = acting_agent.observe(env_response)
  File ".../coach/rl_coach/graph_managers/graph_manager.py", line 443
    result = self.top_level_manager.step(None)
  File ".../coach/rl_coach/graph_managers/graph_manager.py", line 476
    self.act(EnvironmentSteps(1))

This seems like a leak, since transition objects should be free for gc?

Any advice here? I can't reason what might be keeping a reference to the transition.

IntelLabs / coach

memory leak in train loop? #277