ARCC-RACE / deepracer-for-dummies

A quick way to get up and running with a local DeepRacer training environment

program crashes after first 20 episodes! #36

Closed: sushil-bharati closed this issue 5 years ago

sushil-bharati commented 5 years ago

LOG:

/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py:2957: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
{"simapp_exception": {"version": "1.0", "date": "2019-08-12 00:45:43.968140", "function": "training_worker", "message": "An error occured while training: invalid index to scalar variable.. Job failed!.", "exceptionType": "training_worker.exceptions", "eventType": "system_error", "errorCode": "503"}}
Traceback (most recent call last):
  File "training_worker.py", line 91, in training_worker
    graph_manager.train()
  File "/usr/local/lib/python3.6/dist-packages/rl_coach/graph_managers/graph_manager.py", line 400, in train
    [manager.train() for manager in self.level_managers]
  File "/usr/local/lib/python3.6/dist-packages/rl_coach/graph_managers/graph_manager.py", line 400, in <listcomp>
    [manager.train() for manager in self.level_managers]
  File "/usr/local/lib/python3.6/dist-packages/rl_coach/level_manager.py", line 174, in train
    [agent.train() for agent in self.agents.values()]
  File "/usr/local/lib/python3.6/dist-packages/rl_coach/level_manager.py", line 174, in <listcomp>
    [agent.train() for agent in self.agents.values()]
  File "/usr/local/lib/python3.6/dist-packages/rl_coach/agents/clipped_ppo_agent.py", line 317, in train
    self.train_network(batch, self.ap.algorithm.optimization_epochs)
  File "/usr/local/lib/python3.6/dist-packages/rl_coach/agents/clipped_ppo_agent.py", line 266, in train_network
    self.value_loss.add_sample(batch_results['losses'][0])
IndexError: invalid index to scalar variable.

sushil-bharati commented 5 years ago

I don't know the exact fix, but the crash was due to the batch size being too large!
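
For anyone hitting the same log, here is a minimal sketch in plain NumPy that reproduces the two messages above outside of rl_coach. It only shows how NumPy itself produces them: taking the mean of an empty sequence emits the "Mean of empty slice" warning and returns NaN, and indexing a NumPy scalar with `[0]` raises the `IndexError`. The helper names `mean_of_losses` and `first_loss` are made up for illustration. How an oversized batch leads to these states inside `clipped_ppo_agent.py` is an assumption and was not confirmed in this thread.

```python
import warnings

import numpy as np


def mean_of_losses(losses):
    """Average per-batch losses and return (mean, warning messages).

    Hypothetical helper: an empty input makes NumPy warn and return NaN.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = np.mean(losses)
    return result, [str(w.message) for w in caught]


def first_loss(losses):
    """Hypothetical stand-in for batch_results['losses'][0].

    Works when `losses` is a sequence, fails when it is a NumPy scalar.
    """
    return losses[0]


# Case 1: no losses were collected, so the mean is taken over nothing.
empty_mean, messages = mean_of_losses([])
print(empty_mean, messages)

# Case 2: the value being indexed is a scalar rather than a list of losses.
scalar_error = None
try:
    first_loss(np.float64(0.5))
except IndexError as exc:
    scalar_error = str(exc)
print(scalar_error)
```

If the assumption holds, a batch size small enough that at least one full batch is formed from the collected episodes should avoid both states, which matches the observation that the batch size was the cause.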