PathmindAI / nativerl

Train reinforcement learning agents using AnyLogic or Python-based simulations
Apache License 2.0
19 stars 4 forks

bug calculating reward `TypeError: unsupported operand type(s) for *: 'dict_values' and 'float'` #489

Closed slinlee closed 2 years ago

slinlee commented 2 years ago
    self.vector_env.vector_step(action_vector)
  File "/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/ray/rllib/env/vector_env.py", line 167, in vector_step
    obs, r, done, info = self.envs[i].step(actions[i])
  File "/home/runner/work/nativerl/nativerl/nativerl/python/pathmind_training/environments.py", line 303, in step
    reward = np.sum(reward_array * self.alphas * self.betas)
TypeError: unsupported operand type(s) for *: 'dict_values' and 'float'

maxpumperla commented 2 years ago

@slinlee I think it comes from this line:

https://github.com/SkymindIO/nativerl/blob/dev/nativerl/python/pathmind_training/environments.py#L455

There should be a `list` there, like the one on line 445 above it.
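
For anyone landing here with the same error, a minimal sketch of the failure and one possible fix. The dictionary keys, the scalar `alphas`/`betas` values, and the `weighted_reward` helper are hypothetical stand-ins, not the actual code in `environments.py`. Note that a plain `list` multiplied by a `float` also raises a `TypeError`, so this sketch assumes the values end up in a NumPy array before the multiplication.

```python
import numpy as np

# Hypothetical stand-ins for the reward terms and weights in the traceback.
reward_terms = {"throughput": 2.0, "wait_time": -1.0}
alphas = 0.5
betas = 2.0


def weighted_reward(terms, alphas, betas):
    """Sum the reward terms after scaling them by alphas and betas."""
    # dict.values() returns a view object that does not support `*`,
    # so materialize it as a list and convert it to a float array first.
    reward_array = np.asarray(list(terms.values()), dtype=float)
    return float(np.sum(reward_array * alphas * betas))


# Reproduce the original failure: a dict_values view times a float.
try:
    np.sum(reward_terms.values() * alphas * betas)
    raised = False
except TypeError as err:
    raised = True
    print(err)

total = weighted_reward(reward_terms, alphas, betas)
print(total)
```

The conversion happens once, right where the values leave the dictionary, so the rest of the reward computation can keep using NumPy broadcasting whether the weights are scalars or arrays of matching length.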

slinlee commented 2 years ago

@maxpumperla cool. I added that in.