Revert `bg_multi_mini` reward balancing changes

PathmindAI / nativerl

Train reinforcement learning agents using AnyLogic or Python-based simulations

Apache License 2.0


Revert `bg_multi_mini` reward balancing changes #513

Closed by slinlee 2 years ago

slinlee commented 2 years ago

Including bg_multi_mini's reward balancing changes made training with reward functions perform worse: https://test.devpathmind.com/sharedExperiment/7273

Training with reward terms does run, but the results are also poor: https://test.devpathmind.com/sharedExperiment/7274

The throughputs are expected to be 60+, but they're in the 10s to 30s.

slinlee commented 2 years ago

I am reopening #509 so that we can test it.

slinlee commented 2 years ago

This shouldn't be merged yet. https://test.devpathmind.com/experiment/7277 shows that training with reward terms needs another operator:

    base_env.send_actions(actions_to_send)
  File "/app/conda/lib/python3.7/site-packages/ray/rllib/env/base_env.py", line 421, in send_actions
    obs, rewards, dones, infos = env.step(agent_dict)
  File "/app/work/pathmind_training/environments.py", line 281, in step
    self.term_contributions_dict.values()
TypeError: unsupported operand type(s) for +: 'int' and 'nativerl.Array'
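
The truncated traceback above doesn't show the full expression that fails, so the following is only a sketch of one pattern that produces this exact message: calling the built-in `sum()` on values whose type can't be added to an `int`. `sum()` starts from the integer `0`, so its first step is `0 + value`. `FakeArray` is a hypothetical stand-in and not the real `nativerl.Array`; the dictionary keys and values are made up for illustration.

```python
from functools import reduce
import operator


class FakeArray:
    """Hypothetical stand-in for nativerl.Array: only FakeArray + FakeArray works."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        if not isinstance(other, FakeArray):
            return NotImplemented
        return FakeArray(a + b for a, b in zip(self.values, other.values))


# Made-up per-term contributions, loosely modelled on term_contributions_dict
term_contributions = {
    "term_a": FakeArray([1.0, 2.0]),
    "term_b": FakeArray([0.5, 0.5]),
}

# sum() begins with the int 0, so it first evaluates 0 + FakeArray,
# which neither int nor FakeArray knows how to handle.
try:
    sum(term_contributions.values())
    failed = False
except TypeError as err:
    failed = True
    message = str(err)

# One possible workaround: fold with + directly, without an int start value.
total = reduce(operator.add, term_contributions.values())
```

If the real cause is similar, other options would be to pass a start value of the right type to `sum()` or to convert each contribution to a plain Python or NumPy value before adding; which one fits depends on what `nativerl.Array` actually supports.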

maxpumperla commented 2 years ago

@slinlee can you please post the full stack trace? I want to understand which line fails

slinlee commented 2 years ago

> @slinlee can you please post the full stack trace? I want to understand which line fails

@maxpumperla of course, here you go https://gist.github.com/slinlee/0511f0e981e47d6261cf3249c3aa6bcf