Closed by thetwotravelers 2 years ago
I'm going to test this quickly on test.devpathmind.com. This also changes the lines where the previous bug was showing up.
Including bg_multi_mini's reward balancing changes made training with reward functions perform worse: https://test.devpathmind.com/sharedExperiment/7273
Using reward terms, training runs but also gives poor results: https://test.devpathmind.com/sharedExperiment/7274
The throughputs are expected to be 60+, but they're in the 10-30 range.
Small piece of bg_nb that begins to address reward balancing for multiagent models like Felipe's.