dvanbrug closed this issue 6 months ago.
Hi @dvanbrug, thanks for finding this. I'll do some investigating and see whether it holds true for the wrapped environment with different agents. For CAGE Challenge 4, we probably won't change this, because it would be a huge modification to the original environment and would make it impossible to compare agents after the challenge. However, we will update the documentation to reflect this scenario. Again, thanks a bunch for finding this.
The simulation controller is not correctly tracking reward for the Blue agents it instantiates. Specifically, Blue action costs are not tracked for any Blue agents managed by CybORG itself. Since this only affects Blue agents that are set up by CybORG, and Restore is the only action with a cost, it should only impact situations where default Blue agents that take the Restore action are used.
For example, if we use a `ConstantAgent` that issues a Restore action at every time step, we do not see any negative reward attributed to the Blue agents.
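The accounting gap described above can be illustrated with a minimal, self-contained Python sketch. It does not use CybORG at all: `ToyController`, `ACTION_COST`, the agent names, and the Restore cost of -1.0 are hypothetical stand-ins chosen for illustration, not CybORG's real classes or configuration. It shows how applying action costs only to externally stepped agents leaves an internally managed, always-Restore agent with zero total reward.

```python
from dataclasses import dataclass, field

# Hypothetical per-action costs (illustrative assumption, not CybORG's config).
ACTION_COST = {"Restore": -1.0, "Monitor": 0.0, "Sleep": 0.0}


@dataclass
class ToyController:
    """Toy stand-in for a simulation controller (not CybORG's real class)."""

    external_agents: set      # agents whose actions are passed in from outside
    internal_policies: dict   # agent name -> callable returning an action name
    track_internal_costs: bool = False  # False reproduces the reported behaviour
    totals: dict = field(default_factory=dict)

    def step(self, external_actions, state_reward=0.0):
        actions = dict(external_actions)
        # Internally managed agents choose their own actions.
        for name, policy in self.internal_policies.items():
            actions[name] = policy()
        for name, action in actions.items():
            reward = state_reward
            # Modelled bug: the action cost is only added for external agents
            # unless internal cost tracking is switched on.
            if name in self.external_agents or self.track_internal_costs:
                reward += ACTION_COST[action]
            self.totals[name] = self.totals.get(name, 0.0) + reward
        return self.totals


def run_constant_restore(track_internal_costs, steps=5):
    """Run a hypothetical internal agent that always chooses Restore."""
    ctrl = ToyController(
        external_agents=set(),
        internal_policies={"blue_agent_0": lambda: "Restore"},
        track_internal_costs=track_internal_costs,
    )
    for _ in range(steps):
        ctrl.step({})
    return ctrl.totals["blue_agent_0"]


print(run_constant_restore(False))  # 0.0: Restore cost silently dropped
print(run_constant_restore(True))   # -5.0: cost applied on every step
```

In this toy model the same action sequence yields different totals depending only on who drives the agent, which mirrors the report: a `ConstantAgent` set up by CybORG shows no Restore penalty, whereas an agent stepped from outside would.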