cage-challenge / cage-challenge-4

The TTCP CAGE Challenges are a series of public challenges instigated to foster the development of autonomous cyber defensive agents. This CAGE Challenge 4 (CC4) returns to a defence industry enterprise environment, and introduces a Multi-Agent Reinforcement Learning (MARL) scenario.
https://cage-challenge.github.io/cage-challenge-4/

Rewards are not consistently tracked for CybORG managed agents #40

Closed dvanbrug closed 6 months ago

dvanbrug commented 7 months ago

The Simulation controller is not correctly tracking reward for the Blue agents it instantiates. Specifically, Blue action costs are not tracked for any Blue agent managed by CybORG itself. Since this only affects Blue agents that are set up by CybORG, and Restore is the only action with a cost, the issue should only arise when default Blue agents take the Restore action.

For example, if we use the following ConstantAgent that issues a restore every time step, we do not see any negative reward attributed to Blue agents.

# imports assume the CC4 package layout
from CybORG import CybORG
from CybORG.Agents import ConstantAgent
from CybORG.Simulator.Actions import Restore
from CybORG.Simulator.Scenarios import EnterpriseScenarioGenerator

class RestoreAgent(ConstantAgent):
    """A constant agent whose fixed action is Restore."""
    def __init__(self, name=None, **kwargs):
        agent_host_pairs = {
            'blue_agent_0': 'restricted_zone_a_subnet_user_host_0',
            'blue_agent_1': 'operational_zone_a_subnet_user_host_0',
            'blue_agent_2': 'restricted_zone_b_subnet_user_host_0',
            'blue_agent_3': 'operational_zone_b_subnet_user_host_1',
            'blue_agent_4': 'public_access_zone_subnet_user_host_0',
        }
        action = Restore(0, name, agent_host_pairs[name])
        super().__init__(action, name)

sg = EnterpriseScenarioGenerator(
    blue_agent_class=RestoreAgent,
    steps=500,
)

cyborg = CybORG(sg)

obs, rew, done, info = cyborg.parallel_step({})
blue_rew = {a:r for a,r in rew.items() if 'blue' in a}
print(f"Blue Rewards: {blue_rew}")
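As a stopgap, the missing cost could be re-applied outside CybORG. Below is a hedged sketch in plain Python; `adjust_rewards`, `last_actions`, and `RESTORE_COST` are assumptions for illustration, not part of the CybORG API, and the actual cost value should be checked against the scenario's reward definition.

```python
# Hypothetical workaround: re-apply the Restore cost for Blue agents whose
# cost was dropped by the simulation controller.
RESTORE_COST = -1  # assumed value; verify against the scenario

def adjust_rewards(rewards, last_actions):
    """Return a copy of `rewards` with the Restore cost re-applied.

    `last_actions` maps agent name -> name of the action taken this step.
    """
    adjusted = dict(rewards)
    for agent, action_name in last_actions.items():
        if "blue" in agent and action_name == "Restore":
            adjusted[agent] = adjusted.get(agent, 0) + RESTORE_COST
    return adjusted

print(adjust_rewards({"blue_agent_0": 0, "red_agent_0": 5},
                     {"blue_agent_0": "Restore", "red_agent_0": "Sleep"}))
# → {'blue_agent_0': -1, 'red_agent_0': 5}
```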
cage-challenge commented 6 months ago

Hi @dvanbrug, thanks for finding this. I'll do some investigating and see whether it holds true for the wrapped environment with different agents. For CAGE Challenge 4, we probably won't change this, as it would be a huge modification to the original environment and would make comparing agents after the challenge impossible. However, we will update the documentation to reflect this scenario. Again, thanks a bunch for finding this.