cage-challenge / cage-challenge-4

The TTCP CAGE Challenges are a series of public challenges instigated to foster the development of autonomous cyber defensive agents. This CAGE Challenge 4 (CC4) returns to a defence industry enterprise environment, and introduces a Multi-Agent Reinforcement Learning (MARL) scenario.
https://cage-challenge.github.io/cage-challenge-4/

Rewards are not consistently tracked for CybORG managed agents #40

Closed dvanbrug closed 6 months ago

dvanbrug commented 7 months ago

The Simulation controller is not correctly tracking reward for the Blue agents it instantiates. Specifically, Blue action costs are not tracked for any Blue agent managed by CybORG itself. Since this only affects Blue agents that are set up by CybORG, and Restore is the only action with a cost, the issue should only arise when default Blue agents take the Restore action.

For example, if we use the following ConstantAgent that issues a restore every time step, we do not see any negative reward attributed to Blue agents.

# imports assume the CC4 package layout
from CybORG import CybORG
from CybORG.Agents import ConstantAgent
from CybORG.Simulator.Actions import Restore
from CybORG.Simulator.Scenarios import EnterpriseScenarioGenerator

class RestoreAgent(ConstantAgent):
    """A constant agent whose fixed action is Restore."""
    def __init__(self, name=None, **kwargs):
        agent_host_pairs = {
            'blue_agent_0': 'restricted_zone_a_subnet_user_host_0',
            'blue_agent_1': 'operational_zone_a_subnet_user_host_0',
            'blue_agent_2': 'restricted_zone_b_subnet_user_host_0',
            'blue_agent_3': 'operational_zone_b_subnet_user_host_1',
            'blue_agent_4': 'public_access_zone_subnet_user_host_0',
        }
        action = Restore(0, name, agent_host_pairs[name])
        super().__init__(action, name)

sg = EnterpriseScenarioGenerator(
    blue_agent_class=RestoreAgent,
    steps=500,
)

cyborg = CybORG(sg)

obs, rew, done, info = cyborg.parallel_step({})
blue_rew = {a:r for a,r in rew.items() if 'blue' in a}
print(f"Blue Rewards: {blue_rew}")
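As a stopgap, the missing cost could be re-applied outside CybORG. Below is a hedged sketch in plain Python; `adjust_rewards`, `last_actions`, and `RESTORE_COST` are assumptions for illustration, not part of the CybORG API, and the actual cost value should be checked against the scenario's reward definition.

```python
# Hypothetical workaround: re-apply the Restore cost for Blue agents whose
# cost was dropped by the simulation controller.
RESTORE_COST = -1  # assumed value; verify against the scenario

def adjust_rewards(rewards, last_actions):
    """Return a copy of `rewards` with the Restore cost re-applied.

    `last_actions` maps agent name -> name of the action taken this step.
    """
    adjusted = dict(rewards)
    for agent, action_name in last_actions.items():
        if "blue" in agent and action_name == "Restore":
            adjusted[agent] = adjusted.get(agent, 0) + RESTORE_COST
    return adjusted

print(adjust_rewards({"blue_agent_0": 0, "red_agent_0": 5},
                     {"blue_agent_0": "Restore", "red_agent_0": "Sleep"}))
# → {'blue_agent_0': -1, 'red_agent_0': 5}
```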
cage-challenge commented 6 months ago

Hi @dvanbrug, thanks for finding this. I'll do some investigating and see whether it holds true for the wrapped environment with different agents. For CAGE Challenge 4, we probably won't change this, as it would be a huge modification to the original environment and would make comparing agents after the challenge impossible. However, we will update the documentation to reflect this scenario. Again, thanks a bunch for finding this.