The comment in the dynamic_blue.py file looks like it may just be a typo; there isn't currently a file in the repository called blue_agent_config.yaml. The file used by default is dynamic_blue_agent.yaml, and it currently defines the actions for deploying a decoy and for doing nothing. However, you can add more actions that are defined in blue_actions/actions. For example, to add the remove action to the blue agent, you can add this block to the actions section of the YAML file:
```yaml
remove_decoy: # Unique name for the action
  module: RemoveDecoyHost # Name of the module of the blue action
  class: RemoveDecoyHost # Name of the class of the blue action
  configs: # Defines configs if applicable; none are needed for the remove action
  reward: # Defines rewards
    immediate: 0 # No short-term cost or benefit to removing decoys
    recurring: 0 # No long-term cost or benefit to removing decoys
  action_space_args: # Info needed to define the action space
    type: subnet # Type of target for this action: 'standalone' (no target) | 'subnet' | 'host'
  shared_data: # No data to share
```
This references the RemoveDecoyHost module. This configurability also lets you create your own blue agent actions as modules in the blue_actions/actions directory and define them in the YAML.
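To make the `module` and `class` keys concrete, here is a minimal, repo-independent sketch of how a config-driven loader can turn such a YAML entry into an action object. This is purely illustrative: the function name, constructor arguments, and config layout below are assumptions, and the actual wiring in Cyberwheel's dynamic blue agent may differ.

```python
import importlib
import yaml

def load_blue_actions(config_path: str, package: str = "blue_actions.actions"):
    """Instantiate the action classes named in a dynamic blue agent YAML file.

    Illustrative only: the real loader may pass extra constructor arguments
    (e.g. the network) and also consume the reward / action_space_args keys.
    """
    with open(config_path) as f:
        config = yaml.safe_load(f)

    actions = {}
    for name, spec in config.get("actions", {}).items():
        # 'module' names the file under blue_actions/actions,
        # 'class' names the class defined inside that file.
        module = importlib.import_module(f"{package}.{spec['module']}")
        action_cls = getattr(module, spec["class"])
        actions[name] = action_cls(**(spec.get("configs") or {}))
    return actions
```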
Let me know if you have any more questions about this!
Thank you. That worked out well.
I have a follow-up question: I'm new to wandb and would like to evaluate the performance of the agent.
My goal is to produce the same results as Figure 6 in the paper. Do I need to add wandb.log() calls to train_cyberwheel.py?
The writer.add_scalar() calls in train_cyberwheel.py have the same effect as wandb.log(), so they should already be logging the relevant values such as episodic rewards and evaluation results. On the wandb page, you should be able to create graphs with the metrics you want from any given run(s).
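For reference, this works because TensorBoard scalars can be mirrored into a wandb run when tracking is enabled. A minimal sketch of the mechanism is below; the project name and scalar tag are assumptions, so check train_cyberwheel.py for the tags it actually writes.

```python
import wandb
from torch.utils.tensorboard import SummaryWriter

# With sync_tensorboard=True, anything written through the SummaryWriter is
# mirrored to the wandb run, so no explicit wandb.log() calls are needed.
wandb.init(project="cyberwheel", sync_tensorboard=True)
writer = SummaryWriter("runs/example")

for global_step, episodic_reward in enumerate([-50.0, -20.0, 10.0]):
    # Hypothetical tag; the training script's scalar names may differ.
    writer.add_scalar("charts/episodic_return", episodic_reward, global_step)

writer.close()
wandb.finish()
```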
Thanks for the answer. When interpreting Figure 6, does Reward on the Y-axis mean episodic reward?
In wandb, I don't see any X-axis option other than Step, so I thought this would require additional logging.
Am I correct in understanding that the information logged by default in train_cyberwheel.py can be plotted as shown in the figure?
Yes, episodic_reward on the Y-axis and step on the X-axis should produce the same type of graph as shown in the figure (Reward / Episode). The actual content and results of the graph will obviously differ depending on factors such as the configured decoys, red strategy, blue agent rewards, and other training parameters you may set.
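If you'd rather build the plot outside the wandb UI, one option is to pull the run history through wandb's public API and plot reward against step yourself. The entity/project/run path and the episodic_reward key below are placeholders; substitute whatever your run actually logs.

```python
import matplotlib.pyplot as plt
import wandb

api = wandb.Api()
# Placeholder path; copy the real "entity/project/run_id" from the run's page.
run = api.run("my-entity/cyberwheel/run_id")

# Fetch only the column needed for a Figure-6-style reward-vs-step curve.
history = run.history(keys=["episodic_reward"])

plt.plot(history["_step"], history["episodic_reward"])
plt.xlabel("Step")
plt.ylabel("Reward")
plt.title("Episodic reward over training")
plt.show()
```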
Hello. I have a few questions about the Cyberwheel demo.
The behavior of the blue agent is defined as deploying decoys, removing decoys, or doing nothing, as described in the documentation.
However, according to the dynamic_blue_agent.yaml in the resources, it seems to use only “nothing” and “deploydecoy”.
In cyberwheel_dynamic.py, the comment for blue_config says that blue_agent_config.yaml is the default, but I can't find that file in this repository, so my guess was that that file would be the one using all of the actions.
In the end, the blue agent's actions only reference dynamic_blue_agent.yaml, so if I'm misunderstanding the code, I'd appreciate some advice.