Limmen / csle

A research platform to develop automated security policies using quantitative methods, e.g., optimal control, computational game theory, reinforcement learning, optimization, evolutionary methods, and causal inference.
http://limmen.dev/csle/

emulating attacker strategies from Intrusion Prevention through Optimal Stopping game #364

Closed: amir-coder closed this issue 3 months ago

amir-coder commented 4 months ago

Hello, I can't find a way to reproduce the results published in your paper, "Intrusion Prevention through Optimal Stopping". How can we emulate or simulate the three attacker strategies (naive, experienced, and expert) from the paper?

Limmen commented 4 months ago

Reproducing the results involves several steps.

  1. You need to emulate the target infrastructure (level 9): https://github.com/Limmen/csle/tree/master/emulation-system/envs/060/level_9
  2. You need to run the corresponding attacker in the emulation for a long time and collect measurement data. The code for running the attacker is available here: https://github.com/Limmen/csle/blob/master/examples/data_collection/static_sequences/level_9/run.py
  3. You need to instantiate the simulation with the data collected from the emulation. The simulation code is available here: https://github.com/Limmen/csle/tree/master/simulation-system/envs/stopping_game
  4. You need to train the defender strategy in the simulation using an optimization algorithm (e.g., T-SPSA or CEM). Examples of many algorithms are available here: https://github.com/Limmen/csle/tree/master/examples/training
  5. Once the trained defender strategy has converged, you need to evaluate it in the emulation. Example code is here: https://github.com/Limmen/csle/blob/master/examples/eval/stopping_game_pomdp_defender/eval.py

Note that step 2 took us 6 months. If you want to skip this time and just use the data we collected, it is available for download here: https://github.com/Limmen/csle/releases/tag/v0.4.0

If this is your first time working with reinforcement learning, I recommend that you start by understanding how the simulation works before you attempt to evaluate the strategy in the emulation.

amir-coder commented 4 months ago

Thank you. In the environment code you linked, there is no way to load the data from the emulation. How can I instantiate a simulation with the JSON data?

Limmen commented 4 months ago

You can define the simulation using this configuration object: https://github.com/Limmen/csle/blob/master/simulation-system/libs/gym-csle-stopping-game/src/gym_csle_stopping_game/dao/stopping_game_config.py#L8

It allows you to define the transition tensor (T) and the observation tensor (O) based on the collected data.
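
A minimal sketch of what "defining T and O from the data" could look like, in plain Python. It assumes a simplified single-stop defender facing a fixed attacker, with hypothetical state indices (0 = no intrusion, 1 = intrusion, 2 = terminal), hypothetical actions (0 = continue, 1 = stop), and a hypothetical per-step intrusion probability `p_intrusion`. The real StoppingGameConfig describes a two-player game, so its tensors have more dimensions and possibly a different index order; check the linked file for the exact layout before reusing any of this.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL indices for a simplified single-stop defender model.
NO_INTRUSION, INTRUSION, TERMINAL = 0, 1, 2
CONTINUE, STOP = 0, 1


def build_transition_tensor(p_intrusion: float) -> List[List[List[float]]]:
    """T[a][s][s'] for a fixed attacker that starts its intrusion with
    probability p_intrusion in each time-step (an assumption)."""
    t_continue = [
        [1.0 - p_intrusion, p_intrusion, 0.0],  # no intrusion yet
        [0.0, 1.0, 0.0],                        # intrusion continues
        [0.0, 0.0, 1.0],                        # terminal state is absorbing
    ]
    t_stop = [[0.0, 0.0, 1.0] for _ in range(3)]  # stopping ends the episode
    return [t_continue, t_stop]


def build_observation_tensor(
    no_intrusion_pmf: Dict[int, float], intrusion_pmf: Dict[int, float]
) -> Tuple[List[int], List[List[float]]]:
    """O[s][o] over the union of observed values. The terminal state
    deterministically emits the smallest value (an assumption)."""
    support = sorted(set(no_intrusion_pmf) | set(intrusion_pmf))
    o_no = [no_intrusion_pmf.get(v, 0.0) for v in support]
    o_int = [intrusion_pmf.get(v, 0.0) for v in support]
    o_term = [1.0 if i == 0 else 0.0 for i in range(len(support))]
    return support, [o_no, o_int, o_term]


# Toy distributions for illustration only, not data from the release.
T = build_transition_tensor(p_intrusion=0.1)
support, O = build_observation_tensor({0: 0.7, 1: 0.3}, {1: 0.2, 5: 0.8})
print(support)  # [0, 1, 5]
print(O[1])     # [0.0, 0.2, 0.8]
```

Each row of O is one conditional distribution, so the "no_intrusion" and "intrusion" data end up as rows 0 and 1.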

amir-coder commented 4 months ago

Sorry for the spam, but in your paper you mentioned:

"B. Estimating the Distributions of Alerts and Login Attempts ... 1) At the end of every time-step, the emulation system collects the metrics ∆x, ∆y, ∆z, which contain the alerts and login attempts that occurred during the time-step. For the evaluation reported in this paper we collected measurements from 21000 time-steps of 30 seconds each. ..."

but the release you shared does not contain the emulation logs. Instead, it contains a JSON file with the following keys:

conditional_counts, conditionals_kl_divergences, conditionals_probs, initial_distributions_counts, initial_distributions_probs, initial_maxs/_mins/_means/_std, maxs, mins, means, std

Under each key there are different conditions, and under each condition there are different metrics.

Limmen commented 4 months ago

Yes. The conditions you should use are "intrusion" and "no_intrusion", and the metric should be "alerts_weighted_by_priority" (it summarizes ∆x, ∆y, and ∆z and gives the best results).
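
A sketch of pulling the two conditional distributions out of statistics.json. The nesting `stats["conditionals_probs"][condition][metric]`, and the idea that each metric maps observed values to probabilities, are guesses based on the key names listed above, not a documented format; the toy data is made up.

```python
import json
from typing import Dict


def load_conditional_pmf(
    stats: dict, condition: str, metric: str = "alerts_weighted_by_priority"
) -> Dict[int, float]:
    """Return {observed value: probability} for one condition and metric.

    ASSUMPTION: stats["conditionals_probs"][condition][metric] maps each
    observed value (JSON object keys are always strings) to a probability.
    """
    raw = stats["conditionals_probs"][condition][metric]
    return {int(float(value)): float(prob) for value, prob in raw.items()}


# With the real release one would load the file instead, e.g.:
# with open("statistics.json") as f:
#     stats = json.load(f)
toy = json.loads(
    '{"conditionals_probs": {'
    '"intrusion": {"alerts_weighted_by_priority": {"0": 0.1, "12": 0.9}},'
    '"no_intrusion": {"alerts_weighted_by_priority": {"0": 0.8, "3": 0.2}}}}'
)
intrusion_pmf = load_conditional_pmf(toy, "intrusion")
no_intrusion_pmf = load_conditional_pmf(toy, "no_intrusion")
print(intrusion_pmf)  # {0: 0.1, 12: 0.9}
```

The two dictionaries are what would then be turned into the rows of the observation tensor.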

amir-coder commented 4 months ago

I still can't see how to use the "intrusion" and "no_intrusion" conditions to set up T and O. Can you point me to an example or suggest an alternative? I need an environment where the defender can play against the naive, experienced, and expert attackers.

Limmen commented 4 months ago

By default, the environment already contains a naive attacker that you can play against without instantiating the environment with data from the emulation. To do that, you can run any of the examples available here: https://github.com/Limmen/csle/tree/master/examples/training

Examples of how to fit statistical distributions to data are available here: https://github.com/Limmen/csle/tree/master/examples/system_identification
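
The simplest way to fit a distribution to discrete measurements is the empirical (maximum-likelihood categorical) fit, i.e., the relative frequency of each observed value. A stdlib-only sketch; the linked system-identification examples may use other models, and the sample values below are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def empirical_pmf(samples: Iterable[int]) -> Dict[int, float]:
    """Maximum-likelihood categorical fit: relative frequency of each value."""
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one sample")
    return {value: n / total for value, n in sorted(counts.items())}


# Toy per-time-step measurements of one metric (made up for illustration).
no_intrusion_samples = [0, 0, 0, 2, 0, 1, 0, 0]
print(empirical_pmf(no_intrusion_samples))  # {0: 0.75, 1: 0.125, 2: 0.125}
```

Running this once on the time-steps recorded without an intrusion and once on those recorded during an intrusion gives the two conditional distributions.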

amir-coder commented 4 months ago

Thank you. In the examples that you shared, the code is for running_against_random_attacker. Is that the same as running against a NaiveAttacker? In your paper, the naive attacker also had a specific attack sequence.

Limmen commented 4 months ago

Yes, it is the same: the only randomized part of the random attacker is the start time of the attack, which matches the setup in the paper.
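
To make the answer concrete, here is a sketch of an attacker whose only random element is the start time. Before the intrusion, it starts with a hypothetical probability `p_start` in each time-step (so the start time is geometrically distributed), after which it follows a fixed sequence. `ATTACK_SEQUENCE` is a placeholder, not the sequence used in the paper or in csle, and repeating the last action once the sequence ends is an assumption.

```python
import random
from typing import List, Optional

# HYPOTHETICAL placeholder actions, not the sequence from the paper or csle.
ATTACK_SEQUENCE = ["recon", "exploit", "escalate", "exfiltrate"]


def attacker_actions(
    horizon: int, p_start: float, rng: random.Random
) -> List[Optional[str]]:
    """Actions for one episode; None means the attacker is still waiting.

    The start time is the only random part: before the intrusion, the
    attacker starts with probability p_start in each time-step. Afterwards it
    follows ATTACK_SEQUENCE and then repeats the last action (an assumption).
    """
    actions: List[Optional[str]] = []
    step_in_attack = -1  # -1 means the intrusion has not started yet
    for _ in range(horizon):
        if step_in_attack < 0 and rng.random() < p_start:
            step_in_attack = 0
        if step_in_attack < 0:
            actions.append(None)
        else:
            idx = min(step_in_attack, len(ATTACK_SEQUENCE) - 1)
            actions.append(ATTACK_SEQUENCE[idx])
            step_in_attack += 1
    return actions


# Two episodes with different seeds differ only in when the attack starts.
print(attacker_actions(horizon=8, p_start=0.3, rng=random.Random(0)))
print(attacker_actions(horizon=8, p_start=0.3, rng=random.Random(1)))
```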

amir-coder commented 4 months ago

Okay, I understand. So running with the defaults means we are running against the NaiveAttacker, and to run against the ExpertAttacker we use the data you shared to estimate empirical distributions and then use them to create a simulation environment config (since the data description says "Intrusion data collected against expert attacker"). Is that it?

Limmen commented 4 months ago

Yes exactly.

amir-coder commented 4 months ago

When executing the EmpiricalAlgorithm using the statistics.json data, I got this error:

InvalidTextRepresentation: invalid input syntax for type json
DETAIL: Token "Infinity" is invalid.
CONTEXT: JSON data, line 70347: ... "alerts_weighted_by_priority": Infinity
unnamed portal parameter $1 = '...'

Apparently the problem is caused by the data in the no_intrusion condition. Here is my guess at the source of the problem:

        "no_intrusion": {
            "adduser_alerts": 0.0,
            "alerts_weighted_by_level": 0.0,
            "alerts_weighted_by_priority": "inf",  # inf is causing a problem

while in intrusion:

        "intrusion": {
            "adduser_alerts": 0.0,
            "alerts_weighted_by_level": 0.0,
            "alerts_weighted_by_priority": 0.668,  # no problem

So is it correct to replace "inf" with 0 to avoid this issue?

Limmen commented 4 months ago

Yes, you can replace inf with any value, or parse it into np.inf.
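
The error text comes from PostgreSQL, whose json type only accepts standard JSON and therefore rejects the Infinity token that Python's json module writes for float('inf'). Below is a stdlib sketch of the replacement suggested above. Python's json.loads accepts Infinity and returns math.inf (which compares equal to np.inf); the helper then swaps non-finite values for a finite one before the data is stored. The helper name and the choice of 0.0 are assumptions. Replacing inf with 0 changes what the entry means, so choose the value with care.

```python
import json
import math
from typing import Any

NON_FINITE_STRINGS = {"inf", "+inf", "-inf", "infinity", "+infinity", "-infinity", "nan"}


def replace_non_finite(obj: Any, replacement: float) -> Any:
    """Recursively replace float inf/-inf/nan, and strings such as "inf",
    with a finite value so that strict JSON consumers accept the data."""
    if isinstance(obj, dict):
        return {k: replace_non_finite(v, replacement) for k, v in obj.items()}
    if isinstance(obj, list):
        return [replace_non_finite(v, replacement) for v in obj]
    if isinstance(obj, float) and not math.isfinite(obj):
        return replacement
    if isinstance(obj, str) and obj.strip().lower() in NON_FINITE_STRINGS:
        return replacement
    return obj


raw = ('{"no_intrusion": {"alerts_weighted_by_priority": Infinity}, '
       '"intrusion": {"alerts_weighted_by_priority": 0.668}}')
stats = json.loads(raw)  # Python's json accepts the non-standard Infinity token
print(stats["no_intrusion"]["alerts_weighted_by_priority"])  # inf
clean = replace_non_finite(stats, replacement=0.0)
# allow_nan=False raises ValueError if any non-finite float is left.
print(json.dumps(clean, allow_nan=False))
```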

amir-coder commented 3 months ago

It didn't work this way, so I had to change my method, but thank you for your help.