Hive-Systems / pyfair

Factor Analysis of Information Risk (FAIR) model written in Python. Managed and maintained by Hive Systems
https://www.hivesystems.com
MIT License

Meta Model Average vs. Sum Operators #36

Open jonrau1 opened 3 years ago

jonrau1 commented 3 years ago

For the Meta Model, it appears that ALE calculations are a Sum of all downstream models rather than an Average. I am unsure whether this is FAIR-esque, but it feels like it should be an Average across all of the potential loss scenarios / threat communities evaluated. I only wonder because it is quite easy to push average ALEs up into the several-billion-dollar range, and that Sum feels like it assumes every scenario will happen at once.

The preferred behavior would be a flag to choose the aggregation: Sum or Average (or other operators, I guess) - some scenarios may make sense to run serially (e.g. a data exfil event along with a ransomware event).

For instance, here is the output of a Meta Model from a handful of TCOMs with dummy data

|  | mean | stdev | min | max |
|---|---|---|---|---|
| State Actors Model | $1,030,199,841 | $888,260,670 | $4,377,753 | $5,551,151,462 |
| State-sponsored Actors Model | $11,728,227 | $9,979,715 | $63,759 | $58,396,462 |
| Organized Crime Model | $0 | $0 | $0 | $0 |
| Hacktivists Model | $0 | $0 | $0 | $0 |
| Cyber Espionage Model | $0 | $0 | $0 | $0 |
| Accidental Insiders Model | $0 | $0 | $0 | $0 |
| Privileged Insider Threats Model | $435,876,396 | $374,137,344 | $1,906,190 | $2,457,416,921 |
| Unprivileged Insider Threats Model | $2,111,754 | $1,812,884 | $10,301 | $10,794,341 |
| Opportunistic / Unskilled Attackers Model | $0 | $0 | $0 | $0 |
| Risk | $1,479,916,218 | $966,145,076 | $17,907,282 | $6,288,729,598 |

To combat this I can take averages of the POA and TCs across all TCOMs, but that doesn't feel like it has the right oomph - I like to show where we have strong resistance against a specific threat community, as this also informs our Red Team operations.

I can also provide this mocked-up data - well, some of it; my print statements for the to_json() model export were errant.
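
For reference, here is a rough sketch of the kind of averaging I mean, with dummy inputs (not our real data) - average the simulated ALE across TCOM models per simulation run instead of letting the Meta Model sum them, assuming the export_results() 'Risk' column:

```python
# Rough sketch (dummy inputs): average simulated ALE across TCOM models
# per simulation run, rather than relying on FairMetaModel's summed "Risk".
import pandas as pd
from pyfair import FairModel

def build_tcom_model(name, lef_mean, lef_stdev, lm_low, lm_mode, lm_high):
    """One threat-community model with placeholder inputs."""
    model = FairModel(name=name, n_simulations=10_000)
    model.input_data('Loss Event Frequency', mean=lef_mean, stdev=lef_stdev)
    model.input_data('Loss Magnitude', low=lm_low, mode=lm_mode, high=lm_high)
    model.calculate_all()
    return model

tcoms = {
    'State Actors': build_tcom_model('State Actors', 0.5, 0.2, 1e6, 5e6, 5e7),
    'Privileged Insider Threats': build_tcom_model(
        'Privileged Insider Threats', 0.3, 0.1, 5e5, 2e6, 1e7),
}

# One column of simulated ALE per TCOM model.
risk = pd.concat(
    {name: m.export_results()['Risk'] for name, m in tcoms.items()}, axis=1)

print(risk.sum(axis=1).describe())   # what the Meta Model's "Risk" row does today
print(risk.mean(axis=1).describe())  # the averaged view I'm after
```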

theonaunheim commented 3 years ago

TL;DR: maybe? I'll take a look this weekend. This class was intended to present information on the aggregate risk.

Disclaimer: aggregation is undefined in the Open FAIR standard; the FairMetaModel is essentially "what Theo believes the Open Group would have done if they covered this topic".

The FairMetaModel's intent was to get the total anticipated loss exposure for a given time frame. If I have a "State Actors" model and a "Hacktivists" model for a particular application in 2022, our total loss exposure for that application in 2022 will be "State Actors" PLUS "Hacktivists" ... as they are simply components of a larger risk (presuming independence). This tells a generic executive what the effect is on the company's bottom line.

The average loss exposure for a given threat community is an interesting metric. It doesn't seem to have the same utility, however, and seems more appropriate for a separate class or a FairSimpleReport([hacktivist, state_actor]).
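
Roughly this shape, with placeholder inputs just for illustration:

```python
# Rough shape of the FairSimpleReport route: each threat community stays a
# separate model in the report instead of being summed into one "Risk" figure.
from pyfair import FairModel, FairSimpleReport

def quick_model(name):
    # Placeholder inputs purely for illustration.
    model = FairModel(name=name, n_simulations=10_000)
    model.input_data('Loss Event Frequency', mean=0.3, stdev=0.1)
    model.input_data('Loss Magnitude', low=1e5, mode=1e6, high=1e7)
    model.calculate_all()
    return model

hacktivist = quick_model('Hacktivists')
state_actor = quick_model('State Actors')

FairSimpleReport([hacktivist, state_actor]).to_html('tcom_comparison.html')
```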

theonaunheim commented 3 years ago

@jonrau1 , if you could attach the mocked up data that would also be helpful.

jonrau1 commented 3 years ago

@theonaunheim I think perhaps another class of models, then, if possible. I appreciate that I'm opening up Issues like a madman, and I appreciate you looking into it. I did not think of doing a FairSimpleReport from all the models first, so that will work for now, but we will still need to average out TC and POA across a whole slew of Threat Communities, which feels a bit diluted - but I suppose it's the only way to get it done right now without shooting the aggregate risk way up.

If it helps (not sure if it came across clearly in the original Issue text), the way we do it is to define yearly threat radars and use CTI, Red Team ops, and Wild Ass Guesses (WAGs) to fill in POA and TC. The idea is that, per application, we run simulations against all of the Threat Communities, more as a governance tool to argue "Hey, don't like that sticker shock from the ALE? Then do X, Y and Z, like you have to in order to resist this or that threat community." I was hoping the Meta Model would average all of those out, since we don't expect multiple Threat Communities per Scenario to attack us all at once - though that could be a useful view too, hence wanting both an Average and the current Sum to model that "Black Swan" event.

Then from there we group all of the simulated Applications and roll them up under their Divisions, Business Leaders, Market Segments, etc. - so for that meta model it actually makes sense to have the aggregate across all applications, as it's more than likely the events and threat communities we model / hunt for will attack multiple applications or move laterally throughout a segment. However, having the option to present both Average and Aggregate based on the Simulation, and not simply rolling up TC/POA/V, etc., would be cool - even if that means I calculate a FairMetaModel and a FairAverageModel and run them both into a FairSimpleReport to show the difference, or give two separate reports.

I attached a few older TCOMs and some placeholder data (the LM is actually pretty realistic based on ORX, SEC, GDPR and Cyber Insurance payout info) as a TXT file, hope it helps. tcom-mock.json.txt

priamai commented 3 years ago

Hi gents, yes, this is a common use case as far as I know, and we do a very similar analysis. I have written some code to accommodate @jonrau1's example in this Colab notebook. I am not sure about the last step required: do you want to create a combined LOE for all the entries in the table, or just a comparison of the single LOEs?

A few more questions out of curiosity: 1) do you calibrate your WAG estimates? 2) how do you use ORX to estimate the loss magnitude? It is just a taxonomy; do you pay for their premium services that give you data points for estimation? 3) SEC = Securities and Exchange Commission? 4) TC = threat contact frequency?

I hope to be of some help. Cheers!

priamai commented 3 years ago

I also like the idea of a MetaModel where the user can decide and/or provide the joint or conditional probabilities for multiple threat actors. This would enable us to model events such as the loss related to P(ransomware|phishing) or to P(ddos, data_breach). I am not sure whether this was related to the other discussion about pomegranate; in any case it will require user input to provide the probability distributions.
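
For example, something along these lines - plain numpy, nothing in pyfair today, and all of the distributions and probabilities below are made up:

```python
# Illustration only: condition one scenario on another and weight each
# simulated year's loss by whether the scenario fires. Not pyfair API.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Stand-ins for two models' exported 'Risk' columns (simulated ALE samples).
phishing_loss = rng.lognormal(mean=13.0, sigma=1.0, size=n)
ransomware_loss = rng.lognormal(mean=15.0, sigma=1.2, size=n)

p_phishing = 0.6                    # user-supplied P(phishing this year)
p_ransomware_given_phishing = 0.2   # user-supplied P(ransomware | phishing)

# Bernoulli draws decide which scenarios "fire" in each simulated year.
phishing_fires = rng.random(n) < p_phishing
ransomware_fires = phishing_fires & (rng.random(n) < p_ransomware_given_phishing)

combined_loss = phishing_loss * phishing_fires + ransomware_loss * ransomware_fires
print(combined_loss.mean(), np.percentile(combined_loss, 95))
```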

jonrau1 commented 3 years ago

@priamai thanks for your Jupyter-fu, much better than what I would have done. For the first question, we'd prefer to show an Average within the Meta Models, and ideally within the Simple Report; the key goal is not to just add all of the TCOMs together unless, for some reason, we thought the chance was there, in which case we'd scope it down. Really it's to show their maturity of controls and adherence to our governance programs versus "what could happen" - the aggregation usually comes out to several billion, which causes them to question WTF we're building.

For your questions

  1. The WAG calibration is done with threat hunting / CTI and red team ops (I own 2/3 of those functions too) - but when we do not / cannot have enough metrics, TTPs, artifacts, etc. to hunt for, we have to "fill in the blanks" and sometimes run variable simulations with small +/- 3-5% adjustments all around.
  2. We have access to their data feeds; they have "freemium" reports as well, but those are averages and not all of them are focused on cybersecurity - still, it's a good sanity check.
  3. Correct.
  4. TC = Threat Capability.

theonaunheim commented 3 years ago

I think the goal is:

  1. Feed a grouping of models into a new class that has the same basic summary/export methods as the FairMetaModel, but that doesn't add the components into an overall "Risk" column during calculate_all().
  2. Have the FairSimpleReport recognize this new class, and 1) break out its components to graph the distribution, tree, and loss curves as you would for a list of regular models, and 2) plot the components like subparts of a metamodel with violins?

Am I understanding this correctly?
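
If so, point 1 might look something like this hypothetical sketch (not existing pyfair API) over already-calculated models:

```python
# Hypothetical sketch: same export shape as FairMetaModel, but the aggregate
# column is the per-simulation mean across component models, not their sum.
import pandas as pd

class FairAverageModel:
    """Aggregate already-calculated FairModels by mean instead of sum."""

    def __init__(self, name, models):
        # `models` is a dict of {model_name: FairModel}; each model is
        # assumed to have had calculate_all() run already.
        self._name = name
        self._models = models

    def export_results(self):
        # One 'Risk' column per component model, plus a mean aggregate.
        frame = pd.concat(
            {name: model.export_results()['Risk']
             for name, model in self._models.items()},
            axis=1,
        )
        frame['Average Risk'] = frame.mean(axis=1)
        return frame
```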

jonrau1 commented 3 years ago

RE: 1 - Instead of adding all Risks together, it should ideally take the average / mean of all of them and present it that way. So if Risk A is $1000 and Risk B is $100 at the MAX, I should get an Average MAX ALE of $550 and not $1100 (Sum) as it currently is.

RE. 2 - That's worded a lot better than I tried to articulate it, yes!

priamai commented 3 years ago

> RE: 1 - Instead of adding all Risks together it should ideally take the average / mean of all of them and present it that way. So if Risk A is $1000 and Risk B is $100 at the MAX I should get a MAX ALE of $550 and not $1100 as it currently is.
>
> RE. 2 - That's worded a lot better than I tried to articulate it, yes!

I am confused about your example: ALE1 = $1000, ALE2 = $100, MAX(ALE1, ALE2) = $1000, not $550 - maybe you meant AVG(ALE1, ALE2) = $550? @jonrau1

My preference for the data frame would indeed be to have multiple Risk columns (one for each simulation run) associated with the different loss events. I would prefer multi-level columns, where the top level is perhaps the threat vector and the bottom level holds the FAIR columns as they are now.

I can then decide to apply a specific operator across the rows or columns.
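
Something like this, with placeholder numbers rather than real pyfair output:

```python
# Mockup of the column layout: top level = threat community,
# bottom level = FAIR result columns, so operators can go across either axis.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_simulations = 5
fair_columns = ['Loss Event Frequency', 'Loss Magnitude', 'Risk']

frames = {
    tcom: pd.DataFrame(
        rng.random((n_simulations, len(fair_columns))) * 1e6,
        columns=fair_columns)
    for tcom in ['State Actors', 'Hacktivists']
}
results = pd.concat(frames, axis=1)  # MultiIndex columns: (TCOM, FAIR column)

# Pull the 'Risk' column for every threat community, then choose the operator.
risk = results.xs('Risk', axis=1, level=1)
print(risk.sum(axis=1).head())   # aggregate ("all at once") view
print(risk.mean(axis=1).head())  # average across threat communities
```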

I can write a fuller mockup if it is not clear @theonaunheim. Cheers!

jonrau1 commented 3 years ago

> > RE: 1 - Instead of adding all Risks together it should ideally take the average / mean of all of them and present it that way. So if Risk A is $1000 and Risk B is $100 at the MAX I should get a MAX ALE of $550 and not $1100 as it currently is.
> >
> > RE. 2 - That's worded a lot better than I tried to articulate it, yes!
>
> I am confused about your example: ALE1 = $1000, ALE2 = $100, MAX(ALE1, ALE2) = $1000, not $550 - maybe you meant AVG(ALE1, ALE2) = $550? @jonrau1

Yeah, poorly worded - I meant averages of the max. Edited 😬

theonaunheim commented 3 years ago

Awesome. Thanks for elaborating. Will figure out feasibility this weekend.