Add any missing entities and attributes

scp93ch commented 4 months ago

I put in place methods to read quite a lot of the entities and their attributes, but it is not complete. For instance, domain model matching patterns are not there, and there is a lot more besides.

We need all the data to be accessible in the object model.

I'd also like methods so that it's really easy to follow entity relations in both ways. So for instance, a CSG has a predicate pointing to any threat it "blocks" and the Python CSG class should have a method to find the threats from the CSG. We should also have a method on the Threat to return the CSGs that block it.
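A minimal sketch of two-way navigation. It assumes a toy in-memory triple set in place of the real RDFLib store, and the predicate name `blocks` and the method names `blocks()` / `blocked_by()` are hypothetical. With RDFLib, the two directions correspond to `Graph.objects(subject, predicate)` and `Graph.subjects(predicate, object)`.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical stand-in for the RDF store: (subject, predicate, object) triples.
Triple = Tuple[str, str, str]

BLOCKS = "blocks"  # hypothetical predicate name for the CSG -> Threat link


@dataclass
class Model:
    triples: Set[Triple] = field(default_factory=set)

    def objects(self, subject: str, predicate: str) -> List[str]:
        """Follow a predicate forwards: subject -> objects."""
        return sorted(o for s, p, o in self.triples if s == subject and p == predicate)

    def subjects(self, predicate: str, obj: str) -> List[str]:
        """Follow a predicate backwards: object -> subjects."""
        return sorted(s for s, p, o in self.triples if p == predicate and o == obj)


@dataclass
class ControlStrategy:
    uri: str
    model: Model

    def blocks(self) -> List["Threat"]:
        """Threats this CSG blocks (forward direction of the predicate)."""
        return [Threat(uri, self.model) for uri in self.model.objects(self.uri, BLOCKS)]


@dataclass
class Threat:
    uri: str
    model: Model

    def blocked_by(self) -> List[ControlStrategy]:
        """CSGs that block this threat (same predicate, followed backwards)."""
        return [ControlStrategy(uri, self.model) for uri in self.model.subjects(BLOCKS, self.uri)]
```

Both methods read the same stored predicate, so the reverse link never has to be duplicated in the data.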

panositi commented 3 months ago

TODO: check that the domain model matches the system model

@mike1813, @scp93ch the only comparison that can be made is between OWL.versionInfo and core#domainVersion. Ideally, should there also be an rdf-schema.label in the domain model to compare against a domainLabel in the system model?

PS: the current code commits can be found in branch 2-make-this-into-a-proper-python-module

scp93ch commented 3 months ago

Given that branch #3 is looking pretty complete now, I suggest it is merged into main and we switch to fixing a few things. I'll make some comments on this issue about things to consider/fix.

scp93ch commented 3 months ago

DomainModel.TrustworthinessAttribute

The min_of and max_of are part of the system to deal with populations. TWAs come in 3s: the min, average and max. If the TWA is the min one then it will have the min_of predicate. If it is the max one then it will have the max_of predicate. min_of and max_of point to the average TWA of the set of 3. The average TWA will not have min_of or max_of but will instead have has_min and has_max which point back to the min and max TWAs.

I would suggest changing the property min to has_min and max to has_max.

It would be good to have a method that could be called on any of the 3 TWAs in a population set which returned all the TWAs as a tuple (min, average, max). You'd have to have some logic to first navigate to the average TWA based on which predicates were present and then get the other two.
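A minimal sketch of that navigation, assuming a simplified TWA class that exposes the four links as URI strings (using the suggested has_min/has_max names) and a dict index from URI to TWA. The class shape, the index and the "_Min"/"_Max" URIs in the usage are hypothetical, not the current API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class TWA:
    """Hypothetical, minimal stand-in for DomainModel.TrustworthinessAttribute."""
    uri: str
    has_min: Optional[str] = None  # set only on the average TWA
    has_max: Optional[str] = None  # set only on the average TWA
    min_of: Optional[str] = None   # set only on the min TWA; points to the average
    max_of: Optional[str] = None   # set only on the max TWA; points to the average


def population_triplet(twa: TWA, index: Dict[str, TWA]) -> Tuple[TWA, TWA, TWA]:
    """Return (min, average, max) starting from any member of the triplet."""
    # Step 1: navigate to the average member using whichever predicate is present.
    if twa.min_of is not None:
        average = index[twa.min_of]
    elif twa.max_of is not None:
        average = index[twa.max_of]
    else:
        average = twa
    # Step 2: the average member points to the other two.
    if average.has_min is None or average.has_max is None:
        raise ValueError(f"{average.uri} is not part of a population triplet")
    return index[average.has_min], average, index[average.has_max]
```

Usage: with `avg = TWA("Auth", has_min="Auth_Min", has_max="Auth_Max")`, `TWA("Auth_Min", min_of="Auth")` and `TWA("Auth_Max", max_of="Auth")` in the index, calling `population_triplet` on any of the three returns the same tuple. A TWA from a domain model without populations raises instead of returning a partial tuple.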

scp93ch commented 3 months ago

DomainModel.Threat

This should have the same min/max/min_of/max_of as the TWA I believe, but linking 3 Threats. Would also be good to have the method to get the tuple of (min, average, max).

Note, not every domain model supports populations. There is a feature flag somewhere that you can check to see if it does or not.

I'd suggest renaming the properties that are Booleans to start with is_ (just as the predicates do), as it makes the code read better when you write e.g. if my_threat.is_future_risk.
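A minimal sketch of the naming convention. It assumes the entity keeps its predicate values in a dict, and the key "isFutureRisk" is a hypothetical predicate name; treating a missing value as False is a design assumption, not existing behaviour.

```python
from typing import Dict


class Threat:
    """Hypothetical sketch: Boolean properties carry the is_ prefix like the predicates."""

    def __init__(self, uri: str, properties: Dict[str, object]):
        self.uri = uri
        self._properties = properties

    @property
    def is_future_risk(self) -> bool:
        """True if the threat is a future risk ("isFutureRisk" key is hypothetical)."""
        return bool(self._properties.get("isFutureRisk", False))
```

Usage: `if my_threat.is_future_risk:` then reads as a plain English condition.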

scp93ch commented 3 months ago

DomainModel.CASetting

Need to explain that "CASetting" is the data which is used to create the ControlSets in the system model. (I assume the "CA" in "CASetting" is "Control" and "Asset".)

We might want to rename has_level to effectiveness as that is what it is describing (the effectiveness of the control). It uses the trustworthiness scale, but we are measuring how much we can trust the control which is perhaps better understood as how effective it is. Perhaps just explain it in the pydoc for now.
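A minimal sketch of the pydoc approach, keeping has_level and adding an effectiveness alias. The constructor arguments and the level value in the usage are hypothetical; only the class purpose and the meaning of the level come from the discussion above.

```python
class CASetting:
    """Data used to create the ControlSets in the system model.

    Hypothetical sketch: a CASetting ties a Control to an Asset class and
    supplies the defaults for the resulting system model ControlSet.
    """

    def __init__(self, uri: str, has_level: str):
        self.uri = uri
        self._level = has_level

    @property
    def has_level(self) -> str:
        """Effectiveness of the control, expressed on the trustworthiness scale."""
        return self._level

    @property
    def effectiveness(self) -> str:
        """Alias for has_level: how far the control can be trusted, i.e. how effective it is."""
        return self._level
```

Keeping both names lets existing code continue to work while the clearer name is adopted.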

scp93ch commented 3 months ago

General point regarding return types

In the method definitions you've specified the return type, which is great. I wonder why so many of them are "Optional"? Many should be present, and indeed the code throws exceptions if the data is not there. Should we change to "Mandatory" where appropriate (or whatever keyword is used)?

scp93ch commented 3 months ago

DomainModel.TWAADefaultSetting

Please add to the class's pydoc that this is how we define the defaults for a system model TrustworthinessAttributeSet (similar to CASetting).

scp93ch commented 3 months ago

DomainModel.MADefaultSetting

Again, document that these are the defaults used when we create system model MisbehaviourSets.

scp93ch commented 3 months ago

DomainModel.RootPattern

The links method return type should be a List[RoleLink], and the pydoc on that method should describe that it is a list.
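A minimal sketch of the requested signature. The RoleLink fields and the internal storage are hypothetical; the point is the `List[RoleLink]` annotation and a pydoc that says a list is returned.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoleLink:
    """Hypothetical minimal stand-in: a typed link between two roles in a pattern."""
    subject_role: str
    link_type: str
    object_role: str


@dataclass
class RootPattern:
    uri: str
    _links: List[RoleLink] = field(default_factory=list)

    def links(self) -> List[RoleLink]:
        """Return a list of the RoleLinks in this root pattern (empty if there are none)."""
        return list(self._links)  # a copy, so callers cannot mutate the pattern
```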

scp93ch commented 3 months ago

DomainModel.ControlSet

I don't think this exists in the domain model?!

The pydoc and exception for coverage_level should say it returns the coverage level, not the trustworthiness level (it does use the trustworthiness scale, but we are measuring controlset coverage).

Same comment for DomainModel.ControlStrategy.blocking_effect.

scp93ch commented 3 months ago

DomainModel.Relation

As you suggested, it doesn't exist as far as I know; it's a system model entity (mapped onto by the Link in the MatchingPattern). Did you find this in some domain model NQ file?!

scp93ch commented 3 months ago

DomainModel.Misbehaviour

I think this should have the same has_min/has_max/min_of/max_of as the Threat and the TWA. So it would also be useful to have a method to get the 3 things (min, average, max) as a tuple, as suggested for the TWA.

scp93ch commented 3 months ago

DomainModel.MisbehaviourSet

Does this exist? I thought it would only be in the system model.

panositi commented 3 months ago

> General point regarding return types
>
> In the method definitions you've specified the return type, which is great. I wonder why so many of them are "Optional"? Many should be present, and indeed the code throws exceptions if the data is not there. Should we change to "Mandatory" where appropriate (or whatever keyword is used)?

The Optional represents the None return value; for some methods it is not clear whether None is a valid return value. The code does need to change, though: where there is an error it should raise the exception instead of returning None, which is what it does right now.
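For reference, Python's typing module has no "Mandatory" keyword: a return annotation without `Optional` already states that the value is never None. A minimal sketch of the split between mandatory and optional data follows; the exception class and the "label"/"comment" keys are hypothetical.

```python
from typing import Dict, Optional


class MissingDataError(LookupError):
    """Hypothetical exception for mandatory data absent from the model."""


class Entity:
    def __init__(self, uri: str, properties: Dict[str, str]):
        self.uri = uri
        self._properties = properties

    def _required(self, key: str) -> str:
        # Mandatory data: plain str return type, and absence raises.
        try:
            return self._properties[key]
        except KeyError:
            raise MissingDataError(f"{self.uri} has no {key!r}") from None

    def _optional(self, key: str) -> Optional[str]:
        # Genuinely optional data: None is a legitimate answer.
        return self._properties.get(key)

    @property
    def label(self) -> str:
        return self._required("label")

    @property
    def comment(self) -> Optional[str]:
        return self._optional("comment")
```

With this split, a type checker flags callers that forget to handle None only where None can actually occur.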

panositi commented 3 months ago

The names of various methods can easily change; I mainly used the predicate dictionary keys, which were derived from the core or domain predicates.

mike1813 commented 3 months ago

@scp93ch made several comments here concerning population triplets:

for TW Attributes: https://github.com/Spyderisk/spyderisk-python/issues/3#issuecomment-2315097899
for Threats: https://github.com/Spyderisk/spyderisk-python/issues/3#issuecomment-2315120844
for Misbehaviours: https://github.com/Spyderisk/spyderisk-python/issues/3#issuecomment-2315180052, where he was unsure whether triplets are used.

The answer to the question on Misbehaviours is that triplets are used. They distinguish whether the related system model MisbehaviourSet gives the likelihood of all members of a population being affected (a.k.a. the 'minimum' likelihood, because it is the likelihood of the least likely member) or of any member of the population being affected (a.k.a. the 'maximum' likelihood, because it is the likelihood of the most likely member). The system model MisbehaviourSet triplet represents the Misbehaviour in the associated non-singleton system asset class (i.e., a system asset population). This is basically the same as we have with TWAs and the associated system model TW Attribute Sets, except that there, for reasons of human readability, min and max refer to the least and most trustworthy members (i.e., 'min' relates to a max likelihood and vice versa).
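A minimal sketch of the two conventions, assuming likelihood and trustworthiness are both ordered integer scales (higher is more likely / more trustworthy). The scales and function names are hypothetical; the sketch only shows which member each end of a triplet picks and how the TWA ends pair up with the likelihood ends.

```python
from typing import Sequence, Tuple


def likelihood_extremes(likelihoods: Sequence[int]) -> Tuple[int, int]:
    """(min, max) likelihood over a population of assets.

    min: all members affected, i.e. the likelihood of the least likely member.
    max: any member affected, i.e. the likelihood of the most likely member.
    """
    return min(likelihoods), max(likelihoods)


def trustworthiness_extremes(levels: Sequence[int]) -> Tuple[int, int]:
    """(min, max) trustworthiness: the least and most trustworthy members."""
    return min(levels), max(levels)


def paired_likelihood_end(twa_end: str) -> str:
    """Likelihood end matching a TWA triplet end: the least trustworthy
    member ('min' TWA) is the one with the highest ('max') likelihood."""
    return {"min": "max", "average": "average", "max": "min"}[twa_end]
```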

A similar arrangement to the TWA is also used for domain model Controls, so we can have system model Control Set triplets that refer to the min, max and average control coverage levels (which are really TW levels). @scp93ch didn't mention this, so perhaps it has been missed out.

The min and max Threat classes work in a different way, with the average domain model Threat being the parent of the min, max or average likelihood system model Threat. The three system model Threats represent a population of threats (more accurately, a set of threats involving a population of assets, since you only get a population of threats if the unique roles in the threat pattern are matched by at least some non-singleton system asset classes). The Validator generates the min/max versions 'on the fly' by starting from the average domain model Threat class and using the hasMin/hasMax properties of the domain model Threat and of the Misbehaviours/TWAs referred to in its causes and effects.

So the domain model Threat has a hasMin property but it refers to a URI that has no properties in the domain model.

mike1813 commented 3 months ago

One thing to note is that the CASetting, MADefaultSetting and TWAADefaultSetting domain model entities are not expanded to triplets in this way. There is a CASetting for (say) AccessControl at a Host, but not for AccessControl_Min or AccessControl_Max. The risk calculator gets default average values and calculates the min/max if they are not specified, based on whether the coverage levels are independent or correlated over the asset population (as denoted by the 'independent' flag in a CASetting).

mike1813 commented 3 months ago

The domain model does have ControlSet, MisbehaviourSet and TrustworthinessAttributeSet entities, but they are not parents of the corresponding system model versions. A system model MisbehaviourSet is related to a domain model Misbehaviour and a system model Asset class, but a domain model MisbehaviourSet is related to a domain model Misbehaviour and a domain model Role (in a threat pattern).

Domain model ControlSet, MisbehaviourSet and TrustworthinessAttributeSet do not come in triplets because in the domain model they are only referred to as causes/effects of threats, and those are only defined for the average likelihood domain model Threats.

mike1813 commented 3 months ago

In the domain model source code (CSV files), we don't have population triplets. Only the average cases are represented in the CSV files. The CSV2NQ utility inserts the min and max cases if the '-e' command line argument is used; the 'e' stands for 'expansion', as we think of it as 'expanding' the triplets to three members starting from the average.
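A minimal sketch of what expanding one average entity into a triplet could look like, emitting only the linking triples. The "_Min"/"_Max" URI suffixes follow the AccessControl_Min example in this thread but are an assumption about CSV2NQ's naming, and the minOf/maxOf predicate names are assumed inverses of hasMin/hasMax. This is not the CSV2NQ implementation; check its code for the real rules.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# hasMin/hasMax are named in the discussion; minOf/maxOf are assumed inverses.
HAS_MIN, HAS_MAX, MIN_OF, MAX_OF = "hasMin", "hasMax", "minOf", "maxOf"


def expand_triplet(average_uri: str, inverse: bool = True) -> Set[Triple]:
    """Linking triples for the min/max members of one average entity.

    Pass inverse=False for Threats, whose hasMin/hasMax targets have no
    properties in the domain model. Entities that are never expanded
    (CASetting, MADefaultSetting, TWAADefaultSetting) should not be passed in.
    """
    min_uri, max_uri = f"{average_uri}_Min", f"{average_uri}_Max"  # assumed naming
    triples = {(average_uri, HAS_MIN, min_uri), (average_uri, HAS_MAX, max_uri)}
    if inverse:
        triples |= {(min_uri, MIN_OF, average_uri), (max_uri, MAX_OF, average_uri)}
    return triples
```

A real expansion would also have to copy or derive the other properties of the min/max members; the sketch covers only the links between them.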

Since @panositi is creating a python API, it would be nice if we could use it also in CSV2NQ. That means we would need a way to handle the domain model when loaded from CSV files that don't have these triplets, then add the min/max cases, and serialise to RDF (in whatever format).

Obviously, this requirement can be treated as separate, but it may make sense to bear it in mind when deciding how best to add support for triplets in the relevant domain model entities.

It is worth checking the CSV2NQ code to see how it deals with these things. That's where I look if I can't remember which entities are expanded into triplets, etc. Just bear in mind that the RDF output (for which CSV2NQ uses NQ format) has triplets but the CSV input does not.

scp93ch commented 3 months ago

> Since @panositi is creating a python API, it would be nice if we could use it also in CSV2NQ. That means we would need a way to handle the domain model when loaded from CSV files that don't have these triplets, then add the min/max cases, and serialise to RDF (in whatever format).

My thinking was that we were moving towards an RDF serialisation of some sort being the "source code" (N3?). RDFLib (as used in this project) can presumably load into its triplestore from a variety of sources, or if not, then we'd convert to NQ from the chosen serialisation format. If necessary, we could extend this project to "expand" a domain model that had been loaded and write it out again.
