Spyderisk / system-modeller

Spyderisk web service and web client

Discussion: asset classes, instances and populations #134

Open mike1813 opened 6 months ago

mike1813 commented 6 months ago

Population modelling

Mathematically speaking, the assets in a system model - including asserted assets visible on the system-modeller U/I canvas - are asset classes. For example, we may have the following hierarchy:

Another such hierarchy might relate to devices:

The specialisation captured by a domain model subclass is based on intrinsic asset characteristics. So, Adults are Humans old enough to be legally responsible for certain decisions, and Servers are Hosts with no graphical user interface. The specialisation to a system role depends not on intrinsic characteristics, but on relationships within the system. In this example, the role of OnlineStoreServer would be defined by relationships indicating where it is, what networks it is connected to, and what processes are running on it. The role of SysAdmin would be defined by relationships to Hosts and/or Processes. If the system model contained a relationship SysAdmin-manages-OnlineStoreServer, it means every member of the SysAdmin class manages a member of the OnlineStoreServer class. Any human that doesn't manage an OnlineStoreServer instance is not a SysAdmin, and any host not managed by a SysAdmin isn't an OnlineStoreServer instance.

In most cases (and the default assumption), assets are asserted as singletons. This means that an icon on the U/I canvas, and the associated system model URI (or just the URI for an inferred asset), correspond to one asset in the modelled system. In general, however, the population level can be increased, indicating that the icon (and URI) represents more than one asset in the modelled system. If this is done, system model threats involving the asset class may also become non-singletons. Likelihoods associated with non-singleton asset behaviour are represented by an average level (likelihood of the behaviour in a randomly selected member of the population) plus min/max levels (likelihood of the behaviour in all/any members, respectively). The same average/min/max representations are used for TW levels and control coverage levels over an asset population, and for likelihood levels for non-singleton system model threats.
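To make the average/min/max distinction concrete, here is a minimal sketch. It treats likelihood as a numeric probability and assumes the members of a population behave independently. Both are simplifying assumptions made purely for illustration: system modeller works with discrete levels, and the names `PopulationLikelihood` and `population_likelihood` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopulationLikelihood:
    """Likelihood of a behaviour across an asset population (hypothetical type)."""
    average: float  # behaviour occurs in a randomly selected member
    minimum: float  # behaviour occurs in all members
    maximum: float  # behaviour occurs in at least one member


def population_likelihood(p: float, size: int) -> PopulationLikelihood:
    """Derive average/min/max from a per-member probability p.

    Assumes members are independent and equally likely to show the
    behaviour; this is an illustrative assumption, not the tool's method.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if size < 1:
        raise ValueError("population size must be at least 1")
    return PopulationLikelihood(
        average=p,
        minimum=p ** size,                # all members show the behaviour
        maximum=1.0 - (1.0 - p) ** size,  # at least one member shows it
    )
```

Under these assumptions the three values coincide for a singleton (size 1), which matches the default case. As the population grows, the minimum falls and the maximum rises while the average stays put.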

Some features of system modeller (and associated domain models) fail to adequately support this concept, or fail to fully exploit it. We need to decide which of these aspects should be improved, and launch new issues tracking how they are addressed.

Reflection issues

At the most basic level, there are shortcomings in the way a system model can be related to the modelled system and vice versa.

The system model uses a classification of assets in terms of a base type and a system role expressed via relationships, e.g., assets in the system#OnlineStoreServer class are of type domain#Server, run processes in the role system#OnlineStore, are connected to networks in the role system#RetailLAN, and are managed by humans in the role system#SysAdmin.

In the real system, assets in the system#OnlineStoreServer class will not be called 'OnlineStoreServer'. Each will have a hostname, an IP address on at least one subnet fulfilling the role of system#RetailLAN, and be managed by at least one human in the role system#SysAdmin. The human(s) will not be called 'SysAdmin' but will have names like 'Fred Bloggs', 'Julie Smith', etc. If there is only one subnet in the role of system#RetailLAN, it might actually be called 'RetailLAN', but even for singletons the system asset name may be different from the system model class name. System model asset class URIs are not related to either the system model asset name or the system asset name, of course.

System modeller does allow 'Additional Properties' to be attached to a system model asset class. However, it doesn't support the creation of 'Additional Properties' for multiple asset instances. What we really need is a collection of instances, each with its own 'Additional Properties', at least one of which relates to its identity in the modelled system, as well as some class-level 'Additional Properties' that apply to every instance. Note that we wouldn't need to store the 'Additional Properties' for every instance - it may be enough to specify additional properties related to the entire class, plus one or two asset instances where there is some noteworthy property worth recording.
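One possible shape for this is sketched below: class-level properties plus a sparse map of instance-level properties, keyed by an identity in the modelled system. This is not how system modeller stores 'Additional Properties' today. The `AssetClass` type, its field names and the example hostname are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AssetClass:
    """A system model asset class with class-level and per-instance properties."""
    name: str
    # Properties that apply to every instance in the class.
    class_properties: Dict[str, str] = field(default_factory=dict)
    # Only noteworthy instances are recorded, keyed by an identity in the
    # modelled system (here an IP address; a hostname would work equally well).
    instances: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def properties_for(self, instance_id: str) -> Dict[str, str]:
        """Return class-level properties overridden by instance-level ones."""
        merged = dict(self.class_properties)
        merged.update(self.instances.get(instance_id, {}))
        return merged


servers = AssetClass(
    name="system#OnlineStoreServer",
    class_properties={"subnet": "192.168.1.0/24"},
)
# Record one instance with something worth noting (hostname is made up).
servers.instances["192.168.1.3"] = {"hostname": "store-03"}
```

An instance that has no entry of its own simply inherits the class-level properties, so there is no need to enumerate the whole population.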

If this were done correctly, it would solve some of the problems we have when trying to represent output from system monitoring in a system model, e.g., reducing the TW levels for a host found to contain known software vulnerabilities, or for a device found to be the source of malicious messages. The relevant asset could be looked up, found to be a member of the corresponding system model class, and the class attributes updated.

So we might say that the system#OnlineStoreServer class has IP addresses in the range 192.168.1.0/24, and a vulnerability scan found a particular CVE is present at 192.168.1.3. From this we might infer that the Min TW levels and (if the population is low enough) some Avg TW levels should be reduced.

This wouldn't solve all the problems of system reflection, of course. If we discover a new device connected to the network, it might not be possible to determine whether that device is a member of the system#OnlineStoreServer, as other system model host subclasses may be connected to the same subnet using IP addresses in the same range. But at least where we do know which class of system asset we're dealing with, it can be related to properties associated with the non-singleton system model asset class, and vice versa.
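The lookup, and the ambiguity it can run into, can be sketched with Python's standard `ipaddress` module. The mapping from class name to address range is an assumed representation, and every class other than system#OnlineStoreServer is invented for the example.

```python
import ipaddress
from typing import Dict, List


def candidate_classes(address: str, class_ranges: Dict[str, str]) -> List[str]:
    """Return every asset class whose address range contains the given IP."""
    ip = ipaddress.ip_address(address)
    return [
        name
        for name, cidr in class_ranges.items()
        if ip in ipaddress.ip_network(cidr)
    ]


class_ranges = {
    "system#OnlineStoreServer": "192.168.1.0/24",
    # Hypothetical classes, added only to show the ambiguity.
    "system#StaffWorkstation": "192.168.1.0/24",
    "system#BackOfficeServer": "10.0.0.0/24",
}

matches = candidate_classes("192.168.1.3", class_ranges)
if len(matches) == 1:
    print(f"192.168.1.3 belongs to {matches[0]}")
else:
    print(f"192.168.1.3 is ambiguous or unknown: {matches}")
```

When two classes share a subnet the function returns both, so the address range alone cannot decide membership. Some further instance-level property would be needed to tell them apart.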

Control coverage issues

At present, the coverage for a specific control applying to a system model asset class is represented as a TW level. If the asset class is a singleton, the control can only be 100% present or 100% absent, but the coverage level allows us to express whether the control is at all times effective. Controls like 'software patching' are not 100% effective, because patches become available after a vulnerability is discovered, so there will be short intervals during which the control would not prevent exploitation. However, most controls have coverage level 'Safe' because they are effective if present. A separate boolean property is used to indicate whether the control is present.

If the asset class is non-singleton, the coverage level still expresses how (un)likely it is that the control will be ineffective when needed. The average coverage level represents the inverse likelihood that the control will be ineffective for a randomly chosen member of the population at an arbitrary time. Min and max coverage levels refer to the likelihood that the control is ineffective in any or all members of the population, respectively. While it is still possible for a control to be ineffective when present, the most practical interpretation for a population is that the coverage level represents the proportion of its members that have the control, with temporary control ineffectiveness treated as a perturbation from this.

Default coverage levels are defined in the domain model per control and asset type, so system modeller users aren't forced to set perhaps hundreds or thousands of coverage levels. However, the separate boolean property is still used to signify whether the control is present (at some coverage level) in a population. The reason is that default coverage levels tend to be high (most are 'Safe'), so most threats would be addressed by default if they were used 'as is'. System modeller users would think everything is fine, but to match the model they would need to implement every control. That is rarely necessary or practical, so implementers would probably need to leave some out, but with no idea which are really needed. The boolean flags are switched off by default, so system modeller users must switch them on, typically just for a few controls until risk levels are acceptable, whereupon the boolean flags indicate which controls are actually needed.

At present, this boolean flag is supposed to apply to the whole asset class. However, that means if a control produces side effects, they will be produced at every asset in the class. For some controls, this makes no sense. E.g., if a service can detect that a client may be compromised using biometrics or some such data, then a possible control strategy might be to disable access by clients where this is detected. But disablement causes side effects, and we don't want those caused for the whole population of clients - only the ones believed to be compromised. For this type of control we want selective enablement. In principle this could be represented by setting the boolean flag to true for the min coverage level (i.e., that the disablement control is present for at least one client), but not for the average (a randomly chosen client) or max coverage (all clients).
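As a rough sketch of what selective enablement might look like, the single class-wide flag could become one flag per coverage level. This is not an existing system modeller feature, and the `ControlEnablement` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ControlEnablement:
    """Hypothetical per-level enablement flags for a control on a population."""
    at_least_one: bool = False   # paired with the min coverage level
    random_member: bool = False  # paired with the average coverage level
    all_members: bool = False    # paired with the max coverage level

    def is_selective(self) -> bool:
        """True when the control is enabled for some members only."""
        return self.at_least_one and not (self.random_member or self.all_members)

    def side_effects_hit_whole_class(self) -> bool:
        """Side effects reach every member only if the control is on for all."""
        return self.all_members


# Disable access only for the clients believed to be compromised.
disable_access = ControlEnablement(at_least_one=True)
# Today's behaviour: one flag covering the whole asset class.
class_wide = ControlEnablement(at_least_one=True, random_member=True, all_members=True)
```

Here `disable_access` counts as selective and `class_wide` does not, so only the latter would cause side effects across the entire population of clients.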

This is likely to be most relevant for controls that are run-time malleable (i.e., they can be enabled or disabled in an already running system). See also issue #74.

Normal operational effects

The same observation holds for normal operational effects representing behaviours that are expected and usually desired, but nevertheless increase risks. The archetypical example is the 'domain#InService' behaviour of domain#Host assets, meaning the host is in service, caused by a threat that represents the (non-malign) action of switching the host on.

If access disablement controls should be enabled only for a few members of a population, then presumably the likelihoods of in service behaviours should also handle cases where only a very small number of assets in a population lack the behaviour.

CSG recommendations

We have an algorithm that uses threat path analysis to find control strategies in a system model that could be used to reduce risks in the modelled system, either at design time (based on a future risk analysis) or in a running system (based on a current risk analysis). In the runtime version it is necessary to convert the recommendation into a form that someone (or some automated process) fulfilling a system management role could use.

The use of instance-related 'additional properties' would obviously help with this. If a vulnerability scan showed a patchable CVE is present in a host with the role of system#OnlineStoreServer and IP address 192.168.1.3, then the recommendation to apply the patch could refer to the relevant IP address. At present, that only works properly if the system#OnlineStoreServer class is a singleton. See issue #67.

In some cases, the CSG recommender algorithm may be unable to detect that such a control strategy is useful unless it can use selective enablement of controls.

Where the risk is caused by side effects from a control, especially a disablement control, this may also depend on the correct representation of secondary (normal operational) effects from the disablement.

Arguably, development of the CSG recommendation algorithm should not seek to address system models containing non-singleton asset populations until we have resolved how system model asset classes and system asset instances should be related, and how behaviour likelihoods, TW levels and control coverage levels relate to those asset instances.

mike1813 commented 6 months ago

@scp93ch should consider if/when these aspects ought to be addressed.