A DataAsset (i.e., a serialised copy of Data that is stored on a Host or sent between Processes) may be read or altered by a malicious party, and we have threats to model that. Some of those threats can be prevented by encrypting the data. The encryption status of serialised data assets is modelled in two ways:
- explicitly, by using an Encryption control, which may apply to any DataAsset: the Control must be 'proposed' or 'selected' by the system-modeller user or client to indicate that it 'will be' or 'is' implemented in the system for the associated DataAsset
- implicitly, by specifying a 'controls' relationship from a key vault service (i.e., a Process) to the DataAsset: this means the DataAsset is encrypted using keys managed by the key vault service
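The two routes can be sketched as follows. This is only an illustrative sketch of the logic described above; the function and relationship names are hypothetical, not the actual system-modeller API.

```python
# Hypothetical sketch: the two ways a DataAsset's encryption status is
# modelled. All identifiers here are illustrative.

def is_encrypted(asset, selected_controls, vault_controls):
    """Return True if the DataAsset is modelled as encrypted.

    selected_controls: set of (asset, control) pairs proposed/selected
        by the system-modeller user or client.
    vault_controls: set of (key_vault_process, asset) pairs representing
        'controls' relationships from key vault services.
    """
    # Explicit: an Encryption control selected at the asset.
    if (asset, "Encryption") in selected_controls:
        return True
    # Implicit: a key vault service 'controls' the asset.
    return any(a == asset for (_vault, a) in vault_controls)

# Example: flow1 has an explicit control; cache1 is implicitly covered.
selected = {("flow1", "Encryption")}
vaults = {("vault-svc", "cache1")}
print(is_encrypted("flow1", selected, vaults))   # True (explicit)
print(is_encrypted("cache1", selected, vaults))  # True (implicit)
print(is_encrypted("copy1", selected, vaults))   # False
```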
Threats to DataAssets that are prevented by encryption:
- have matching patterns that exclude the presence of a key vault controlling the DataAsset, and
- have control strategies that are enabled by selection of an Encryption control at the DataAsset
There are also some 'side effect' threats to processes that use encrypted data, modelling the possibility that the process may be unable to access the data. These are triggered by the same control strategies, and addressed if the process has access to a key.
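Put together, the prevention and side-effect logic described above amounts to something like the following sketch (again with hypothetical names, not the real knowledge base encoding):

```python
# Illustrative sketch of the threat logic described above.

def threat_applies(key_vault_controlled, encryption_selected):
    """Does a data read/alter threat to a DataAsset still apply?"""
    # The matching pattern excludes assets controlled by a key vault,
    # so the threat does not even match in that case.
    if key_vault_controlled:
        return False
    # Otherwise a control strategy enabled by selection of an
    # Encryption control at the asset blocks the threat.
    if encryption_selected:
        return False
    return True

def side_effect_applies(encryption_selected, process_has_key):
    """'Side effect' threat: a process may be unable to access the data.

    Triggered by the same control strategies (i.e., by encryption), and
    addressed if the process has access to a key.
    """
    return encryption_selected and not process_has_key
```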
These models work well enough if the DataAsset is a DataFlow (serialised data sent between Processes) or a DataCopy (a persistent copy of the data saved on a host). They break down if the DataAsset is a DataCache (any copy of data saved on a host) but not a DataCopy. A DataCache is a saved copy of a DataFlow, created only where the DataFlow sender cannot send the data right away, or the recipient gets data it cannot use right away, due to their physical location and associated network connectivity.
In the current models:
- the encryption status of a DataCache should be independent of the associated DataFlow, since
  - a sender could save newly created data without encrypting it first, and encrypt the data only at the point of sending it
  - a recipient could decrypt newly arrived data, and save it unencrypted until it can be used
- implicit encryption signified by a 'controls' relationship from a key vault is inferred (by construction pattern DCDFcV+c) to apply to a DataCache if it applies to the associated DataFlow, which contradicts the above observation
- side effect threats to processes that use encrypted data apply only to DataFlow and persistent DataCopy assets, and not to a transient DataCache.
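The contradictory inference can be made concrete with a small sketch of what construction pattern DCDFcV+c currently does (the names and data shapes here are hypothetical): the key vault 'controls' relationship on a DataFlow is copied to the DataCache derived from it, so the cache is treated as encrypted even if the process saves it unencrypted.

```python
# Sketch of the current DCDFcV+c inference (illustrative names only):
# copy key vault 'controls' relationships from DataFlows to the
# DataCaches derived from them.

def infer_cache_controls(flow_controls, flow_to_cache):
    """flow_controls: set of (key_vault, data_flow) pairs.
    flow_to_cache: mapping from a DataFlow to the DataCache it creates
    when the flow is interrupted.
    """
    cache_controls = set()
    for (vault, flow) in flow_controls:
        if flow in flow_to_cache:
            cache_controls.add((vault, flow_to_cache[flow]))
    return cache_controls

# flow1 is key-vault encrypted; its interruption creates cache1.
print(infer_cache_controls({("vault", "flow1")}, {"flow1": "cache1"}))
# {('vault', 'cache1')} -- the cache is assumed encrypted, which may be
# wrong if the recipient decrypted the data before caching it.
```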
Note that it is possible to infer the encryption status of a data cache created by a process from a data flow if the data is simply forwarded by the process. In that case only, one can suppose that if an inbound data flow is encrypted and the recipient merely forwards it to another process, then any cache the recipient creates will also be encrypted.
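That forwarding-only special case can be sketched as follows (names hypothetical); in all other cases nothing can be inferred from the flow:

```python
# Sketch of the only safe inference: a forwarding-only process caches
# data exactly as received, so the cache's encryption status matches
# the inbound flow's. Otherwise the status is independent/unknown.

def cache_encrypted(inbound_flow_encrypted, process_forwards_only):
    """Infer the encryption status of a DataCache created by a recipient.

    Returns True/False when it can be inferred, or None when the cache's
    status is independent of the flow (the process may have decrypted
    before caching, or may encrypt only at the point of sending).
    """
    if process_forwards_only:
        return inbound_flow_encrypted
    return None  # unknown: independent of the flow's status
```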
The model should be made consistent: the encryption status of a DataCache should either be independent of the associated DataFlow in all cases where data is not merely forwarded, or the same as that of the DataFlow in all cases. The current half-way position should not be allowed to persist.
If we assume the encryption status of a DataCache is the same as that of a DataFlow (which we currently do if and only if the keys are managed by a key vault), then we may be underestimating risks (e.g., of data disclosure) in cases where the Process saves the cache in unencrypted form. In that respect, it would be better to make the encryption status of the cache independent, by dropping construction pattern DCDFcV+c.
However, this would mean system-modeller users/clients would be forced to select encryption controls on DataCache assets that are saved in encrypted form. But a DataCache (unlike a DataCopy) is not inferred directly from asserted system model input such as a Host-stores-Data relationship. It is inferred only when data flows are interrupted in some context(s), being derived from DataFlow and ProcessContext assets that are themselves inferred. Having to place controls on assets whose presence depends on a two-stage inference may be too confusing for some users.
We need to decide which option is best: a more usable option that is probably right most of the time, but when wrong, leads to risk levels being underestimated, or a less usable option that guarantees risks can only be overestimated.
@scp93ch and @samuelsenior : please comment.