Spyderisk / domain-network

Network domain model

Bug in D.A.DAllDS.0 #120

Closed mike1813 closed 5 months ago

mike1813 commented 6 months ago

There is an issue with the surfacing threat D.A.DAllDS.0, which causes Loss of Availability at a Data asset (representing a type of data) if all copies of the data are unavailable.

This threat is suppressed if there exists a cached copy of the data in the system that is not unavailable. Technically, it is true that the data is not completely lost in that case. However, a cached copy is not easy to access and may be deleted at any time. In practice, the fact that a copy may still exist in a process cache does not negate the threat, any more than the possibility of a copy still being in transit between two processes (which is really what a cached copy is).
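
As a minimal sketch of the intended trigger condition (Python used purely as illustration; the domain model is not implemented in Python, and all names here are hypothetical):

```python
# Minimal sketch of the intended trigger condition for D.A.DAllDS.0.
# The domain model is not implemented in Python; all names are illustrative.

def d_a_dallds_0_applies(data) -> bool:
    """Loss of Availability at a Data asset when every persistent copy is unavailable."""
    persistent = [c for c in data.copies if c.persistent]
    return bool(persistent) and all(c.unavailable for c in persistent)

# The bug: transient cached copies were counted among the copies, so a single
# available cache wrongly suppressed the threat.
```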

mike1813 commented 5 months ago

The best solution is to invert the inheritance hierarchy between DataCopy and DataCache. At present, DataCopy is the parent class of DataCache, meaning a DataCache is a special kind of DataCopy. But in D.A.DAllDS.0, the special quality we need is persistence, so really it is the DataCopy class that represents the special case.
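
As a rough illustration, using Python classes as stand-ins for what is really an ontology class hierarchy (the class names come from the issue; everything else is illustrative):

```python
# Python stand-in for the domain model's class hierarchy (illustrative only).

class Before:
    class DataCopy: pass              # parent: a copy of some Data
    class DataCache(DataCopy): pass   # child: a cache is a kind of copy (wrong)

class After:
    class DataCache: pass             # parent: any saved data, possibly transient
    class DataCopy(DataCache): pass   # child: the persistent special case

# D.A.DAllDS.0 should be suppressed only by persistent copies:
assert issubclass(Before.DataCache, Before.DataCopy)    # cache counts as a copy (bug)
assert not issubclass(After.DataCache, After.DataCopy)  # cache no longer counts
assert issubclass(After.DataCopy, After.DataCache)      # a copy is still saved data
```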

Where we need to select cached data only, this can be done using direct relationships between the cached data and the DataFlow that is being cached. A persistent DataCopy has no direct relationship to a DataFlow, because where a DataCopy is created from an inbound DataFlow, or a DataFlow is created by reading and sending a stored DataCopy, the relationship is via a Process.
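
In graph terms, the distinction might be pictured as follows (a sketch with invented relationship names; the real domain-model predicates differ):

```python
# Sketch of the two relationship shapes (predicate names are invented):
#   DataCache --cachesFlow--> DataFlow             (direct link)
#   DataCopy <--stores-- Process <--> DataFlow     (only via a Process)

def is_cached_data(relations: dict) -> bool:
    """Select cached data by its direct relationship to a DataFlow."""
    return "cachesFlow" in relations

print(is_cached_data({"cachesFlow": "flow1"}))   # True: a cache
print(is_cached_data({"storedBy": "process1"}))  # False: a persistent copy
```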

To perform this inversion, it is necessary to review all threats and construction patterns referring to either class, and ensure that the right one is used in each case.

mike1813 commented 5 months ago

Test cases created for this issue:

[Images: diagrams of test cases 1a/1b and 2a/2b]

In case 1a, the service must cache the data from the first client until the second client requests it. However, in case 1b, no cache is needed because the service saves the data anyway. In cases 2a and 2b, the first service must cache the data from its client while its host is in a location where the second service is inaccessible; whether the second service stores the data on its host (which is what distinguishes case 2b from 2a) makes no difference to the need for this cache.

With domain model v6a5-1-1, we get a D.A.DAllDS.0 threat to the data in all four cases. This happens because DataCache (a saved data flow inferred to exist) is wrongly considered a subclass of DataCopy (a persistent copy of the data), so even in cases 1a and 2a where there is no persistent copy (i.e., the system is not meant to retain this data), we still get the threat.

What should happen is that we get D.A.DAllDS.0 threats only in cases 1b and 2b, and in case 2b the threat should not involve the cached copy of the data.
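
To summarise the expected behaviour (True meaning the D.A.DAllDS.0 threat should surface against the Data asset):

```python
# Expected D.A.DAllDS.0 outcomes per test case (True = threat should surface).
expected = {
    "1a": False,  # data is only cached by the service; no persistent copy
    "1b": True,   # the service saves the data, so a persistent copy can be lost
    "2a": False,  # first service caches while the second is inaccessible
    "2b": True,   # the second service also stores the data on its host
}

# Observed with domain model v6a5-1-1 (the bug): the threat in all four cases.
observed_v6a5_1_1 = {case: True for case in expected}
```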

mike1813 commented 5 months ago

Fixes have now been made as discussed in branch 40, and checked using the above test cases.

One thing should be noted. The current implementation assumes that the encryption status of a cached copy of the data is distinct from the encryption status of the cached data flow. This assumption is made because it is consistent with the principle that, when in doubt, we should overestimate rather than underestimate risks.

The idea is that if a process creates an outbound data flow but can't send it, the process may cache the data and encrypt the data flow only when it can be sent. If a process uses an inbound data flow but receives the data in a context where it can't be used right away, it may decrypt the data before caching it. Because the cached copy of the data may therefore be unencrypted, we assume it is unencrypted unless asserted otherwise (by the system-modeller user/client asserting that an encryption control is present).
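
As a rough sketch of these two situations (hypothetical helper names; the domain model encodes this as an assumption about assets, not executable behaviour):

```python
# Sketch of the two situations in which a cached copy ends up unencrypted.
# All names are illustrative, not part of the domain model.

def handle_outbound(process, data, channel):
    if channel.available():
        channel.send(process.encrypt(data))  # encrypted only when it can be sent
    else:
        process.cache(data)                  # cached in the clear meanwhile

def handle_inbound(process, ciphertext):
    data = process.decrypt(ciphertext)       # decrypted on receipt...
    if not process.can_use_now(data):
        process.cache(data)                  # ...so the cached copy is plaintext
```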

However, if the data flow is encrypted with keys from a key vault service, there will be a 'Vault-controls-DataFlow' relationship and the encrypted status will be inferred from this, rather than being asserted by selection of a control. In that situation, the inference is propagated (by construction pattern DCDFcV+c) from the DataFlow to the cached copy of the Data.
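
The propagation step might be pictured like this (pseudo-logic only; the real construction pattern is defined in the domain model, not in Python):

```python
# Sketch of the DCDFcV+c propagation: where a Vault controls a DataFlow, the
# inferred "encrypted" status is copied from the flow to the cached Data copy.
def apply_dcdfcv_c(model):
    for flow in model.data_flows:
        if flow.vault is not None:            # 'Vault-controls-DataFlow' holds
            flow.encrypted = True             # inferred, not user-asserted
            for cached in flow.cached_copies:
                cached.encrypted = True       # inference propagated to the cache
```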

This conflicts with the principle that we should never underestimate risks, because the point at which a process encrypts or decrypts data does not depend on whether the keys are managed by a key vault. In principle, therefore, pattern DCDFcV+c should be dropped, unless the data is simply forwarded by the process (in which case there is no need to encrypt or decrypt it). This would mean that users or clients must select an encryption control on a cached copy of the data to indicate that the data is encrypted before being cached, or decrypted only after being cached.

However, that may degrade user-friendliness, as users would find they need to assert a control on an asset that arises from double inference: the cache is inferred from data flows and location contexts that are themselves inferred. That may confuse some users...

...so for now, pattern DCDFcV+c has been retained. Whether it should be kept will become a separate issue.

mike1813 commented 5 months ago

The inconsistent assumptions described above have now been moved to a separate issue #125. On that basis, this issue can now be closed.