User interactions with data - inference patterns are too conservative

If a user interacts with a process that creates and/or receives data (i.e., it processes data, not merely serving or relaying data), then the user interacts with the data if either:

the system-modeller user asserts that they interact with the data, or
the nature of the process and/or other relationships of the data imply it.

The second is deduced by construction patterns that look for other possibilities that may rule out the potential inferred user interaction. In practice, this can lead to odd outcomes, e.g., a user who updates data via an editor and also uses a web browser that receives the data will be assumed not to view the data via the browser.

It seems the construction patterns should be a little more generous and avoid ruling out inferred user interactions with data in a few more situations than they do at present.

Not a huge issue because system-modeller users can add asserted relationships (Process-enablesUserInput-Data, etc.) to get what they want.

Except that the necessary relationships were not tagged as 'assertible'. Changed that on branch 85, but left the rest for now.

Three construction patterns could usefully be revised. They all create Human-Data and Process-Data relationships to indicate that the Human (who must be an interactive user of the Process) views the data as output via the Process:

HuirIPp-iD+vD: adds the relationships if the process is an Interactive Process, processes the data, and the user doesn't also input the data.
HuirPc-iD-rC-vHu-DS+vD: adds the relationships if the process creates the data and it isn't stored, received by any process, or viewed by any human, and the user doesn't also input the data.
HuirPr-iD-cD+vD: adds the relationships if the process receives the data and creates no output, and the user doesn't also input the data.
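As a rough illustration of the three patterns as described above, here is a minimal Python sketch. It is not the domain model's actual pattern language: the triple representation, the function name `infers_user_output`, and the predicate names `interactsWith` and `stores` are assumptions made for the example, and "processes the data" is read as "creates and/or receives the data".

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infers_user_output(triples: FrozenSet[Triple], interactive: FrozenSet[str],
                       human: str, process: str, data: str) -> bool:
    """True if any of the three patterns would add the user output relationships."""

    def has(s: str, p: str, o: str) -> bool:
        return (s, p, o) in triples

    def anyone(p: str, o: str) -> bool:
        return any(t[1] == p and t[2] == o for t in triples)

    # The human must be an interactive user of the process
    # ("interactsWith" is a hypothetical predicate name).
    if not has(human, "interactsWith", process):
        return False
    # Shared suppression: the user also inputs the data, via ANY process.
    if has(human, "inputsData", data):
        return False

    creates = has(process, "creates", data)
    receives = has(process, "receives", data)
    creates_anything = any(t[0] == process and t[1] == "creates" for t in triples)

    # HuirIPp-iD+vD: an Interactive Process that processes the data.
    p1 = process in interactive and (creates or receives)
    # HuirPc-iD-rC-vHu-DS+vD: created here, not stored, received or viewed elsewhere.
    p2 = creates and not (anyone("stores", data) or anyone("receives", data)
                          or anyone("viewsData", data))
    # HuirPr-iD-cD+vD: receives the data and creates no output.
    p3 = receives and not creates_anything
    return p1 or p2 or p3


# The editor/browser case: the user edits TextData in one process and a
# web browser receives the same data.
model = frozenset({
    ("User", "interactsWith", "Editor"),
    ("User", "interactsWith", "WebBrowser"),
    ("User", "inputsData", "TextData"),
    ("Editor", "creates", "TextData"),
    ("WebBrowser", "receives", "TextData"),
})
interactive = frozenset({"Editor", "WebBrowser"})

browser_view = infers_user_output(model, interactive, "User", "WebBrowser", "TextData")
without_input = infers_user_output(model - {("User", "inputsData", "TextData")},
                                   interactive, "User", "WebBrowser", "TextData")
print(browser_view, without_input)
```

In this sketch the browser view is only inferred once the Human-Data input relationship is removed, which is the conservative behaviour this issue is about.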

In all three cases, the pattern is suppressed if the user also inputs the data. This ensures that the relationships for interactive user output are not added when in fact the user is updating the data via the process. However, the patterns do not check that the update happens via the same process, so no user output is deduced even when the user enters the data via some other process.

There are other situations where user output would probably be supported by a process, but the data is used elsewhere in such a way that it cannot be assumed that the process does display the output. For example, it may be that the process creates output, and might show it to the user, but it is also sent to another process where it is used as input. In that situation we can't be sure if the data is shown to the user, because it may be computed by the process only for consumption by the other process, so we can't add the user output relationships - it is up to the system modeller user/client to decide if that is appropriate.

Now addressed on branch 40, illustrated by the attached test cases.

These are three of those used for issue #40. The original versions had an asserted WebBrowser-enablesUserOutput-TextData, since the construction patterns could not deduce that the WebBrowser was presenting the data to the user.

The attached versions do not have this asserted relationship, but with the fixed construction patterns this is OK. The fact that the data is interactive user output at the WebBrowser is deduced from the WebBrowser-receives-Data relationship, given the nature of the WebBrowser...

RC-Check Collocated Process Comms 1b - Asserted.nq.gz
RC-Check Collocated Process Comms 2b - Asserted.nq.gz
RC-Check Collocated Process Comms 3b - Asserted.nq.gz

It turns out that the fixes made via branch 85 are not sufficient even as a temporary fix for this problem. We still get inappropriate user i/o inferences, where a user interacts with more than one process that processes the same data.

For example, in Issue 107 Test 1a - Asserted.nq.gz, the intention is that:

the 'Customer' has a smart phone 'Customer Phone' which they use at 'Home' and also when out in the 'Town'
process 'Diet Planner' running on 'Customer Phone' reads 'Meals' from a service 'Data Service' running on the user's 'PC'
process 'Diet Planner' also gets another input sent by sensor 'Step Counter', which may send new values when in the 'Town'
process 'Diet Planner' produces 'Diet Plan' as an output and sends it to the Data Service running on the user's 'PC'
user 'Customer' enters data 'Meals' via the 'Meal Diary' process running on their 'PC' in their 'Home'

This was created as a domain model development test to check inference rules that generate cached copies of data. 'Diet Planner' cannot access input 'Meals' when running in 'Town', so it cannot produce output 'Diet Plan' when in 'Town'. This means it should not need to cache either the input 'Meals' or the output 'Diet Plan'. However, it gets sensor readings from 'Step Counter' (the inferred sensor data asset '[SensorData:StepCounter]') when in 'Town', which it cannot immediately use. So 'Diet Planner' should cache this sensor data.
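The caching expectation can be made concrete with a small sketch. This is only an illustration of the reasoning in the paragraph above, not the domain model's caching rules: the function `cache_needs`, the per-location availability sets, and the assumption that the step counter also reports at 'Home' are all made up for the example.

```python
from typing import Dict, Iterable, Set


def cache_needs(inputs: Dict[str, Set[str]], output: str,
                sink_locations: Set[str], locations: Iterable[str]) -> Set[str]:
    """Return the data assets a process would need to cache.

    inputs maps each input to the locations where the process can obtain it;
    sink_locations are the locations from which the output can be sent on.
    """
    cached: Set[str] = set()
    for loc in locations:
        available = {name for name, locs in inputs.items() if loc in locs}
        if available == set(inputs):
            # All inputs present: the output is produced here, and must be
            # held back if it cannot be sent from this location.
            if loc not in sink_locations:
                cached.add(output)
        else:
            # Some input is missing: whatever did arrive cannot be used yet.
            cached |= available
    return cached


# Intended reading of the test model: 'Meals' and the 'Data Service' are only
# reachable at 'Home'; sensor readings also arrive in 'Town'.
intended = cache_needs(
    {"Meals": {"Home"}, "[SensorData:StepCounter]": {"Home", "Town"}},
    "Diet Plan", {"Home"}, ["Home", "Town"])

# Variant in which 'Meals' is (wrongly) treated as also available in 'Town'.
variant = cache_needs(
    {"Meals": {"Home", "Town"}, "[SensorData:StepCounter]": {"Home", "Town"}},
    "Diet Plan", {"Home"}, ["Home", "Town"])
print(intended, variant)
```

Under these assumptions the intended model caches only the sensor data, while making 'Meals' available in 'Town' flips the result to caching 'Diet Plan' instead.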

The problem is that the relationship 'Customer-inputsData(enters)-Data' triggers inference rules that add 'enablesUserInput' relationships from both processes used by the 'Customer'. Thus it is assumed that 'Diet Planner' gets 'Meals' from its interactive user, which it can use to produce output 'Diet Plan' when in the 'Town'. It cannot send this output to the 'Data Service' from there, triggering a caching inference rule that creates a cached copy of 'Diet Plan' on the 'Customer Phone'. Moreover, because the 'Diet Planner' process has access to 'Meals' in the 'Town', the caching rule that should create a cached copy of '[SensorData:StepCounter]' is not triggered.
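A simplified sketch of how a single Human-Data assertion fans out to every process the user works with. The function name, the triple encoding and the `interactsWith` predicate are assumptions for illustration, and the real pattern (HuiDirIPpD-P+eUI) has further conditions that are not modelled here.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def old_user_input_inference(triples: Set[Triple], interactive: Set[str]) -> Set[Triple]:
    """For each Human-inputsData-Data assertion, add enablesUserInput from every
    interactive process used by that human which processes the data."""
    inferred: Set[Triple] = set()
    for human, pred, data in triples:
        if pred != "inputsData":
            continue
        for subj, rel, proc in triples:
            if subj != human or rel != "interactsWith" or proc not in interactive:
                continue
            processes = ((proc, "creates", data) in triples
                         or (proc, "receives", data) in triples)
            if processes:
                inferred.add((proc, "enablesUserInput", data))
    return inferred


test_1a = {
    ("Customer", "interactsWith", "Meal Diary"),
    ("Customer", "interactsWith", "Diet Planner"),
    ("Customer", "inputsData", "Meals"),
    ("Meal Diary", "creates", "Meals"),      # assumed encoding of data entry
    ("Diet Planner", "receives", "Meals"),   # read from the 'Data Service'
}
result = old_user_input_inference(test_1a, {"Meal Diary", "Diet Planner"})
print(sorted(result))
```

Because the Human-Data relationship does not name a process, the sketch cannot tell 'Meal Diary' from 'Diet Planner', and both end up marked as enabling user input of 'Meals'.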

In the modified version Issue 107 Test 1b - Asserted.nq.gz, an asserted relationship 'MealDiary-enablesUserInput-Meals' has been added. This suppresses inference pattern HuiDirIPpD-P+eUI that previously added 'DietPlanner-enablesUserInput-Meals', but that means HuirIPp-iD+vD now creates a 'DietPlanner-enablesUserOutput-Meals' link. There is no way to specify that DietPlanner just uses this data as input, except by specifying that it gets the input from its user, which creates the problem found in Issue 107 Test 1a - Asserted.nq.gz.

Moreover, as a consequence we get an inferred 'Customer-viewsData-Meals', so now it is assumed that the 'Customer' amends data 'Meals'. This triggers a modelling error threat because in this case, they aren't interacting with any process that is amending this data. This is a bug in the modelling error - it should check for the possibility that different processes handle input and output, but the problem arises because the Human-Data relationships cannot specify which process is involved.

In conclusion, making 'Process-enablesUserInput-Data', 'Process-enablesUserOutput-Data' and 'Process-enablesUserUpdate-Data' assertible certainly helps in some situations, but it doesn't prevent those Human-Data relationships causing problems. It looks like the only way this could be solved is to remove these Human-Data relationships or (if still needed in threats) make them inferred, and alter the inference sequence including HuiDirIPpD-P+eUI and HuirIPp-iD+vD so they no longer drive the outcome.

Discussion with @samuelsenior and @scp93ch concluded as follows. We should:

[x] refactor construction patterns so they are driven by 'Process-enablesUserInput-Data', 'Process-enablesUserOutput-Data' and 'Process-enablesUserUpdate-Data' relationships
[x] make 'Human-viewsData-Data' and 'Human-inputsData-Data' inferred only relationships so they can be used where necessary in threat patterns
[x] infer Process-Data interactivity relationships from Process-Data processing relationships where there is no ambiguity or alternative explanation, but without using Human-Data interactivity relationships
[x] infer Process-Data processing relationships from Process-Data interactivity relationships where possible, so that system-modeller users can assert either but should not normally need to assert both
[x] include patterns at the start of the construction sequence to detect asserted Human-Data relationships, and add new inferred relationships alongside to tag them as a use of a deprecated feature
[x] introduce modelling error threats that detect the tags for asserted Human-Data interactivity relationships, and require their replacement by asserted Process-Data relationships.

This approach has the advantage that the asserted Process-Data relationships expressing user interaction with data include the process used, greatly reducing the scope for ambiguity in subsequent inference patterns. Removing the dependence on Human-Data relationships prevents destructive interference with these (now inferred) relationships.

Introducing construction patterns and modelling error threats to detect asserted Human-Data interactivity relationships means we can tolerate the loss of backward compatibility, as any old system models that no longer work properly would trigger errors and so prompt system-modeller users to update their models appropriately.

Changes now implemented as follows:

inserted patterns HuuD+aUD, Hui-uD+aID and Huv-uD+aVD at the start of the user-data interactions sequence to find asserted Human-Data interaction relationships and tag them as such by adding a parallel labelling relationship,
added construction patterns HuirPeUD+aD, HuirPeUI+iD and HuirPeOD+vD to infer the Human-Data interaction relationships from Process-Data interactivity relationships: this means they can still be used in later patterns and threats, if appropriate,
added construction patterns HuirPeUD-a+a, HuirPeUI-OD-p+r and HuirPeO-ID-p+r to infer data processing relationships where they are missing and could be inferred from Process-Data interactivity relationships,
removed construction patterns HuaDirPaD-P+eUU, HuiDirIPpD-P+eUI, HuiDirPpD-P+eUI, HuvDirPrD-P+eUO and HuvDirPcD-P+eUO, which deduced Process-Data relationships from Human-Data interactivity relationships,
modified construction patterns PDO-gDAnDF-i+i, PDO-gDAnDF-i+i-Replay and H-sDFcCaiAC+DS so they don't use processing relationships where these could exclude cases where data flows to or from an interactive user,
added modelling error threats D.E.HuaUD.9, D.E.HuaID.9 and D.E.HuaVD.9 to flag cases of Human-Data interaction relationships that are labelled as having been asserted,
removed modelling error threats D.E.HuaD-P.9 and D.E.Hui-vD-P.9 which flagged cases where asserted Human-Data interaction relationships existed with no mediating interactive process - these are not needed now the Human-Data relationships are inferred.
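The overall shape of the revised sequence can be sketched as two steps: tag any asserted Human-Data relationships, then derive Human-Data relationships only from Process-Data interactivity relationships. This is a hedged illustration rather than the implemented patterns: the function `construct`, the `asserted:` tag prefix, and the predicate names `interactsWith` and `updatesData` are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Process-Data interactivity relationship -> Human-Data relationship it implies.
# 'updatesData' is a placeholder name for the Human-Data update relationship.
IMPLIED = {
    "enablesUserInput": "inputsData",
    "enablesUserOutput": "viewsData",
    "enablesUserUpdate": "updatesData",
}
HUMAN_DATA = set(IMPLIED.values())


def construct(asserted: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (tags, inferred Human-Data relationships)."""
    # Step 1: label asserted Human-Data relationships as a deprecated feature,
    # so that a modelling error threat can pick them up.
    tags = {(s, "asserted:" + p, o) for s, p, o in asserted if p in HUMAN_DATA}
    # Step 2: infer Human-Data relationships from Process-Data relationships
    # only; the asserted Human-Data relationships are never used as input.
    inferred: Set[Triple] = set()
    for proc, pred, data in asserted:
        if pred not in IMPLIED:
            continue
        for human, rel, target in asserted:
            if rel == "interactsWith" and target == proc:
                inferred.add((human, IMPLIED[pred], data))
    return tags, inferred


base = {("Customer", "interactsWith", "Meal Diary"),
        ("Customer", "interactsWith", "Diet Planner")}
human_data = ("Customer", "inputsData", "Meals")
process_data = ("Meal Diary", "enablesUserInput", "Meals")

legacy = construct(base | {human_data})                 # old-style model
mixed = construct(base | {human_data, process_data})    # both assertions
migrated = construct(base | {process_data})             # new-style model
print(legacy, mixed, migrated, sep="\n")
```

In the sketch, an old-style model is flagged and loses the user input inference, a mixed model is flagged but still infers the input, and a migrated model infers the input with no flag. Only 'Meal Diary' is ever tied to the input, because the asserted relationship names the process.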

In addition:

IoT construction patterns which precede all this have been altered where necessary for consistency with the above, specifically to remove any constructed Human-Data interactivity relationships,
spam construction patterns have been similarly altered,
altered threats for cross-site data leakage by injection of forged content (threats CC.AuC.DFrXSS.3 and CC.AuC.DSrXSS.3, and related side effect threats P.A.HuDFrXSS.6 and P.A.HuDSrXSS.6) so they use only Process-Data relationships to exclude user input,
altered threats for deception of users by injection of forged content so they use only Process-Data relationships to determine if the data is viewed by the user and used to determine subsequent steps.

A few points are worth noting.

First, because the Human-Data interactivity relationships are now inferred from Process-Data interactivity relationships, few of the subsequent construction patterns and threats need to be modified. Most changes in subsequent patterns are removal of patterns where the Human-Data interactivity relationships are used to infer the presence of Process-Data interactivity relationships.

We do still infer the existence of Process-Data interactivity relationships (plus the implied Human-Data relationships) where the process is interactive and processes data that either (a) goes nowhere if not to the user or (b) has no possible source except for the user. We may want to drop these patterns so all Process-Data relationships must be asserted, but for now I kept them because we have system models that rely on these inferences. The important point is that these patterns are not derived from an asserted or inferred Human-Data interactivity relationship.

Cache inference patterns do not infer the need to cache user output, nor user input that should be sent to remote services. The assumption is that if data arrives for display it will be shown immediately, and if user input needs to be sent elsewhere that will be done immediately or not at all.

The changes now implemented allow the main test cases for this issue to work correctly (see here for the current versions of these test cases):

Issue 107 Test 1a: gives a 'deprecated relationship' modelling error, but the correct data lifecycle except that it fails to infer that data 'Meals' is a user input (since the 'Customer-inputsData-Meals' assertion is no longer valid)
Issue 107 Test 1b: gives a 'deprecated relationship' modelling error, and the correct data lifecycle. The asserted enablesUserInput relationship means data 'Meals' is now understood to be a user input.
Issue 107 Test 1c: is the same as 'Issue 107 Test 1b', but with the asserted 'Customer-inputsData-Meals' removed. This gives the correct data lifecycle, and no 'deprecated relationship' error.
Issue 107 Test 1d: is the same as 'Issue 107 Test 1c', but with an extra output 'StepsSummary', displayed by the 'DietPlanner' app to the 'Customer'. However, this is not stored or created by any other process, and so leads to a modelling error.

The last case shows that we still need one more construction pattern to infer that a user output is computed by the interactive process used to view it, where the data does not have any other cause.
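One possible shape for such a pattern, as a minimal sketch: if a process enables user output of some data and nothing in the model creates or stores that data, assume the process itself computes it. The function name, the triple encoding and the `stores` predicate are assumptions, and the pattern actually added to the domain model may use different conditions.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer_output_creators(triples: Set[Triple]) -> Set[Triple]:
    """Add Process-creates-Data for user outputs that have no other cause."""
    inferred: Set[Triple] = set()
    for proc, pred, data in triples:
        if pred != "enablesUserOutput":
            continue
        # Any existing creator or store of the data (including this process)
        # counts as a cause, so nothing needs to be inferred in that case.
        has_cause = any(obj == data and rel in ("creates", "stores")
                        for _subj, rel, obj in triples)
        if not has_cause:
            inferred.add((proc, "creates", data))
    return inferred


test_1d = {
    ("Customer", "interactsWith", "DietPlanner"),
    ("DietPlanner", "enablesUserOutput", "StepsSummary"),
    # 'Meals' is shown as well in this sketch, but it has a creator elsewhere.
    ("DietPlanner", "enablesUserOutput", "Meals"),
    ("MealDiary", "creates", "Meals"),
}
creators = infer_output_creators(test_1d)
print(creators)
```

Here only 'StepsSummary' gains an inferred creator; 'Meals' is left alone because another process already creates it.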

Regression tests look OK:

RC-Check Collocated Process Comms 1: has a user logging into a server to edit data, and also accessing the data via a browser, where both clients access their services via the same reverse proxy. The data lifecycle cannot be resolved when different clients use the same data via the same proxy, which leads to an inferred (and spurious) indirect interaction between the telnet login client and the back-end website. In this particular case we do get a modelling error to say this type of client cannot handle graphics.
RC-Check Collocated Process Comms 2: is the same model but the telnet client does not use the reverse proxy, so a separate access path is found via an inferred login service. The data lifecycle is correct, except the asserted login service is not used because the construction patterns can't detect any way it could be used by the telnet client.
RC-Check Collocated Process Comms 3: includes a direct relationship from the telnet client to the asserted login service, so the construction patterns don't need to infer how the user gets remote access to the text editor. This gives the correct data lifecycle.

RC-Check Collocated Process Comms 1 and 2 could not be resolved before - so the situation is no worse than before. Actually, it has improved slightly due to the fixes incorporated for issue #134, so that RC-Check Collocated Process Comms 1 does not give a modelling error.

DataFlow-Test-09s: involves a service providing data to a user's client and also (via intermediaries) to a service that is also used by the same client. Because the client is interactive, it is assumed not to forward data, so the service gets data via the intermediaries.
DataFlow-Test-09s-PlusS: same as DataFlow-Test-09s, but with extra 'serves' relationships to guide the inference logic and force the data flow between services to use the correct path.
DataFlow-Test-09s-Modified: same as DataFlow-Test-09s, but now the client creates rather than consumes data. The inference rules produce an unlikely data flow from the client to both services. This represents poor system design (the user must enter data multiple times), but we cannot infer that this would not happen.
DataFlow-Test-09s-PlusS-Modified: same as DataFlow-Test-09s-Modified but with extra 'serves' relationships. These do not lead to a more plausible data flow.
DataFlow-Test-09s-PlusS-Modified+Fan: same as DataFlow-Test-09s-PlusS-Modified, but with a separate 'fan out' process that marshals the sending of each data asset. If the data does not flow to both services, the fan out process can be connected to only one of them, leading to the desired outcome (the client sends data to one service, and the other service gets it from there).

The two 'modified' cases gave implausible (but not impossible) results before, and changes to the model of user-data interactions have not made any difference to this.

These residual ambiguous or inappropriate outcomes are now covered by a separate issue #131 and will not be addressed here.

An extra inference pattern was added so test case 'Issue 107 Test 1d' now gives good results.

The only remaining consideration is whether this new pattern, and others retained for backward compatibility reasons, may lead to some incorrect deductions if system modeller users forget to include certain assertions in their models. This is now covered by issue #140, so it can be dropped from the list of concerns here.

DataFlow-Test-09s-PlusS-Modified+Fan wasn't in the zip file so I've attached it here: DataFlow-Test-09s-PlusS-Modified+Fan - Asserted.nq.gz

This has now been addressed, apart from some aspects associated with remote access (see #106) and using modelling errors to signal an inferred user-process-data interaction (see #140).

Spyderisk / domain-network

User interactions with data - inference patterns are too conservative #107