Spyderisk / domain-network

Network domain model
Apache License 2.0

User interactions with data - inference patterns are too conservative #107

Closed mike1813 closed 1 week ago

mike1813 commented 6 months ago

If a user interacts with a process that creates and/or receives data (i.e., it processes data, not merely serving or relaying data), then the user interacts with the data if either:

The second is deduced by construction patterns that look for other possibilities that may rule out the potential inferred user interaction. In practice, this can lead to odd outcomes, e.g., a user who updates data via an editor and also uses a web browser that receives the data will be assumed not to view the data via the browser.

It seems the construction patterns should be a little more generous and avoid ruling out inferred user interactions with data in a few more situations than they do at present.

This is not a huge issue, because system-modeller users can add asserted relationships (Process-enablesUserInput-Data, etc.) to get what they want.

mike1813 commented 6 months ago

Except that the necessary relationships were not tagged as 'assertible'. Changed that on branch 85, but left the rest for now.

mike1813 commented 3 months ago

Three construction patterns could usefully be revised. They all create Human-Data and Process-Data relationships to indicate that the Human (who must be an interactive user of the Process) views the data as output via the Process:

In all three cases, the pattern is suppressed if the user also inputs the data. This ensures that the relationships for interactive user output are not added when the user is in fact updating the data via the process. However, nothing requires the update to happen via the same process, so no user output is deduced even when the user enters the data via some other process.
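The difference between suppressing on any user input and suppressing only on input via the same process can be sketched as follows. This is a minimal Python illustration, not the domain model's own pattern language: the triple representation, the `interactsWith` predicate and both function names are assumptions made for the example, and the Editor/WebBrowser model mirrors the scenario described earlier in this issue.

```python
# Hypothetical triple store: (subject, predicate, object) tuples.
# "interactsWith" is an assumed predicate name, used only for this sketch.
Triple = tuple[str, str, str]


def infer_output_suppress_any_input(triples: set[Triple]) -> set[Triple]:
    """Conservative rule: skip user output if the user inputs the data at all."""
    inferred = set()
    for human, pred, process in triples:
        if pred != "interactsWith":
            continue
        for proc, pred2, data in triples:
            if proc == process and pred2 == "receives":
                if (human, "inputsData", data) in triples:
                    continue  # suppressed, whichever process takes the input
                inferred.add((process, "enablesUserOutput", data))
    return inferred


def infer_output_suppress_same_process(triples: set[Triple]) -> set[Triple]:
    """More generous rule: suppress only if this process takes the user input."""
    inferred = set()
    for human, pred, process in triples:
        if pred != "interactsWith":
            continue
        for proc, pred2, data in triples:
            if proc == process and pred2 == "receives":
                if (process, "enablesUserInput", data) in triples:
                    continue  # the user updates the data via this same process
                inferred.add((process, "enablesUserOutput", data))
    return inferred


# A user edits TextData in an editor and also receives it in a web browser.
model = {
    ("User", "interactsWith", "Editor"),
    ("User", "interactsWith", "WebBrowser"),
    ("User", "inputsData", "TextData"),
    ("Editor", "enablesUserInput", "TextData"),
    ("WebBrowser", "receives", "TextData"),
}
```

With this model the first function infers nothing, while the second infers that the WebBrowser presents TextData to the user.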

There are other situations where user output would probably be supported by a process, but the data is used elsewhere in such a way that it cannot be assumed that the process does display the output. For example, it may be that the process creates output, and might show it to the user, but it is also sent to another process where it is used as input. In that situation we can't be sure if the data is shown to the user, because it may be computed by the process only for consumption by the other process, so we can't add the user output relationships - it is up to the system modeller user/client to decide if that is appropriate.

mike1813 commented 3 months ago

Now addressed on branch 40, illustrated by the attached test cases.

These are three of those used for issue #40. The original versions had an asserted WebBrowser-enablesUserOutput-TextData, since the construction patterns could not deduce that the WebBrowser was presenting the data to the user.

The attached versions do not have this asserted relationship, but with the fixed construction patterns this is OK. The fact that the data is interactive user output at the WebBrowser is deduced from the WebBrowser-receives-Data relationship, given the nature of the WebBrowser...

RC-Check Collocated Process Comms 1b - Asserted.nq.gz
RC-Check Collocated Process Comms 2b - Asserted.nq.gz
RC-Check Collocated Process Comms 3b - Asserted.nq.gz

mike1813 commented 3 months ago

It turns out that the fixes made via branch 85 are not sufficient even as a temporary fix for this problem. We still get inappropriate user i/o inferences where a user interacts with more than one process that processes the same data.

For example, in Issue 107 Test 1a - Asserted.nq.gz, the intention is that:

This was created as a domain model development test to check inference rules that generate cached copies of data. 'Diet Planner' cannot access input 'Meals' when running in 'Town', so it cannot produce output 'Diet Plan' when in 'Town'. This means it should not need to cache either the input 'Meals' or the output 'Diet Plan'. However, it gets sensor readings from 'Step Counter' (the inferred sensor data asset '[SensorData:StepCounter]') when in 'Town', which it cannot immediately use. So 'Diet Planner' should cache this sensor data.

The problem is that the relationship 'Customer-inputsData(enters)-Data' triggers inference rules that add 'enablesUserInput' relationships from both processes used by the 'Customer'. Thus it is assumed that 'Diet Planner' gets 'Meals' from its interactive user, which it can use to produce output 'Diet Plan' when in the 'Town'. It cannot send this output to the 'Data Service' from there, triggering a caching inference rule that creates a cached copy of 'Diet Plan' on the 'Customer Phone'. Moreover, because the 'Diet Planner' process has access to 'Meals' in the 'Town', the caching rule that should create a cached copy of '[SensorData:StepCounter]' is not triggered.
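The fan-out described above can be sketched in a few lines. This is a rough Python illustration under stated assumptions: the dictionaries `uses` and `processes_data` and the function `add_user_input_links` are hypothetical, and the real inference rules are expressed as construction patterns rather than code like this.

```python
def add_user_input_links(
    human: str,
    data: str,
    uses: dict[str, list[str]],
    processes_data: dict[str, set[str]],
) -> set[tuple[str, str, str]]:
    """Add enablesUserInput from every process the human uses that handles the data.

    The Human-Data relationship does not say which process takes the input,
    so a rule driven by it cannot tell the candidate processes apart.
    """
    return {
        (process, "enablesUserInput", data)
        for process in uses.get(human, [])
        if data in processes_data.get(process, set())
    }


# Assumed shape of the test model: the Customer uses two processes,
# and both of them process 'Meals'.
uses = {"Customer": ["MealDiary", "DietPlanner"]}
processes_data = {
    "MealDiary": {"Meals"},
    "DietPlanner": {"Meals", "Diet Plan"},
}

links = add_user_input_links("Customer", "Meals", uses, processes_data)
```

Here `links` contains an 'enablesUserInput' relationship for both MealDiary and DietPlanner, which is the unintended outcome: DietPlanner is treated as getting 'Meals' directly from its user.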

In the modified version Issue 107 Test 1b - Asserted.nq.gz, an asserted relationship 'MealDiary-enablesUserInput-Meals' has been added. This suppresses inference pattern HuiDirIPpD-P+eUI that previously added 'DietPlanner-enablesUserInput-Meals', but that means HuirIPp-iD+vD now creates a 'DietPlanner-enablesUserOutput-Meals' link. There is no way to specify that DietPlanner just uses this data as input, except by specifying that it gets the input from its user, which creates the problem found in Issue 107 Test 1a - Asserted.nq.gz.

Moreover, as a consequence we get an inferred 'Customer-viewsData-Meals', so now it is assumed that the 'Customer' amends data 'Meals'. This triggers a modelling error threat because in this case, they aren't interacting with any process that is amending this data. This is a bug in the modelling error - it should check for the possibility that different processes handle input and output, but the problem arises because the Human-Data relationships cannot specify which process is involved.

In conclusion, making 'Process-enablesUserInput-Data', 'Process-enablesUserOutput-Data' and 'Process-enablesUserUpdate-Data' assertible certainly helps in some situations, but it doesn't prevent those Human-Data relationships causing problems. It looks like the only way this could be solved is to remove these Human-Data relationships or (if still needed in threats) make them inferred, and alter the inference sequence including HuiDirIPpD-P+eUI and HuirIPp-iD+vD so they no longer drive the outcome.

mike1813 commented 2 months ago

Discussion with @samuelsenior and @scp93ch concluded as follows. We should:

This approach has the advantage that the asserted Process-Data relationships expressing user interaction with data include the process used, greatly reducing the scope for ambiguity in subsequent inference patterns. Removing the dependence on Human-Data relationships prevents destructive interference with these (now inferred) relationships.

Introducing construction patterns and modelling error threats to detect asserted Human-Data interactivity relationships means we can tolerate the loss of backward compatibility, as any old system models that no longer work properly would trigger errors and so prompt system-modeller users to update their models appropriately.
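As a rough illustration of that detection step, the check amounts to scanning the asserted relationships for Human-Data interactivity predicates. The Python below is only a sketch: the tuple representation, the predicate set and the function name `find_legacy_assertions` are assumptions, and in the domain model this would be a construction pattern plus a modelling error threat rather than a function.

```python
# Human-Data interactivity predicates that should no longer be asserted.
# Only the two names mentioned in this issue are listed; the full set is assumed.
HUMAN_DATA_PREDICATES = {"inputsData", "viewsData"}


def find_legacy_assertions(
    asserted: set[tuple[str, str, str]],
) -> list[tuple[str, str, str]]:
    """Return asserted Human-Data relationships that should raise a modelling error."""
    return sorted(t for t in asserted if t[1] in HUMAN_DATA_PREDICATES)


# An old-style model that still asserts a Human-Data relationship.
old_model = {
    ("Customer", "inputsData", "Meals"),
    ("MealDiary", "enablesUserInput", "Meals"),
}

errors = find_legacy_assertions(old_model)
```

Only the Customer-inputsData-Meals assertion is reported; the Process-Data assertion is the form the model should use instead.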

mike1813 commented 3 weeks ago

Changes now implemented as follows:

In addition:

A few points are worth noting.

First, because the Human-Data interactivity relationships are now inferred from Process-Data interactivity relationships, few of the subsequent construction patterns and threats need to be modified. Most changes in subsequent patterns are removal of patterns where the Human-Data interactivity relationships are used to infer the presence of Process-Data interactivity relationships.
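The direction of inference is now from Process-Data to Human-Data, which can be sketched as below. This is a minimal Python illustration and not the actual construction patterns: the `interactsWith` predicate, the mapping table and the function name are assumptions, and the 'enablesUserUpdate' case is left out because its Human-Data counterpart is not named in this issue.

```python
# Assumed mapping from Process-Data predicates to the Human-Data predicates
# they imply. Illustrative only.
PROCESS_TO_HUMAN = {
    "enablesUserInput": "inputsData",
    "enablesUserOutput": "viewsData",
}


def infer_human_data(
    triples: set[tuple[str, str, str]],
) -> set[tuple[str, str, str]]:
    """Derive Human-Data relationships from Process-Data ones.

    Each derived relationship comes from a specific process, so the process
    involved is always known at the point of inference.
    """
    inferred = set()
    for process, pred, data in triples:
        human_pred = PROCESS_TO_HUMAN.get(pred)
        if human_pred is None:
            continue
        for human, pred2, proc in triples:
            # "interactsWith" is a hypothetical name for the user-process link.
            if pred2 == "interactsWith" and proc == process:
                inferred.add((human, human_pred, data))
    return inferred


model = {
    ("User", "interactsWith", "Editor"),
    ("User", "interactsWith", "WebBrowser"),
    ("Editor", "enablesUserInput", "TextData"),
    ("WebBrowser", "enablesUserOutput", "TextData"),
}

human_data = infer_human_data(model)
```

In this example the user is inferred both to input and to view TextData, with each relationship traceable to the process that supports it.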

We do still infer the existence of Process-Data interactivity relationships (plus the implied Human-Data relationships) where the process is interactive and processes data that either (a) goes nowhere if not to the user or (b) has no possible source except for the user. We may want to drop these patterns so all Process-Data relationships must be asserted, but for now I kept them because we have system models that rely on these inferences. The important point is that these patterns are not derived from an asserted or inferred Human-Data interactivity relationship.
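Conditions (a) and (b) can be read as a simple decision on how a process handles one data item. The sketch below is a literal Python reading of that sentence, not the retained patterns themselves: the `DataUse` fields and the function name are hypothetical, and the real patterns very likely involve further conditions not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataUse:
    """How one process handles one data item (hypothetical summary)."""

    interactive: bool            # the process has an interactive user
    has_other_source: bool       # data can reach the process other than from the user
    has_other_destination: bool  # data can go somewhere other than to the user


def infer_user_io(use: DataUse) -> set[str]:
    """Return the Process-Data interactivity relationships implied by (a) and (b)."""
    if not use.interactive:
        return set()
    inferred = set()
    if not use.has_other_destination:
        inferred.add("enablesUserOutput")  # (a) goes nowhere if not to the user
    if not use.has_other_source:
        inferred.add("enablesUserInput")   # (b) no possible source except the user
    return inferred


# Data received by an interactive process and not passed on anywhere.
display_only = DataUse(interactive=True, has_other_source=True, has_other_destination=False)
# Data sent onward by an interactive process with nowhere else to come from.
entry_only = DataUse(interactive=True, has_other_source=False, has_other_destination=True)
```

Neither branch consults a Human-Data relationship, which matches the point made above: the inference depends only on the process being interactive and on where the data can come from or go to.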

Cache inference patterns do not infer the need to cache user output, nor user input that should be sent to remote services. The assumption is that if data arrives for display it will be shown immediately, and if user input needs to be sent elsewhere that will be done immediately or not at all.

mike1813 commented 2 weeks ago

The changes now implemented allow the main test cases for this issue to work correctly (see here for the current versions of these test cases):

The last case shows that we still need one more construction pattern to infer that a user output is computed by the interactive process used to view it, where the data does not have any other cause.

Regression tests look OK:

RC-Check Collocated Process Comms 1 and 2 could not be resolved before - so the situation is no worse than before. Actually, it has improved slightly due to the fixes incorporated for issue #134, so that RC-Check Collocated Process Comms 1 does not give a modelling error.

The two 'modified' cases gave implausible (but not impossible) results before, and changes to the model of user-data interactions have not made any difference to this.

These residual ambiguous or inappropriate outcomes are now covered by a separate issue #131 and will not be addressed here.

mike1813 commented 2 weeks ago

An extra inference pattern was added so test case 'Issue 107 Test 1d' now gives good results.

The only remaining consideration is whether this new pattern, and others retained for backward compatibility reasons, may lead to some incorrect deductions if system modeller users forget to include certain assertions in their models. This is now covered by issue #140, so it can be dropped from the list of concerns here.

samuelsenior commented 1 week ago

DataFlow-Test-09s-PlusS-Modified+Fan wasn't in the zip file so I've attached it here: DataFlow-Test-09s-PlusS-Modified+Fan - Asserted.nq.gz

mike1813 commented 1 week ago

This has now been addressed, apart from some aspects associated with remote access (see #106) and using modelling errors to signal an inferred user-process-data interaction (see #140).