Spyderisk / domain-network

Bug in Cache Generation #109

Closed: mike1813 closed this issue 2 weeks ago

mike1813 commented 5 months ago

When a Data Flow to a Process cannot be used immediately, e.g., because it is one of several inputs to a service, or because it arrives in a context where the process cannot run, the data flow must be cached. This is addressed by a construction pattern sequence.

The current sequence contains a couple of bugs that should be fixed:

Best to address these at the same time, because the solution to one is likely to affect how to solve the other.

mike1813 commented 3 months ago

There is also an issue with the surfacing threat D.A.DallDS.0, which causes Loss of Availability at a Data asset (representing a type of data) if all copies of the data are unavailable. This threat is wrongly suppressed if there is an uncompromised cached copy of the data; cached copies are not persistent, so they should not count. See #120.
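As a rough sketch of the intended behaviour (hypothetical Python names; the actual threat is defined by patterns over the system model graph, not code):

```python
from dataclasses import dataclass

@dataclass
class DataCopy:
    is_cache: bool    # non-persistent cached copy of a data flow
    available: bool   # copy is uncompromised and reachable

def loss_of_availability_applies(copies: list[DataCopy]) -> bool:
    """Sketch of D.A.DallDS.0: Loss of Availability at a Data asset should
    apply when every persistent copy is unavailable. Non-persistent cached
    copies are ignored, so an uncompromised cache no longer suppresses the
    threat (the bug tracked in #120)."""
    persistent = [c for c in copies if not c.is_cache]
    return bool(persistent) and all(not c.available for c in persistent)
```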

mike1813 commented 3 months ago

In these scenarios, the underlying assumption is that when a Process runs, it will try to use all relevant inputs to create all potentially relevant outputs.

In what contexts (i.e., in which physical locations) could the Process run? The working assumption is that the Process may run in a context if:

An input is relevant if it is (a) sent by the initiating client or user, (b) obtained from a service, (c) stored on the Process host or (d) necessary for the Process to run. An output is relevant if it is (a) returned to the initiating client or user or (b) sent to a service.

Some variation in the Process execution is therefore assumed. The calculation can run with a subset of non-essential inputs, but uses everything it needs or can access when the calculation is triggered. It can produce a subset of outputs according to the demands of the situation, but can only skip outputs that could not possibly be relevant in that situation. If the Process can take different paths each generating a subset of the outputs and/or using a subset of its inputs, where these subsets cannot be deduced from the situation, then it should be modelled as several Processes.
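As a rough illustration (hypothetical Python names; the actual rules are construction patterns in the domain model), the relevance conditions above amount to simple predicates:

```python
from dataclasses import dataclass

@dataclass
class ProcessInput:
    sent_by_initiator: bool = False  # (a) sent by the initiating client or user
    from_service: bool = False       # (b) obtained from a service
    stored_on_host: bool = False     # (c) stored on the Process host
    necessary: bool = False          # (d) needed for the Process to run at all

def input_is_relevant(i: ProcessInput) -> bool:
    # A run may skip only non-essential inputs that happen to be
    # inaccessible when the calculation is triggered.
    return (i.sent_by_initiator or i.from_service
            or i.stored_on_host or i.necessary)

@dataclass
class ProcessOutput:
    returned_to_initiator: bool = False  # (a) returned to the initiating client or user
    sent_to_service: bool = False        # (b) sent to a service

def output_is_relevant(o: ProcessOutput) -> bool:
    return o.returned_to_initiator or o.sent_to_service
```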

The calculation of an output by a Process is not possible in a context if necessary inputs are not available in that context. In that situation, the Process can either delay the calculation or drop it.

One must decide in which contexts the Process may be unable to send output to a service. This was handled incorrectly in the previous implementation: it is not necessary for the sending Process (acting as a client) to be unable to connect to a subnet over which it could send the request; it is sufficient that the service may be unable to receive the request in some location where its host may be. This needs to be fixed, as discussed in issue #121.
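The corrected condition can be sketched as follows (hypothetical `reachable` predicate and context sets, not the actual rule implementation). The key point is that the check ranges over the service's possible locations, not over the subnets available to the sender:

```python
def output_may_be_undeliverable(process_ctxs, service_ctxs, reachable) -> bool:
    """True if there is some context in which the Process may run from which
    some possible location of the service cannot be reached. It is this
    condition, not 'the sender cannot connect to any subnet', that should
    determine where output may need to be cached."""
    return any(
        any(not reachable(p, s) for s in service_ctxs)
        for p in process_ctxs
    )
```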

An input is necessary if either:

In the second case, the data is, strictly speaking, not a Process output, and it need not be a Process input, but it is treated as both. This case may arise when the Process only serves or relays the data, which implies that distinct inference rules will be needed. For that reason, this possibility should be treated separately in caching inference patterns.

It is assumed that if the Process is initiated by an interactive user, calculations will be performed only in contexts where all the necessary inputs can be accessed and all outputs can be delivered. If this is not the case, the user interface will display a 'try again later' error. In other words, we assume that rather than delaying a calculation until inputs are available, the Process will pass the delay back to the interactive user.

This means no caching is inferred to arise from user interactions. It also means the system model is inconsistent if there is no context accessible to the interactive user in which the Process can access all necessary inputs and deliver all relevant outputs to services.
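Under that assumption, the consistency condition amounts to an existential check over the contexts accessible to the user (a sketch using hypothetical predicates):

```python
def interactive_use_is_consistent(user_ctxs, inputs_accessible, outputs_deliverable) -> bool:
    """The model is consistent only if at least one context accessible to the
    interactive user lets the Process access every necessary input and deliver
    every relevant output to services; otherwise a modelling error applies."""
    return any(inputs_accessible(c) and outputs_deliverable(c) for c in user_ctxs)
```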

If a client sends a request to the Process causing it to run in some context where some necessary inputs are not accessible, it is assumed that the calculation is delayed until these inputs can be obtained. With these assumptions, it can be inferred that:

These three cases apply if the Process serves, relays or processes the data. The remaining cases do not arise if it only serves or relays the data:

Note that case (4) ensures that necessary input from a client will be saved if other clients use the Process as a service, so the necessary input in case (5) must be obtained from a service. This in turn means the unsaved inputs in case (6) must also come from services.

Finally, what about outputs (other than forwarded data)? If inputs are cached as above, then output can be calculated when it may be possible to send it. Output for a client can be created and sent when the client reconnects, but output sent to a service may still need to be cached:

This last case arises because, if the host of the service to which the output should go is mobile, there may be locations in which it is not accessible from the Process. The Process, as a client, can choose when to try to connect, and can delay generating output until it is ready to connect. However, it cannot know in advance whether the output can be sent if there is a possibility that the service could move out of range.

With these assumptions, if there is a Process with a necessary input that cannot be accessed in any context, then the system model is inconsistent. Thus we also need two modelling errors:

mike1813 commented 3 months ago

Case 6 is not quite addressed correctly. The inference rules as implemented on branch 40 now find two distinct necessary inputs, neither stored on the Process host and each coming from a Service, and assume the first input is cached if there is no context where the Process could get input from one and be sure of also getting input from the other.

The idea is that in this situation, when the Process starts the calculation it tries to get both inputs. Whichever service becomes accessible first is the first service in the construction pattern, from which the first input is obtained. But if there is a possibility of the second service being out of range, it may be that access to the second input is blocked. In that case, the calculation must be delayed until the second service returns to a location where it can be accessed, during which time the first input must be stored.
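A sketch of the rule as implemented (hypothetical context sets and `reachable` predicate). Note the over-approximation discussed in the next paragraph: each input is tied to a single source service.

```python
def first_input_cached(process_ctxs, svc1_ctxs, svc2_ctxs, reachable) -> bool:
    """Case 6 as implemented on branch 40: the first input is assumed to be
    cached if there is no context in which the Process can obtain input 1 from
    its service AND is guaranteed to also obtain input 2 (i.e., every possible
    location of the second service is reachable from that context)."""
    return not any(
        any(reachable(p, s1) for s1 in svc1_ctxs)      # input 1 obtainable here
        and all(reachable(p, s2) for s2 in svc2_ctxs)  # input 2 guaranteed here
        for p in process_ctxs
    )
```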

The logic is not quite right, as the second input may be obtainable from some other service, so finding one source to be out of range does not mean the second input will be unavailable. To express this, one would need to insert access interruption links to Data Flows and Data, not just to client-service channels. At this point that hasn't been done because:

The pattern used means caching may be inferred when it is not needed. Genuine threats via cached data will not be missed, but some spurious extra threats may be added. This is compatible with the principle that any lack of fidelity should lead to risks being overestimated rather than underestimated. On that basis, using a simpler pattern so other updates can be integrated sooner is acceptable.

However, the same simplification does make it difficult to create modelling error threats (Error 1 and Error 2). For now we will need to make do without them, but this dangling issue will need to be fixed at some point.

mike1813 commented 3 months ago

These changes have fixed problems in my current test cases, so I have pushed them to branch 40.

I can't easily create test cases for every scenario because of the issues with the representation and inference patterns for interactive user access to data, as described in #107. The plan is to create a temporary fix for some of those issues on branch 40, so I can create a few more tests for the cache construction sequence.

After that it would make sense to merge changes from branch 40 into branch 6a.

mike1813 commented 3 months ago

Created a new set of tests covering cases 1 to 6 above. All take the form of three processes with two uses relationships, in which the middle process is the one that may need to cache data.

In each case there is one or sometimes two scenarios in which the cache should be inferred to exist, with names Case 1a, 1b, etc. There is also one scenario where the data cache should not be inferred, with names like Case 1x, Case 3x, etc. There is no Case 2x because that would be identical to Case 1x.

The asserted system models for these tests are in this zipfile: Issue 109 Tests.zip.

The results are as we would expect - a cache of data D1 or D2 is inferred to exist (or not) on the mobile host N1.

In Cases 2a, 5a and 5x the middle process is a service on a mobile device getting data from a client that cannot access the service unless it is in a specific location. These three cases also exercise case 7 above, and in each case a cache of D1 or D2 is also inferred to exist on the client host H1.

This is a little confusing, because in some of those cases, the same data is cached on N1, but formally it is correct.

The zipfile contains one further test case, based on a reasonably common IoT scenario, in which a user wears an activity sensor connected via Bluetooth to their phone. The user also has an application running on their PC in which they log their meals. That data is saved to disk, and served by a simple data service. An app on the phone receives data from the sensor and uses it along with the meal data to create a diet plan, which it displays to the user and stores via the same data service on the user's PC.

The PC remains at home, but the phone and sensor may be carried by the user when they go out. This means the phone app:

Once validated, the model contains one data cache, for the sensor data, which is stored by the app on the phone. This is needed because the app cannot run until it can access the meal data, so it must cache the otherwise unsaved flow of input from the sensor. The app generates its output (the diet plan) when the user gets home and the phone can access the meal data.

There is no need to cache the meal data on the phone, because the phone uses it immediately (having already cached the sensor data it needs as the second input). There is no need to cache the diet plan either: the phone cannot save it while the user is out, so it should not generate that output away from home (quite apart from the fact that it cannot access the meal data there). It should generate the output when it is in range of the data service (i.e., when the user is home), and because the data service runs on a fixed host in that location, there is no possibility of the data service then being out of range.

Note that the data service could be down, leading to loss of availability in the flows of data to and from the app, but this loss of availability is a deviation from the expected behaviour, so it is assumed not to cause the app to cache the data.

mike1813 commented 2 months ago

Problems arising from #107 that prevented good tests against issue #109 have now been addressed. This was achieved by making a partial fix for #107 and reformulating the test cases to avoid problems not covered by this partial fix.

In addition, the threat to persistent data availability now ignores (non-persistent) cached copies of data flows, addressing #120. Doing this also involved fixes for #123 and #124.

All these fixes are now on branch 40, so a pull request can now be raised addressing this issue.

mike1813 commented 3 weeks ago

Work on issue #107 revealed one more possible caching scenario, in which uncached input is sent to a service, causing it to execute in some context, and the service produces output destined for a second service that cannot be accessed in that context. This appeared in a test case for issue #107, the IoT scenario discussed above: Issue 107 Test 1a - Asserted.nq.gz.

In this case, a sensor acts as a client sending input to the DietPlanner, which runs on a phone with which the sensor is paired, and generates updated diet plans considering the user's activity levels. The DietPlanner also uses input data Meals, which must be fetched from a service running on a PC accessible from only one location. Consequently, the sensor input should be cached and used when the DietPlanner is next within range of the PC. The output data DietPlan is stored via the same service running on the PC. It should not be cached, because it is only generated when the DietPlanner is in range of the PC.

In practice, problems in the model for user interactions with data (as described in issue #107) cause these inferences to fail. Input data 'Meals' is inferred to come from an interactive user - which is true, but only via a separate process running on the PC. Due to the shortcomings in the model of user-data interactions, it is inferred that Meals is input via the DietPlanner process, so it doesn't need to wait until it can fetch this data from the service running on the user's PC.

This has two consequences. First, the need to cache the sensor input is missed, something that can only be fixed by addressing issue #107. Second, if the DietPlan could be created in any location, it should be cached until the user comes within range of the storage service on their PC; changes made to address #109 mean this is no longer inferred. While the need to cache output only arises here because of the failure to detect that an input should be cached, the output should nevertheless be cached if it cannot be sent in a context where input from a client is not cached.

There are two ways this omission could be resolved:

  1. Ensure input received from a client is cached if it arrives in a context where output for a service cannot be sent. The assumption in this case is that a process would not generate output it couldn't send, so the calculation would be delayed, which means caching the inputs from which it is computed.
  2. Ensure output sent to a service is cached if the process may be forced to perform calculations by receipt of uncached input from a client in a context where the output cannot be sent.

The first solution seems more realistic in this test case, though that may be because we know processing should be delayed due to the presence of a second, necessary input (Meals). If this were removed from the model, it is less clear that the input would always be cached and processing delayed. Moreover, if we implement the first solution, we exclude any possibility that an output may be cached.

The second solution also works in this test case, given that the sensor input is taken to be uncached, even though in this test case the input should have been cached. It seems less realistic, but if the second input (Meals) is removed from the model, we may find that in some cases it is the output that should have been cached. The second solution also allows users to choose which way to go: if the presence of an input cache is not inferred, the user can assert that the data is stored (or cached) on the process host. To make this work, we need two changes:

This situation may be thought of as a new case 8, although the new construction patterns should be inserted before the one that infers the presence of an output cache due to case 7.
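A minimal sketch of the second rule, i.e., the new case 8 (hypothetical predicates; the actual change is a construction pattern ordered before the case 7 pattern):

```python
def case8_output_cache_needed(process_ctxs, uncached_client_input, output_sendable) -> bool:
    """Output destined for a service is cached if the Process may be forced to
    run by uncached client input arriving in a context where that output
    cannot be sent. A user who knows the input is in fact cached can instead
    assert that the data is stored (or cached) on the process host."""
    return any(
        uncached_client_input(p) and not output_sendable(p)
        for p in process_ctxs
    )
```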

mike1813 commented 3 weeks ago

Now fixed on branch 107, allowing previously failing tests for user interactivity to be used to address issue #107.

mike1813 commented 2 weeks ago

We don't yet have the two modelling error threats as discussed above, Error 1 and Error 2.

These are difficult to implement, as they require threat patterns that match a condition not being met. They have now been moved into a separate issue (#138), so this one can be closed.