Spyderisk / domain-network

Network domain model
Apache License 2.0

Bug in data lifecycle control inference #149

Closed by mike1813 1 day ago

mike1813 commented 6 days ago

If a Process serves a locally stored copy of a Data asset (i.e., a Data Copy asset), we get a Stored Data Pool asset associated with the Data and the Process, plus a Process-enablesAccess-StoredDataPool relationship, meaning that the Process controls access to the data. See construction pattern PsDSH-DP+DP.

If other Processes access the same Data, we get DataInput or DataOutput assets associated with their data access. The enablesAccess from the serving Process is then propagated (possibly via other DataAccess assets and communication intermediaries) to any DataInput and DataOutput assets by construction patterns DUDA-eS+eA and DADU-eS+eA.
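The propagation described above can be pictured as a reachability walk from the Stored Data Pool. The following is only a minimal sketch: the function name, the `flows` mapping and the asset names are hypothetical stand-ins, not the actual implementation of DUDA-eS+eA or DADU-eS+eA.

```python
from collections import deque


def propagate_enables_access(enabler, pool, flows):
    """Return {asset: enabler} for every asset reachable from `pool`.

    `flows` maps an asset to the assets its data access extends to.
    It is a hypothetical stand-in for the links followed by the
    DUDA-eS+eA and DADU-eS+eA construction patterns.
    """
    reached = {pool: enabler}
    queue = deque([pool])
    while queue:
        asset = queue.popleft()
        for nxt in flows.get(asset, ()):
            if nxt not in reached:
                # The same enabling process is carried along each hop.
                reached[nxt] = enabler
                queue.append(nxt)
    return reached


# Assumed example: one intermediate DataAccess asset between the pool
# and the DataInput/DataOutput assets of other processes.
flows = {
    "StoredDataPool": ["DataAccess1"],
    "DataAccess1": ["DataInput", "DataOutput"],
}
result = propagate_enables_access("ServerProcess", "StoredDataPool", flows)
```

In this sketch every reachable asset ends up with the same enabler, which mirrors the idea that the serving Process controls access wherever the data flows.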

The presence of an enablesAccess relationship thereby specifies that there is a process enabling access to the serialized data. This is used in two ways:

Things get a bit tricky in cases where either (a) there is no explicit Process-serves-Data relationship, or (b) there is no stored copy of the Data. In the former case, the construction sequence looks for a process accessing the stored copy and makes it responsible for enabling access. This covers situations where a Process accesses stored data as input or creates stored data as output, but may also send the data to another (remote) data consumer. In the latter case, the process creating the data is considered responsible for enabling access by any consumer process, since the creator must be sending the data in messages to the consumer, rather than via a stored copy.
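The fallback rules in the previous paragraph can be summarised as a small decision function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the order in which the cases are checked (no stored copy first, then an explicit server, then a process accessing the stored copy) is my reading of the text rather than the actual construction sequence.

```python
from typing import Optional, Sequence


def infer_enabler(server: Optional[str],
                  stored_copy_users: Sequence[str],
                  creator: Optional[str],
                  has_stored_copy: bool) -> Optional[str]:
    """Pick the process assumed to enable access to the data."""
    if not has_stored_copy:
        # Case (b): no stored copy, so the creator must be sending the
        # data in messages and is responsible for enabling access.
        return creator
    if server is not None:
        # Normal case: an explicit Process-serves-Data relationship.
        return server
    # Case (a): no explicit server, so fall back to a process that
    # accesses the stored copy (the first one, in this sketch).
    return stored_copy_users[0] if stored_copy_users else None


# Explicit server with a stored copy.
explicit = infer_enabler("Server", ["Server"], "Producer", True)
# No explicit server: the process reading the stored copy is chosen,
# even though a second process creates the data.
reader = infer_enabler(None, ["Consumer"], "Producer", True)
# No stored copy at all: the creator is chosen.
message_only = infer_enabler(None, [], "Producer", False)
```

The second call is the combination discussed in this issue: the stored copy is only used as input by one process, and the data is created by another.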

There is a bug in the current sequence whereby no process is inferred to be responsible for enabling access if the only stored copy is used by a Process that uses the data as input, and the data is created by a second process. In this case, the process using data as input is the enabler, since it manages access to the stored copy, and determines whether output from the second process should be stored. However, this combination is not picked up correctly in the current sequence.

mike1813 commented 6 days ago

The problem is in construction pattern DSDPS+DC, which is supposed to create an initial data channel from an output (data source) that is then iteratively extended until a data destination is reached. The matching pattern contains a spurious node, which may not always be present, causing the pattern not to be matched and the associated data channel (and other data channels obtained by iterative extension) to be missed.
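To illustrate why one unnecessary node in a matching pattern has such a wide effect, here is a toy model of the failure. It is not how DSDPS+DC is implemented: the matcher, the role names `SpuriousNode` and `Relay`, and the prefix-based notion of channel extension are all assumptions made for the example.

```python
def matches(required_roles, present_roles):
    """A pattern matches only if every role it requires is present."""
    return set(required_roles) <= set(present_roles)


def build_channels(required_roles, present_roles, hops):
    """Create the initial channel and its iterative extensions.

    `hops` lists the nodes from the data source to the destination.
    If the initial pattern does not match, no channel is created and
    so nothing can be extended either.
    """
    if not matches(required_roles, present_roles):
        return []
    return [hops[:i] for i in range(2, len(hops) + 1)]


model_roles = {"DataOutput", "Process"}          # no spurious node present
hops = ["DataOutput", "Relay", "DataInput"]

buggy = build_channels({"DataOutput", "Process", "SpuriousNode"},
                       model_roles, hops)
fixed = build_channels({"DataOutput", "Process"}, model_roles, hops)
```

With the extra required role the result is empty, so the initial channel and every channel derived from it are lost; without it, both the initial and the extended channel are found.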

Interestingly, the spurious node does not appear in the slide set documenting the construction sequence. It looks like a change was planned but somehow never got implemented.

It is a simple fix to remove the spurious node. The main challenge is to check that the extra channels don't cause problems in cases where the existing channels are sufficient. We have a lot of test cases developed for issues #40, #109, etc., but (a) there are lots of tests covering different (sometimes corner) cases, and (b) there may be corner cases not covered by those tests. It will take time to run the tests, check for (and if necessary fix) any regression issues, and confirm that the tests cover all the required cases.
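One way to organise the regression check is to compare the relationships inferred for each test case before and after the fix. The helper below is a hypothetical sketch; the tuple representation of relationships and the names used are assumptions, not part of the existing test tooling.

```python
def compare_runs(before, after):
    """Report relationships lost and gained between two inference runs."""
    before, after = set(before), set(after)
    return {
        "lost": sorted(before - after),    # would indicate a regression
        "gained": sorted(after - before),  # extra channels to review
    }


# Assumed example: relationships as (process, relation, asset) tuples.
baseline = {("P1", "enablesAccess", "DataInput1")}
with_fix = {
    ("P1", "enablesAccess", "DataInput1"),
    ("P2", "enablesAccess", "DataInput2"),
}
report = compare_runs(baseline, with_fix)
```

An empty `lost` list means the existing results are preserved, while `gained` lists the additions that need checking against the expected outcome for that case.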

mike1813 commented 3 days ago

A reasonable number of tests have now been run.

Conclusion: the tests show that the fix to DSDPS+DC corrects some errors, and that in cases where those errors do not arise the outcome is unchanged. The change should now be merged into branch 6a.

mike1813 commented 1 day ago

Updated on branch 149 and merged into 6a.