Construction pattern refactoring

mike1813 commented 1 week ago

As discussed in issue #169, to improve modularity it makes sense to replace the explicit construction sequence with a set of construction pattern dependencies, leaving csv2nq to compute a partial ordering of construction patterns that can be inserted into the NQ file used by system modeller.
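As a rough illustration of deriving an ordering from dependencies, here is a minimal Python sketch using the standard library `graphlib` module (Python 3.9+). It is not the csv2nq implementation; the pattern names and the shape of the `dependencies` mapping are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pattern names: each key maps to the set of patterns
# that must be applied before it.
dependencies = {
    "cp#B": {"cp#A"},
    "cp#C": {"cp#A"},
    "cp#D": {"cp#B", "cp#C"},
}

def construction_order(deps):
    """Return one ordering in which every pattern follows its prerequisites.

    TopologicalSorter raises graphlib.CycleError if the dependencies
    contain a cycle, which would mean no valid ordering exists.
    """
    return list(TopologicalSorter(deps).static_order())

order = construction_order(dependencies)
```

Patterns that do not depend on each other (here `cp#B` and `cp#C`) may come out in either order, which is what makes the result a partial rather than a total ordering.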

One side-effect of doing this is that the explicit sequence number stored for each construction pattern in field hasPriority of table ConstructionPattern.csv is no longer meaningful. It makes sense to change the canonical ordering of ConstructionPattern.csv so that it is based on the package membership, then the URI of each pattern. Doing this would mean concurrent changes to different packages can in future be handled more easily by git merge, and would make it possible to apply filtering in csv2nq, such that optional packages included in the domain model source code can be excluded from the NQ file.
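A small Python sketch of the proposed ordering and filtering, for illustration only. The sample rows, the `package#PatientHarms` name and the two-column layout are assumptions; the real ConstructionPattern.csv has more fields, and csv2nq may do this differently.

```python
import csv
import io

# Hypothetical sample data; only the two sort fields are shown.
SAMPLE = """URI,package
cp#NetB,package#Network
cp#CoreA,package#Core
cp#HarmA,package#PatientHarms
cp#NetA,package#Network
"""

def canonical_rows(csv_text, excluded_packages=frozenset()):
    """Drop rows from excluded packages, then sort by package and URI."""
    rows = csv.DictReader(io.StringIO(csv_text))
    kept = [r for r in rows if r["package"] not in excluded_packages]
    return sorted(kept, key=lambda r: (r["package"], r["URI"]))

rows = canonical_rows(SAMPLE, excluded_packages={"package#PatientHarms"})
uris = [r["URI"] for r in rows]
```

Because rows from the same package end up adjacent, edits to different packages touch different regions of the file, which is what should make git merge conflicts less likely.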

The only downside is that it isn't possible to use git merge to combine changes if one set of changes is on a branch that uses the old canonical ordering and the other uses the new canonical ordering.

To fix this, we should freeze development of all branches briefly, and make one further change to the head of each active branch that still uses the old ordering. This change should alter the ordering used in table ConstructionPattern.csv and nothing else. One slight issue is that the MS Access DB domain model editor uses a native MS Access sort order, which is not guaranteed to be the same as the native MS Excel sort order. So the precise definition of this sort order should also be specified somewhere.

The branches that need this treatment are as follows:

[ ] branch 6a: currently the main 'dev' branch in this repository
[ ] branch 132: contains an optional 'patient harms' model
[ ] branch 170: contains an optional 'harms arising in AI (specifically ML) systems' model.
[ ] branch 176: has construction pattern dependencies inherited from branch 169, plus extensions to the network connectivity and application models needed by project TELEMETRY.

Branch 176 was created from branch 169, in which the reordering of ConstructionPattern.csv has already been done. However, the new canonical ordering may be different from this due to the adoption of a specified sort order in place of the MS Access native sort order, so a new update should also be made here.

Once this has been done, it would be possible to merge branches 132, 170 and 176 into branch 6a, or (if preferred) a new 'dev' branch that uses the new canonical ordering in all the CSV tables. The extensions in branch 176 are not optional (all are changes to the base domain model), so this needs doing anyway. The extensions in branches 132 and 170 are considered optional at present, but once we have the new canonical ordering and csv2nq filtering capability, there is no disadvantage to having them all in the same source tree.

mike1813 commented 1 week ago

Did some experiments to check how the MS Access DB sorts string-valued fields (like the 'package' and 'URI' fields we use most often for this purpose). It turns out that by default, MS Access ignores the case of alphabetic characters, and pretty much every printable non-alphabetic character in the basic set comes before the alphabetic characters.

This gives a different outcome from almost any other lexicographic sort function, so we need to change it.

The new convention is that strings are sorted on their characters in the order they appear in the string, and the rank of each character is the corresponding ASCII or Unicode value by which it is represented. This means that:

sorting is case sensitive, with all upper case letters ranked before any lower case letters
digits come before upper case letters
punctuation characters are in four blocks: before digits, between digits and upper case letters, between upper and lower case letters, and after lower case letters
the space character is before all other characters, but it is ignored if it appears at the end of a string (i.e., space characters are used to rank the string only if they are at the start or in the middle of other, printable characters).
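The convention above can be sketched in Python, whose built-in string comparison already ranks characters by Unicode code point (identical to ASCII for the basic set). This is an illustrative sketch, not the MS Access export script; `canonical_key` and the sample strings are hypothetical.

```python
def canonical_key(value: str) -> str:
    """Sort key for the convention: compare by code point, ignoring trailing spaces."""
    # Only trailing spaces are dropped; leading and embedded spaces still count.
    return value.rstrip(" ")

names = ["abc", "ABC", "a bc", "9abc", "_abc", "Abc"]
ordered = sorted(names, key=canonical_key)
```

Here `ordered` is `['9abc', 'ABC', 'Abc', '_abc', 'a bc', 'abc']`: the digit comes first, upper case precedes lower case, the underscore falls between the two letter blocks, and the embedded space ranks `a bc` before `abc`.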

The MS Access DB script to export tables has now been altered so it complies with this specification. It does mean that the proposed changes will now affect the ordering of more tables than we previously thought: any table where case sensitive sorting leads to a different row order than case insensitive sorting.

mike1813 commented 6 days ago

I created the branches, but mistakenly associated them with issue 180.

mike1813 commented 6 days ago

In branch 176 (actually in branch 169), packages that contain only construction patterns have been merged with the corresponding packages that create threats and other stuff. This is necessary because construction-only packages are on different branches of the package dependency hierarchy, so it is not possible for one construction pattern to have a property referring to another from a different package.

The problem is that the canonical ordering for most CSV tables is based on the package URI before anything else. This means the reordering in branches 6a, 132 and 170 must also involve merging construction-only packages with the packages containing related threats and other stuff.

mike1813 commented 6 days ago

There are also two potentially significant changes in the package dependency hierarchy in branch 176.

The first change is that in other branches, package#Network depends on package#Core, whereas in branch 176, package#Users depends on package#Core. The dependence of package#Network on package#Core was asserted at a time when package#Network formed the bottom of the trunk in the package dependency tree, so making it depend on package#Core implied all other packages also depend on package#Core. However, at some point package#Network was made to depend on package#AccessContext and by implication also package#Physical and package#Users. Moving these packages below package#Network meant their dependence on package#Core was no longer implied by their asserted (direct) dependencies. To fix this, the dependence on package#Core was moved from package#Network to package#Users.
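To illustrate why the move matters, the following Python sketch lists the packages that do not reach package#Core through their asserted dependencies. The chain Network -> AccessContext -> Physical -> Users is an assumption made for the example (the exact direct dependencies are not spelled out here), and the function names are hypothetical.

```python
def reachable(deps, start):
    """Return every package reachable from start via asserted dependencies."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for dep in deps.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def missing_core(deps, core="package#Core"):
    """List packages whose dependence on core is not implied by the graph."""
    packages = set(deps) | {d for ds in deps.values() for d in ds}
    return sorted(p for p in packages
                  if p != core and core not in reachable(deps, p))

# Assumed chain of direct dependencies (illustrative only).
CHAIN = {
    "package#Network": ["package#AccessContext"],
    "package#AccessContext": ["package#Physical"],
    "package#Physical": ["package#Users"],
}
# Other branches: the dependence on Core is asserted on package#Network.
old = {**CHAIN, "package#Network": ["package#AccessContext", "package#Core"]}
# Branch 176: the dependence on Core is asserted on package#Users.
new = {**CHAIN, "package#Users": ["package#Core"]}
```

Under these assumptions `missing_core(old)` reports package#AccessContext, package#Physical and package#Users, while `missing_core(new)` is empty.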

This change should obviously have been made in other branches, so it will be included in the reordering updates. This also means that the package rankings should be recalculated. That was not done in branch 176, so that change will now be added in branch 176.
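How the package rankings are defined is not stated in this thread. Purely as an assumption for illustration, the sketch below takes a package's rank to be the length of its longest chain of dependencies, and reuses an assumed Network -> AccessContext -> Physical -> Users chain. It shows that moving the dependence on package#Core changes the ranks of several packages.

```python
from functools import lru_cache

def package_ranks(deps):
    """Rank each package by its longest dependency chain (assumed definition).

    Assumes the dependency graph is acyclic; a cycle would recurse forever.
    """
    @lru_cache(maxsize=None)
    def rank(pkg):
        direct = deps.get(pkg, ())
        return 0 if not direct else 1 + max(rank(d) for d in direct)

    packages = set(deps) | {d for ds in deps.values() for d in ds}
    return {p: rank(p) for p in sorted(packages)}

# Assumed direct dependencies (illustrative only).
old_deps = {
    "package#Network": ("package#AccessContext", "package#Core"),
    "package#AccessContext": ("package#Physical",),
    "package#Physical": ("package#Users",),
}
new_deps = {
    "package#Network": ("package#AccessContext",),
    "package#AccessContext": ("package#Physical",),
    "package#Physical": ("package#Users",),
    "package#Users": ("package#Core",),
}

old_ranks = package_ranks(old_deps)
new_ranks = package_ranks(new_deps)
```

With this definition, every package above package#Core moves up by one rank in the new graph, so any stored ranking would need recalculating after the change.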

The second change is that in other branches, package#5G depends on package#NetworkConnectivity, whereas in branch 176, package#5G depends on package#Virtualisation. This change was needed because there are dependencies between construction patterns in package#5G and package#Virtualisation. Having them on separate branches of the package hierarchy meant it was not possible to add properties to the construction patterns in either package which referred to patterns in the other package.

There are no dependencies of patterns in either package on assets, roles or relationships from the other package, so this change is not needed until CP dependencies are added. For that reason it will not be included in the other branches.

Spyderisk / domain-network

Construction pattern refactoring #183