Bug in cellular network BaseStation location inference

@samuelsenior: I spent a lot of time figuring out how to fix this, and it is worth relaying what I learned.

The problem is that CNS+RAN+BS-S+l precedes PH+hPH and VHPH+hPH, so it can never match: the hasPhysicalHost relationship it needs is only ever inferred (by those two patterns), and does not yet exist when CNS+RAN+BS-S+l runs.

With a construction pattern sequence, the way to fix this would be to change the sequence number of CNS+RAN+BS-S+l to a value above those of both PH+hPH and VHPH+hPH, but still below that of any pattern that should come after CNS+RAN+BS-S+l. Doing that is quite difficult, but it can be done by shuffling the order, moving individual patterns or whole sub-sequences around. The challenge is to avoid creating a different anomaly as a side effect.

With construction pattern dependencies, it is easier: change the dependency relationships for CNS+RAN+BS-S+l, and leave csv2nq to sequence the patterns while respecting the unchanged relationships. However, one must avoid creating circular dependencies, which is harder now that the patterns don't form a single sequence. Moreover, the sequencing dependencies were created from the old sequence, and while we know that it has CNS+RAN+BS-S+l in the wrong place, that doesn't mean the errors are necessarily in the sequencing dependencies of CNS+RAN+BS-S+l itself.
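To make the dependency-based approach concrete, here is a minimal Python sketch of how a tool could turn hasPredecessor/hasSuccessor relationships into a sequence and report circular dependencies. It uses Kahn's topological sort; whether csv2nq works this way is an assumption, and the functions `to_edges` and `sequence` are hypothetical.

```python
from collections import defaultdict, deque

def to_edges(relations):
    """Convert (subject, relation, object) triples into (before, after) pairs.

    'A hasPredecessor B' means B must run before A;
    'A hasSuccessor B' means A must run before B.
    """
    edges = set()
    for subj, rel, obj in relations:
        if rel == "hasPredecessor":
            edges.add((obj, subj))
        elif rel == "hasSuccessor":
            edges.add((subj, obj))
        else:
            raise ValueError(f"unknown relation: {rel}")
    return edges

def sequence(edges):
    """Order all patterns so every (before, after) edge is respected.

    Kahn's algorithm: repeatedly emit a pattern with no unprocessed
    predecessors. Raises ValueError if the edges contain a cycle.
    """
    successors = defaultdict(set)
    indegree = defaultdict(int)
    for before, after in edges:
        successors[before].add(after)
        indegree[after] += 1
        indegree.setdefault(before, 0)  # every pattern gets an entry
    # sorted() keeps the result deterministic when several patterns are ready
    ready = deque(sorted(p for p, d in indegree.items() if d == 0))
    order = []
    while ready:
        p = ready.popleft()
        order.append(p)
        for q in sorted(successors[p]):
            indegree[q] -= 1
            if indegree[q] == 0:
                ready.append(q)
    if len(order) < len(indegree):
        stuck = sorted(p for p in indegree if p not in order)
        raise ValueError(f"circular dependency involving: {stuck}")
    return order
```

For example, `CNS+RAN+BS-S+l hasPredecessor DCH+l` together with `PCN+W hasPredecessor CNS+RAN+BS-S+l` yields the order DCH+l, CNS+RAN+BS-S+l, PCN+W. Any patterns left over when no pattern is ready are exactly those caught in (or downstream of) a cycle.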

The current hasPredecessor and hasSuccessor relationships were auto-generated wherever an element created in one pattern would match in another pattern. The direction is then determined from the relative positions of the two patterns in an assumed sequence, which was originally based on the old hasPriority sequence numbers. There are nearly 20,000 dependencies, but many are redundant: a dependency between patterns A and C does not cause a hasPredecessor or hasSuccessor relationship to be created between them if it can be inferred from the dependencies between A and B and between B and C.
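The redundancy rule can be illustrated with a small Python sketch: an edge from A to C is redundant if C is still reachable from A once that edge is ignored. Representing dependencies as (before, after) pairs and the names `reachable` and `redundant` are assumptions for illustration, not the MS Access queries themselves.

```python
from collections import defaultdict

def reachable(edges, start, skip=None):
    """Return every pattern reachable from start, optionally ignoring one edge."""
    succ = defaultdict(set)
    for edge in edges:
        if edge != skip:
            succ[edge[0]].add(edge[1])
    seen, stack = set(), [start]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def redundant(edges):
    """Edges (A, C) whose ordering is already implied by a longer path A -> ... -> C."""
    return {(a, c) for a, c in edges if c in reachable(edges, a, skip=(a, c))}
```

With edges A to B, B to C and A to C, only the A to C edge is reported, which is why no relationship is generated for it.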

My dependency extraction query generated the following dependencies for CNS+RAN+BS-S+l:

CNS+RAN+BS-S+l hasPredecessor DCH+l: because DCH+l creates a locatedIn relation between an existing PhysicalHost and a DataCentre, which would match the prohibited BaseStation-locatedIn-Space (role OtherSpace) in CNS+RAN+BS-S+l.
CNS+RAN+BS-S+l hasSuccessor HuiPDaS-S+l: because HuiPDaS-S+l creates a locatedIn relation from an existing PhysicalHost to an existing Space, which would match the prohibited BaseStation-locatedIn-Space (role OtherSpace) in CNS+RAN+BS-S+l.
CNS+RAN+BS-S+l hasSuccessor HuiPDaS-S+l: because HuiPDaS-S+l specifies a locatedIn relation from a PhysicalHost to a prohibited Space which would match the BaseStation-locatedIn-Space created by CNS+RAN+BS-S+l.
CNS+RAN+BS-S+l hasSuccessor VHPH+hPH: because VHPH+hPH creates a hasPhysicalHost link from an existing VirtualHost to an existing PhysicalHost, which would match the Host(Gateway)-hasPhysicalHost-BaseStation in CNS+RAN+BS-S+l.
PCN+W hasPredecessor CNS+RAN+BS-S+l: because PCN+W creates an accessibleFrom relationship from an existing PublicCellularNetwork to a newly created World, which would match the CellularNetwork-accessibleFrom-Space in CNS+RAN+BS-S+l.

The problems with these auto-generated relationships are as follows:

The dependency may not be real, but detected as such because the relevant MS Access DB queries are deficient. One known issue is that if a pattern has an optional node, links with that node are also optional, but this is not taken into account. More complex cases involve nodes and links in the two patterns whose matching requirements are mutually exclusive, so the constructed element could never actually be matched.
The dependency may arise only because the element is under-specified in the pattern where it is matched, referring to a base type that is the same as the constructed asset/relationship or an ancestor type, when really it should specify a more restrictive subtype.
If the dependency is real, the direction (predecessor or successor) is based on the original sequence, so if that is wrong (as in this case), then the auto-generated dependency relationships will be wrong.
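The third point follows directly from how the direction is chosen. A minimal sketch, assuming each detected dependency is recorded as a (creator, matcher) pair and the assumed sequence as a hypothetical `position` dictionary, shows that the generated relationship always preserves whatever order the assumed sequence already had, right or wrong:

```python
def orient(dependency, position):
    """Turn a detected dependency into a hasPredecessor/hasSuccessor triple.

    dependency: (creator, matcher), where creator makes an element that
        could be matched in matcher.
    position: pattern -> index in the assumed sequence (hypothetical input).
    The result simply preserves whichever order the assumed sequence has.
    """
    creator, matcher = dependency
    if position[creator] < position[matcher]:
        # creator runs first, so matcher can see what it creates
        return (matcher, "hasPredecessor", creator)
    # creator runs later, so matcher never sees what it creates
    return (matcher, "hasSuccessor", creator)
```

With positions where CNS+RAN+BS-S+l comes before VHPH+hPH (as in the old sequence), the dependency created by VHPH+hPH comes out as `CNS+RAN+BS-S+l hasSuccessor VHPH+hPH`. Nothing in the content of the patterns is consulted when choosing the direction.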

Examining the dependencies for CNS+RAN+BS-S+l, the following points emerge:

CNS+RAN+BS-S+l hasPredecessor DCH+l is correct. A BaseStation wouldn't normally be resourcedBy a DataCentre, but there is nothing in the domain model to say it couldn't happen. If a user specified this, then DCH+l would need to fire first to ensure CNS+RAN+BS-S+l doesn't infer the BaseStation location from its cell coverage (which might not be the same space as the DataCentre itself).
CNS+RAN+BS-S+l hasSuccessor HuiPDaS-S+l is spurious. The PhysicalHost in HuiPDaS-S+l should be a PersonalHost, and a BaseStation is not a PersonalHost, so this dependency should not exist.
CNS+RAN+BS-S+l hasSuccessor VHPH+hPH is the wrong way around. CNS+RAN+BS-S+l won't match if the host providing the RAN is a virtual host unless the hasPhysicalHost link is first created by VHPH+hPH, so it should be CNS+RAN+BS-S+l hasPredecessor VHPH+hPH.
PCN+W hasPredecessor CNS+RAN+BS-S+l is correct in principle. The World can't really be a cell in a cellular network, so we don't want the PublicCellularNetwork-accessibleFrom-World relationship created by PCN+W to be matched in CNS+RAN+BS-S+l.

However, simply deleting CNS+RAN+BS-S+l hasSuccessor HuiPDaS-S+l and inverting CNS+RAN+BS-S+l hasSuccessor VHPH+hPH wouldn't work. CNS+RAN+BS-S+l depends on a hasPhysicalHost link from the host providing the RAN to a BaseStation. VHPH+hPH inserts this link if the host providing the RAN is a virtual host provisioned on the BaseStation, so CNS+RAN+BS-S+l hasPredecessor VHPH+hPH would ensure this link is available when CNS+RAN+BS-S+l is executed. However, if the RAN is provided by the physical BaseStation, then the link would be inserted by pattern PH+hPH instead, so we also need CNS+RAN+BS-S+l hasPredecessor PH+hPH.

However, pattern PH-S+W adds a PhysicalHost-locatedIn-World link for PhysicalHosts that don't have a location. We need CNS+RAN+BS-S+l to execute before this, so that a BaseStation with no location gets one based on its cell coverage. At present this ordering is only inferred from the spurious CNS+RAN+BS-S+l hasSuccessor HuiPDaS-S+l, so if that is deleted, the order between CNS+RAN+BS-S+l and PH-S+W would no longer be controlled. This could be fixed by adding CNS+RAN+BS-S+l hasSuccessor PH-S+W.

But PH-S+W precedes PH+hPH, which would then also have to follow CNS+RAN+BS-S+l. Since PH+hPH must also precede CNS+RAN+BS-S+l, we end up with a circular dependency.
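A circular dependency like this can be found mechanically with a depth-first search that looks for an edge back to a pattern still on the current path. The following is a sketch of a hypothetical helper, and the example collapses the indirect chain from PH-S+W to PH+hPH into a single edge for brevity:

```python
def find_cycle(edges):
    """Return one cycle as a list of patterns (first == last), or None.

    Depth-first search: reaching a pattern that is still on the current
    path means a chain of edges has led back to where it started.
    """
    succ = {}
    for before, after in edges:
        succ.setdefault(before, []).append(after)
    on_path, done, path = set(), set(), []

    def visit(p):
        on_path.add(p)
        path.append(p)
        for q in sorted(succ.get(p, [])):
            if q in on_path:
                return path[path.index(q):] + [q]
            if q not in done:
                cycle = visit(q)
                if cycle:
                    return cycle
        on_path.discard(p)
        path.pop()
        done.add(p)
        return None

    for p in sorted(succ):
        if p not in done:
            cycle = visit(p)
            if cycle:
                return cycle
    return None

# Proposed edges around CNS+RAN+BS-S+l, as (before, after) pairs.
edges = {
    ("VHPH+hPH", "CNS+RAN+BS-S+l"),  # CNS+RAN+BS-S+l hasPredecessor VHPH+hPH
    ("PH+hPH", "CNS+RAN+BS-S+l"),    # CNS+RAN+BS-S+l hasPredecessor PH+hPH
    ("CNS+RAN+BS-S+l", "PH-S+W"),    # CNS+RAN+BS-S+l hasSuccessor PH-S+W
    ("PH-S+W", "PH+hPH"),            # indirect chain collapsed to one edge
}
```

Here `find_cycle(edges)` returns `['CNS+RAN+BS-S+l', 'PH-S+W', 'PH+hPH', 'CNS+RAN+BS-S+l']`. The recursion is fine for a few hundred patterns; a much larger graph would need an explicit stack.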

The way I figured this out was by using some partial sequence visualisation scripts (which create and process GraphViz files). These can now be run from the Construction Dependencies subform:

The 'Visualise CP Dependencies by Package' button leads to a subform allowing you to select a package and generate a diagram for patterns in that package, including dependency relationships to patterns in packages on which the selected package depends. There is an option to also show dependency relationships from patterns in packages that depend on the selected package.
The 'Visualise CP Dependencies by Element' button leads to a subform allowing you to select an asset or relationship type and generate the diagram for patterns in which this asset or relationship type is created or matched or both, regardless of package membership. In this case, the indirect dependencies are also shown as fainter (grey) lines linking those patterns.
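The visualisation scripts themselves are not reproduced here. As an illustration only, a hypothetical Python helper that writes a GraphViz file in the same spirit (asserted dependencies as solid arrows, indirect ones as fainter grey arrows) might look like this:

```python
def to_dot(direct, indirect=()):
    """Build GraphViz DOT text for CP dependencies.

    direct, indirect: iterables of (before, after) pattern pairs.
    Arrows point from the pattern that must run first to the later one.
    Names are quoted because pattern names contain '+' and '-'.
    """
    lines = ["digraph cp_dependencies {", "  rankdir=LR;"]
    for before, after in sorted(direct):
        lines.append(f'  "{before}" -> "{after}";')
    for before, after in sorted(indirect):
        lines.append(f'  "{before}" -> "{after}" [color=grey];')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be saved as, say, `deps.dot` and rendered with the GraphViz command `dot -Tpdf deps.dot -o deps.pdf`.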

The existing (but incorrect) dependency relationships for CNS+RAN+BS-S+l can be viewed in the Construction Pattern Entry form. They come from dependencies involving locatedIn, hasPhysicalHost and accessibleFrom links, so I started by creating those diagrams. The hasPhysicalHost diagram showed that PH+hPH was following CNS+RAN+BS-S+l due to indirect dependencies via DC-C+C and DC-R+RW. Since PH+hPH, DC-C+C and DC-R+RW are all package#Network patterns, I then generated the sequence for this package, which showed that the dependency is from CNS+RAN+BS-S+l via HuiPDaS-S+l, PH-S+W, DC-C+C or DC-R+RW, and SHuPH-Hu+m-Replay1 to PH+hPH.
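Tracing a chain like this from the diagrams works, but a breadth-first search over (before, after) edges finds one shortest chain directly. This is a hypothetical sketch; the example edges model the chain described in the preceding paragraph:

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search for one shortest chain of dependencies start -> goal."""
    succ = {}
    for before, after in edges:
        succ.setdefault(before, []).append(after)
    parent = {start: None}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == goal:
            path = []
            while p is not None:  # walk back to the start
                path.append(p)
                p = parent[p]
            return path[::-1]
        for q in sorted(succ.get(p, [])):
            if q not in parent:
                parent[q] = p
                queue.append(q)
    return None

edges = {
    ("CNS+RAN+BS-S+l", "HuiPDaS-S+l"),
    ("HuiPDaS-S+l", "PH-S+W"),
    ("PH-S+W", "DC-C+C"),
    ("PH-S+W", "DC-R+RW"),
    ("DC-C+C", "SHuPH-Hu+m-Replay1"),
    ("DC-R+RW", "SHuPH-Hu+m-Replay1"),
    ("SHuPH-Hu+m-Replay1", "PH+hPH"),
}
```

Because neighbours are visited in sorted order, the route via DC-C+C is the one returned; the equally short route via DC-R+RW exists too.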

The diagrams are attached.

We know that CNS+RAN+BS-S+l should precede PH-S+W, so this can only be fixed by breaking the indirect dependency between PH-S+W and PH+hPH. Looking at those predecessor and successor relationships, it is clear that:

DC-C+C hasPredecessor PH-S+W and DC-R+RW hasPredecessor PH-S+W are both spurious. DC-C+C and DC-R+RW create physical hosts with a location, so they could never match the physical host in PH-S+W which must have no location.
SHuPH-Hu+m-Replay1 hasPredecessor DC-C+C and SHuPH-Hu+m-Replay1 hasPredecessor DC-R+RW are both valid. DC-C+C and DC-R+RW create physical hosts in a DataCentre, which should match the FixedHost-locatedIn-Space in SHuPH-Hu+m-Replay1.
PH+hPH hasPredecessor SHuPH-Hu+m-Replay1 is spurious. SHuPH-Hu+m-Replay1 creates a 'manages' relationship which looks like it is matched in PH+hPH. However, in PH+hPH the other end is an optional node, so the relationship is also optional and it doesn't matter if it is generated by SHuPH-Hu+m-Replay1 before or after PH+hPH.
The dependencies of PH+hPH on DC-C+C and DC-R+RW, inferred via SHuPH-Hu+m-Replay1, are valid. If the spurious PH+hPH hasPredecessor SHuPH-Hu+m-Replay1 is removed, we must add PH+hPH hasPredecessor DC-C+C and PH+hPH hasPredecessor DC-R+RW.

So the solution is:

Replace CNS+RAN+BS-S+l hasSuccessor VHPH+hPH with CNS+RAN+BS-S+l hasPredecessor VHPH+hPH, so this dependency is the right way round.
Add CNS+RAN+BS-S+l hasPredecessor PH+hPH, so the same dependency works if the host providing the RAN is physical and not virtual.
Delete CNS+RAN+BS-S+l hasSuccessor HuiPDaS-S+l, and change the physical host in HuiPDaS-S+l to a Personal Host so it no longer seems like this dependency should exist.
Add CNS+RAN+BS-S+l hasSuccessor PH-S+W to ensure that PH-S+W still follows CNS+RAN+BS-S+l, just not via HuiPDaS-S+l.
Negate DC-C+C hasPredecessor PH-S+W and DC-R+RW hasPredecessor PH-S+W. These dependencies are found because of deficiencies in the detection queries, so it may be best to mark them as 'fake' and add an explanation, rather than simply deleting them.
Negate PH+hPH hasPredecessor SHuPH-Hu+m-Replay1 in the same way, or delete it if the query can be modified so spurious dependencies involving links to/from optional nodes are ignored.
Add PH+hPH hasPredecessor DC-C+C and PH+hPH hasPredecessor DC-R+RW, so PH+hPH still follows DC-C+C and DC-R+RW, just not via SHuPH-Hu+m-Replay1.
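The effect of these changes can be checked on a toy model of the affected sub-graph. This sketch uses Python's standard `graphlib.TopologicalSorter`, which raises `CycleError` on circular input. The relationship set is a simplification: only relationships named in this discussion are included, and the HuiPDaS-S+l to PH-S+W ordering, which is really inferred via other patterns, is modelled as one direct relationship.

```python
from graphlib import TopologicalSorter, CycleError

CNS = "CNS+RAN+BS-S+l"

def build_graph(relations):
    """Map each pattern to the set of patterns that must run before it."""
    graph = {}
    for subj, rel, obj in relations:
        if rel == "hasPredecessor":
            before, after = obj, subj
        elif rel == "hasSuccessor":
            before, after = subj, obj
        else:
            raise ValueError(f"unknown relation: {rel}")
        graph.setdefault(after, set()).add(before)
    return graph

def order(relations):
    """Return a valid sequence, or None if the relations are circular."""
    try:
        return list(TopologicalSorter(build_graph(relations)).static_order())
    except CycleError:
        return None

old = {
    (CNS, "hasPredecessor", "DCH+l"),
    ("PCN+W", "hasPredecessor", CNS),
    (CNS, "hasSuccessor", "VHPH+hPH"),
    (CNS, "hasSuccessor", "HuiPDaS-S+l"),
    ("PH-S+W", "hasPredecessor", "HuiPDaS-S+l"),  # simplified indirect chain
    ("DC-C+C", "hasPredecessor", "PH-S+W"),
    ("DC-R+RW", "hasPredecessor", "PH-S+W"),
    ("SHuPH-Hu+m-Replay1", "hasPredecessor", "DC-C+C"),
    ("SHuPH-Hu+m-Replay1", "hasPredecessor", "DC-R+RW"),
    ("PH+hPH", "hasPredecessor", "SHuPH-Hu+m-Replay1"),
}

# Changes 1-4 only: fix the relationships of CNS+RAN+BS-S+l itself.
partial = (old - {(CNS, "hasSuccessor", "VHPH+hPH"),
                  (CNS, "hasSuccessor", "HuiPDaS-S+l")}) | {
    (CNS, "hasPredecessor", "VHPH+hPH"),
    (CNS, "hasPredecessor", "PH+hPH"),
    (CNS, "hasSuccessor", "PH-S+W"),
}

# Changes 5-7: break the spurious chain from PH-S+W to PH+hPH.
fixed = (partial - {("DC-C+C", "hasPredecessor", "PH-S+W"),
                    ("DC-R+RW", "hasPredecessor", "PH-S+W"),
                    ("PH+hPH", "hasPredecessor", "SHuPH-Hu+m-Replay1")}) | {
    ("PH+hPH", "hasPredecessor", "DC-C+C"),
    ("PH+hPH", "hasPredecessor", "DC-R+RW"),
}
```

In this model `order(old)` succeeds but puts CNS+RAN+BS-S+l before PH+hPH (the bug), `order(partial)` returns None because of the circular dependency, and `order(fixed)` places PH+hPH and VHPH+hPH before CNS+RAN+BS-S+l, which in turn precedes PH-S+W.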

Breaking DC-C+C hasPredecessor PH-S+W and DC-R+RW hasPredecessor PH-S+W removes the circular dependency via CNS+RAN+BS-S+l between PH-S+W and PH+hPH. DC-C+C and DC-R+RW no longer follow CNS+RAN+BS-S+l, so CNS+RAN+BS-S+l can follow PH+hPH and precede PH-S+W. The only problem is that the broken links may have been the means by which other, valid dependencies were inferred, so these would need to be recreated in some way.

One option is to check more diagrams. For example, comparing the diagrams for dependencies on a 'manages' link and for package#Network reveals that PH+hPH hasPredecessor SHuPH-Hu+m-Replay1 leads to the inference that HprLS+c follows SHuPH-Hu+m-Replay1, which would have to be asserted explicitly if it were true. (It turns out this one is not valid, for the same reason that PH+hPH need not follow SHuPH-Hu+m-Replay1: the manages link created by SHuPH-Hu+m-Replay1 is to an optional node in HprLS+c.)

cp_element_manages.pdf

Another option is to make some changes and then use the stored queries in the MS Access DB to find any newly unfulfilled dependencies (i.e., dependencies found between patterns that are neither asserted via hasPredecessor or hasSuccessor relationships, nor inferrable from them). That's the option I plan to use, but after refining the queries so they no longer find dependencies based only on optional links.
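The notion of an unfulfilled dependency can be stated precisely: a detected dependency between two patterns is unfulfilled when neither order between them is asserted or implied by the transitive closure of the asserted (before, after) edges. A minimal sketch with placeholder pattern names; the real check is done by the stored queries in the MS Access DB, and `closure` and `unfulfilled` are hypothetical names:

```python
def closure(edges):
    """Transitive closure: every (before, after) pair implied by the asserted edges."""
    succ = {}
    for before, after in edges:
        succ.setdefault(before, set()).add(after)
    implied = set()
    for start in succ:
        seen, stack = set(), [start]
        while stack:
            for nxt in succ.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        implied |= {(start, nxt) for nxt in seen}
    return implied

def unfulfilled(detected, asserted):
    """Detected dependencies (pattern pairs) whose relative order the asserted
    edges neither state nor imply, in either direction."""
    implied = closure(asserted)
    return {(a, b) for a, b in detected
            if (a, b) not in implied and (b, a) not in implied}
```

With asserted edges A to B and B to C, a detected dependency between A and C is fulfilled (by inference), while one between C and D is reported.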

@samuelsenior: a bit more information.

I updated the CP dependency detection queries in the MS Access DB so they no longer report a dependency when a link created in one pattern would match a link in another pattern that, in the latter, is to/from an optional node.
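The refined rule can be sketched as follows. The `Node`/`Link` classes, the tiny type hierarchy and the subtype test are hypothetical simplifications of how patterns are stored; the point is that a link to or from an optional node is itself optional, so it is skipped before any type matching happens. The same type test shows why making the HuiPDaS-S+l host a PersonalHost removes its spurious dependency on CNS+RAN+BS-S+l.

```python
from dataclasses import dataclass

# Hypothetical fragment of the asset type hierarchy (child -> parent).
PARENT = {
    "BaseStation": "PhysicalHost",
    "PersonalHost": "PhysicalHost",
    "PhysicalHost": "Host",
}

def is_a(asset_type, ancestor):
    """True if asset_type is ancestor or one of its subtypes."""
    while asset_type is not None:
        if asset_type == ancestor:
            return True
        asset_type = PARENT.get(asset_type)
    return False

@dataclass(frozen=True)
class Node:
    asset_type: str
    optional: bool = False

@dataclass(frozen=True)
class Link:
    source: Node
    relation: str
    target: Node

    @property
    def optional(self):
        # A link to or from an optional node is itself optional.
        return self.source.optional or self.target.optional

def creates_dependency(created, matched):
    """Could a link created in one pattern be matched by `matched` in another?"""
    if matched.optional:
        return False  # refined rule: optional links never cause a dependency
    return (created.relation == matched.relation
            and is_a(created.source.asset_type, matched.source.asset_type)
            and is_a(created.target.asset_type, matched.target.asset_type))
```

For instance, a created BaseStation-locatedIn-Space link matches a PhysicalHost-locatedIn-Space link but not a PersonalHost-locatedIn-Space link, and a manages link never matches one whose target node is optional.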

Then I made the seven changes listed above as 'the solution'. Next, I generated an assumed sequence based on the predecessor and successor relationships (not the old hasPriority values), and extracted dependencies using the dependency detection queries (i.e., based on CP content). Then I used two stored queries to get asserted dependencies that correspond to dependencies that are no longer found, and used the 'Display Unfulfilled Inferred CP Dependencies' button to list detected dependencies that could not be deduced from the predecessor and successor relationships. The relative positions P1 and P2 in the sequence are now based on the asserted dependencies, not the hasPriority values.
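The check for asserted relationships that no detected dependency supports any more can be sketched in Python, together with deriving positions P1 and P2 from the asserted relationships using the standard `graphlib` module. The functions `positions` and `unsupported` are hypothetical, and the example data uses the two relationships that turned out to be unnecessary, written as (before, after) pairs:

```python
from graphlib import TopologicalSorter

def positions(asserted):
    """Position of each pattern in a sequence derived from the asserted
    (before, after) edges, not from the old hasPriority values."""
    graph = {}
    for before, after in asserted:
        graph.setdefault(after, set()).add(before)
    order = TopologicalSorter(graph).static_order()
    return {p: i for i, p in enumerate(order)}

def unsupported(asserted, detected):
    """Asserted edges with no detected dependency between the two patterns
    (in either direction), reported as (before, after, P1, P2)."""
    pos = positions(asserted)
    found = {frozenset(pair) for pair in detected}
    return sorted((a, b, pos[a], pos[b]) for a, b in asserted
                  if frozenset((a, b)) not in found)

asserted = {
    ("SHuPH-Hu+m", "HuiPDaS-S+l"),   # HuiPDaS-S+l hasPredecessor SHuPH-Hu+m
    ("HumHVH-Hu+m", "HprLS+c"),      # HumHVH-Hu+m hasSuccessor HprLS+c
    ("DC-C+C", "SHuPH-Hu+m-Replay1"),
}
detected = {("DC-C+C", "SHuPH-Hu+m-Replay1")}
```

Here `unsupported(asserted, detected)` reports the first two relationships, each with P1 < P2 because the positions come from a topological order of the asserted edges.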

The first thing I found was a couple of asserted dependency relationships that are no longer needed:

HuiPDaS-S+l hasPredecessor SHuPH-Hu+m is no longer needed because the locatedIn relationship created in SHuPH-Hu+m is from a Fixed Host, but the one matched in HuiPDaS-S+l is now from a Personal Host, none of which are fixed.
HumHVH-Hu+m hasSuccessor HprLS+c is no longer needed because the manages relationship created in HumHVH-Hu+m matches an optional link in HprLS+c, so the dependency was fake and is now not found by the DB queries.

Before the changes I was getting about 100 unfulfilled dependencies, but after these changes I found over 400. Many of these are redundant, in the sense that they can be deduced from each other in conjunction with asserted CP dependencies. I inspected them in groups based on the types of assets or links that cause each dependency.

A lot of issues stem from CNS+RAN+BS-S+l, but in many cases this seems to be because it is overspecified. The goal is to infer the location of an asserted Base Station if the user failed to specify it. This may happen if the user feels intuitively that the location is obvious because the RAN is only accessible in one Space.

At present, CNS+RAN+BS-S+l includes the Cellular Network of which the RAN is part, and its backbone network. In practice, parts of that may also be inferred, so having the backbone in the underlying matching pattern is asking for trouble: dependencies will arise between this pattern and anything that creates part of an IP network that could act as a backbone.

To achieve the inference goal without creating unnecessary dependencies, we need a 2-pattern sequence:

[ ] a new pattern RANafS2+mS that checks if a RAN covers more than one space, and adds a self-referential 'mS' link in that case
[ ] a new pattern RAN-mS+NS-S+l which hasPredecessor RANafS2+mS, created by removing the CellularNetwork and CoreNetwork nodes from CNS+RAN+BS-S+l, and adding a prohibited self-referential 'mS' link
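The intended logic of the two new patterns can be modelled in plain Python. This is only an analogy: real construction patterns match graph structures rather than dictionaries, which is why the 'covers more than one Space' condition must first be materialised as an 'mS' link before the second pattern can test for its absence. The function and argument names are hypothetical, and the RAN-providedBy-Host(Gateway)-hasPhysicalHost-BaseStation chain is collapsed into a single mapping.

```python
def flag_multi_space(ran_spaces):
    """Stage 1, modelling RANafS2+mS: the RANs accessible from more than one
    Space, i.e. those that would get a self-referential 'mS' link."""
    return {ran for ran, spaces in ran_spaces.items() if len(spaces) > 1}

def infer_locations(ran_spaces, ran_base_station, located, multi_space):
    """Stage 2, modelling RAN-mS+NS-S+l: place each BaseStation that has no
    location in the single Space covered by its RAN, unless the RAN has an
    'mS' flag (the prohibited link in the real pattern)."""
    inferred = {}
    for ran, station in ran_base_station.items():
        spaces = ran_spaces.get(ran, set())
        if ran in multi_space or station in located or not spaces:
            continue
        (space,) = spaces  # exactly one Space, given multi_space came from stage 1
        inferred[station] = space
    return inferred
```

A BaseStation whose RAN covers one cell is placed in that cell; one whose RAN covers several cells, or which already has a location, is left alone.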

This would eliminate most dependencies with patterns that create relationships that could match links to/from the CellularNetwork and CoreNetwork nodes in CNS+RAN+BS-S+l:

patterns CNCNBSS+aF, CNRANS+aF, FCNS+aF and MPCNS+aF, which create accessibleFrom relationships that may match CellularNetwork-accessibleFrom-Space
pattern CNR-CN+CN, which creates an implementsCN relationship that could match L3Subnet(CoreNetwork)-implementsCN-CellularNetwork
patterns DCMW-LS+VX, MWHLS+CIP and MWPodLS2+c, a sequence creating subnets in an inferred K8S set up, including relationships that could match Host(Gateway)-connectedTo-L3Subnet(CoreNetwork)
patterns DCCVH-LS+LS and HVHS+VS, a sequence that creates a virtual subnet for virtual hosts running on a cluster or a non-cluster respectively, including relationships that may match Host(Gateway)-connectedTo-L3Subnet(CoreNetwork)
patterns DCWH-LS+c and DCLS+RWS, a sequence that creates subnets between fixed hosts in a Data Centre and between the Data Centre and other subnets, including relationships that could match Host(Gateway)-connectedTo-L3Subnet(CoreNetwork)
pattern HcWHS+c, which creates connections to WiFi hotspots that could match Host(Gateway)-connectedTo-L3Subnet(CoreNetwork)

This leaves four patterns that create relationships that would still match links in the new sequence RANafS2+mS, RAN-mS+NS-S+l.

DCpLS+RWS creates a relationship saying a LogicalSubnet provided by a Data Centre is accessibleFrom the Data Centre, and a providedBy relationship to the Data Centre's Router, which should not be a Base Station. These match RAN-accessibleFrom-Space in RANafS2+mS and RAN-mS+NS-S+l, and RAN-providedBy-Host(Gateway) in RAN-mS+NS-S+l. This was resolved before the previous changes because DCpLS+RWS follows DC-R+RW and DC-C+C which followed CNS+RAN+BS-S+l, but that is no longer the case because DC-R+RW and DC-C+C no longer follow PH-S+W.

There are some issues with DCpLS+RWS anyway, arising from the fact that it applies to any LogicalSubnet. This is not appropriate, because a Data Centre can't reasonably provide every type of network. We should restrict it to only L3 subnets (i.e., IP subnets connecting more than one Host). However, that affects backward compatibility so it should be done separately (issue #185).

In the meantime, the simplest option is to

[ ] make DCpLS+RWS a direct successor of the new RAN-mS+NS-S+l, so it cannot interfere with RANafS2+mS and RAN-mS+NS-S+l.

Finally, CtHhPH+hPH, PodHhPH+hPH and their indirect successor PHH+hPH-1 create relationships that would match Host(Gateway)-hasPhysicalHost-BaseStation(PhysicalHost) which will still be in RAN-mS+NS-S+l. These used to follow CNS+RAN+BS-S+l because they follow (indirectly) from DC-R+RW. DC-R+RW used to follow CNS+RAN+BS-S+l, but now doesn't because it no longer follows PH-S+W.

The simplest option is to:

[ ] make PodHhPH+hPH a direct successor of the new RAN-mS+NS-S+l, so it and the two subsequent patterns cannot interfere with RANafS2+mS and RAN-mS+NS-S+l.

Once again, to figure this stuff out I made extensive use of sequence visualisation to find sequences that have dependencies with CNS+RAN+BS-S+l. Obviously the idea of replacing CNS+RAN+BS-S+l came from thinking about whether those dependencies make sense, which led to the realisation that some don't make sense but arise because that pattern is overspecific.

Spyderisk / domain-network

Bug in cellular network BaseStation location inference #171