Spyderisk / domain-network

Network domain model
Apache License 2.0

Bug in cellular network BaseStation location inference #171

Open mike1813 opened 2 months ago

mike1813 commented 2 months ago

Any Host that has no asserted or inferrable location is inferred by pattern PH-S+W to be in the global public space World. That means inferrable locations must be added before this.

Pattern CNS+RAN+BS-S+l does this for an asserted cellular network Base Station. However, this pattern assumes that a cell could be provided by a virtual eNodeB running on a physical base station, something that might be the case in a 5G network. Consequently, the pattern looks for a host providing a cellular access network with a base station as its physical host.

The problem is that the hasPhysicalHost relationship is inferred by patterns PH+hPH and VHPH+hPH, which come after CNS+RAN+BS-S+l, so CNS+RAN+BS-S+l will never be matched. If a Base Station is asserted to provide a cellular access network, but the location is not specified, then the asserted Base Station will be inferred by PH-S+W to be in the global public space World, and not in the space served by its cellular access network.

mike1813 commented 3 days ago

@samuelsenior : spent a lot of time figuring out how to fix this, and it is worth relaying what I learned.

The problem is that CNS+RAN+BS-S+l precedes PH+hPH and VHPH+hPH, and as a result it can never be matched because the (inferred only) hasPhysicalHost relationship will not exist.

With a construction pattern sequence, the way to fix this would be to change the sequence number of CNS+RAN+BS-S+l to a value above those of both PH+hPH and VHPH+hPH, but still lower than that of any pattern that should come after CNS+RAN+BS-S+l. That is quite difficult, but it can be done by reshuffling the order, moving individual patterns or whole sub-sequences around. The challenge is to avoid creating a different anomaly as a side effect.

With construction pattern dependencies, it is easier to change the dependency relationships for CNS+RAN+BS-S+l, leaving csv2nq to sequence the patterns while respecting the unchanged relationships. However, one must avoid creating circular dependencies, which is hard now that the patterns don't form a single sequence. Moreover, the sequencing dependencies were created from the old sequence, and while we know it has CNS+RAN+BS-S+l in the wrong place, that doesn't mean the errors are necessarily in the sequencing dependencies of CNS+RAN+BS-S+l itself.

The current hasPredecessor and hasSuccessor relationships were auto-generated wherever an element created in one pattern would match in another pattern. The direction was then determined from the relative positions of the two patterns in an assumed sequence, which was originally based on the old hasPriority sequence numbers. There are nearly 20,000 dependencies, but many are redundant. A dependency between patterns A and C does not cause a hasPredecessor or hasSuccessor relationship to be created between them if the dependency can be inferred from dependencies between patterns A and B and between B and C.
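To illustrate the redundancy rule, here is a minimal Python sketch that drops any dependency already implied by a longer chain. It is not the stored query used in the MS Access DB; the function names and the (before, after) tuple representation are assumptions made for the example, and it assumes the dependencies contain no cycles.

```python
from collections import defaultdict


def reachable(edges, start, skip_edge=None):
    """Return every pattern reachable from `start`, optionally ignoring one edge."""
    successors = defaultdict(set)
    for before, after in edges:
        if (before, after) != skip_edge:
            successors[before].add(after)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def drop_redundant(edges):
    """Keep only the edges that cannot be inferred from a chain of other edges.

    Assumes the dependency graph is acyclic (hypothetical helper, not csv2nq code).
    """
    edges = set(edges)
    return {
        (before, after)
        for before, after in edges
        if after not in reachable(edges, before, skip_edge=(before, after))
    }


# A -> C is implied by A -> B and B -> C, so it is not kept.
kept = drop_redundant({("A", "B"), ("B", "C"), ("A", "C")})
```

The flip side, which matters later in this thread, is that deleting a kept edge such as A -> B also silently removes the implied A -> C ordering.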

My dependency extraction query generated the following dependencies for CNS+RAN+BS-S+l:

The problems with these auto-generated relationships are as follows:

Examining the dependencies for CNS+RAN+BS-S+l, the following points emerge:

However, simply deleting CNS+RAN+BS-S+l hasSuccessor HuiPDaS-S+l and inverting CNS+RAN+BS-S+l hasSuccessor VHPH+hPH wouldn't work. CNS+RAN+BS-S+l depends on a hasPhysicalHost link from the host providing the RAN to a BaseStation. VHPH+hPH inserts this link if the host providing the RAN is a virtual host provisioned on the BaseStation, so CNS+RAN+BS-S+l hasPredecessor VHPH+hPH would ensure this link is available when CNS+RAN+BS-S+l is executed. However, if the RAN is provided by the physical BaseStation, then the link would be inserted by pattern PH+hPH instead, so we also need CNS+RAN+BS-S+l hasPredecessor PH+hPH.

However, pattern PH-S+W adds a PhysicalHost-locatedIn-World link for Physical Hosts that don't have a location. We need CNS+RAN+BS-S+l to execute before this so that a BaseStation with no location gets one based on its cell coverage. At present this ordering is inferred from the spurious CNS+RAN+BS-S+l hasSuccessor HuiPDaS-S+l, so if that is deleted, the order between CNS+RAN+BS-S+l and PH-S+W would no longer be constrained. This could be fixed by adding CNS+RAN+BS-S+l hasSuccessor PH-S+W.

But PH-S+W precedes PH+hPH, which would then also follow CNS+RAN+BS-S+l. Since PH+hPH must also precede CNS+RAN+BS-S+l, we end up with a circular dependency.
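The loop can be reproduced with a small Python sketch using the standard library's graphlib module. The pattern names come from this issue, but the three-edge graph is a deliberate simplification: the real PH-S+W to PH+hPH dependency is indirect and is collapsed to a single edge here, and `find_cycle` is a hypothetical helper, not part of csv2nq.

```python
from graphlib import CycleError, TopologicalSorter

# Keys are patterns; values are the patterns that must run before them.
# Simplification: PH-S+W -> PH+hPH is an indirect dependency in the real model.
predecessors = {
    "PH-S+W": {"CNS+RAN+BS-S+l"},   # proposed: CNS+RAN+BS-S+l hasSuccessor PH-S+W
    "PH+hPH": {"PH-S+W"},           # existing (indirect) ordering
    "CNS+RAN+BS-S+l": {"PH+hPH"},   # needed: CNS+RAN+BS-S+l hasPredecessor PH+hPH
}


def find_cycle(predecessors):
    """Return the patterns on one cycle, or None if a valid sequence exists."""
    try:
        list(TopologicalSorter(predecessors).static_order())
    except CycleError as err:
        # graphlib reports the cycle as the second element of the exception args.
        return err.args[1]
    return None


cycle = find_cycle(predecessors)
print(cycle)
```

Dropping any one of the three constraints makes `find_cycle` return None, which is why one of the existing links has to be broken.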

The way I figured this out was by using some partial sequence visualisation scripts (which create and process GraphViz files). These can now be run from the Construction Dependencies subform:

The existing (but incorrect) dependency relationships for CNS+RAN+BS-S+l can be viewed in the Construction Pattern Entry form. They come from dependencies involving locatedIn, hasPhysicalHost and accessibleFrom links, so I started by creating those diagrams. The hasPhysicalHost diagram showed that PH+hPH was following CNS+RAN+BS-S+l due to indirect dependencies via DC-C+C and DC-R+RW. Since PH+hPH, DC-C+C and DC-R+RW are all package#Network patterns, I then generated the sequence for this package, which showed that the dependency is from CNS+RAN+BS-S+l via HuiPDaS-S+l, PH-S+W, DC-C+C or DC-R+RW, and SHuPH-Hu+m-Replay1 to PH+hPH.

The diagrams are attached.
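For anyone who wants to reproduce this kind of diagram outside the DB, a minimal Python sketch that emits GraphViz DOT text might look like the following. It is not the script behind the subform button; `to_dot` is a hypothetical helper, and the edge list assumes each hop in the chain described above is a direct dependency, which may not be the case in the real data.

```python
def to_dot(edges):
    """Return GraphViz DOT text for a set of (before, after) pattern pairs."""
    lines = ["digraph partial_sequence {", "  rankdir=LR;"]
    for before, after in sorted(edges):
        # Pattern names contain '+' and '-', so they must be quoted in DOT.
        lines.append(f'  "{before}" -> "{after}";')
    lines.append("}")
    return "\n".join(lines)


# Chain from CNS+RAN+BS-S+l to PH+hPH, simplified to direct edges (assumption).
edges = {
    ("CNS+RAN+BS-S+l", "HuiPDaS-S+l"),
    ("HuiPDaS-S+l", "PH-S+W"),
    ("PH-S+W", "DC-C+C"),
    ("PH-S+W", "DC-R+RW"),
    ("DC-C+C", "SHuPH-Hu+m-Replay1"),
    ("DC-R+RW", "SHuPH-Hu+m-Replay1"),
    ("SHuPH-Hu+m-Replay1", "PH+hPH"),
}
dot_text = to_dot(edges)
print(dot_text)
```

Saving the output to a file and running `dot -Tpng file.dot -o file.png` renders it.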

We know that CNS+RAN+BS-S+l should precede PH-S+W, so this can only be fixed by breaking the indirect dependency between PH-S+W and PH+hPH. Looking at those predecessor and successor relationships, it is clear that:

So the solution is:

Breaking DC-C+C hasPredecessor PH-S+W and DC-R+RW hasPredecessor PH-S+W removes the circular dependency via CNS+RAN+BS-S+l between PH-S+W and PH+hPH. DC-C+C and DC-R+RW no longer follow CNS+RAN+BS-S+l so it can follow PH+hPH and precede PH-S+W. The only problem is that the broken links may have been the means by which other, valid dependencies were inferred, so these would need to be recreated in some way.
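As a sanity check on the reasoning, the following Python sketch sequences a simplified graph after the two links are broken. It models only the two breaks named in this paragraph plus the three orderings we want for CNS+RAN+BS-S+l, and it treats every hop as a direct edge; `sequence` is a hypothetical stand-in for the sequencing csv2nq performs, not its actual code.

```python
from graphlib import TopologicalSorter

CNS = "CNS+RAN+BS-S+l"

# (before, after) pairs, simplified to direct edges (assumption).
old_edges = {
    ("PH-S+W", "DC-C+C"),
    ("PH-S+W", "DC-R+RW"),
    ("DC-C+C", "SHuPH-Hu+m-Replay1"),
    ("DC-R+RW", "SHuPH-Hu+m-Replay1"),
    ("SHuPH-Hu+m-Replay1", "PH+hPH"),
}
# DC-C+C hasPredecessor PH-S+W and DC-R+RW hasPredecessor PH-S+W are removed.
broken = {("PH-S+W", "DC-C+C"), ("PH-S+W", "DC-R+RW")}
# Orderings wanted for the Base Station location inference.
wanted = {("PH+hPH", CNS), ("VHPH+hPH", CNS), (CNS, "PH-S+W")}


def sequence(edges):
    """Return one pattern order that respects every (before, after) pair."""
    sorter = TopologicalSorter()
    for before, after in edges:
        sorter.add(after, before)  # `after` depends on `before`
    return list(sorter.static_order())


order = sequence((old_edges - broken) | wanted)
print(order)
```

With the two links still in place, the same call raises graphlib.CycleError.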

One option is to check more diagrams. For example, comparing the diagrams for dependencies on a 'manages' link and in package#Network reveals that PH+hPH hasPredecessor SHuPH-Hu+m-Replay1 leads to the inference that HprLS+c follows SHuPH-Hu+m-Replay1, which must be added if it is true. (It turns out this one is not valid, for the same reason that PH+hPH need not follow SHuPH-Hu+m-Replay1, because the manages link created by SHuPH-Hu+m-Replay1 is to an optional node in HprLS+c).

Another option is to make some changes and then use the stored queries in the MS Access DB to find any newly unfulfilled dependencies (i.e., dependencies found between patterns that are neither asserted via hasPredecessor or hasSuccessor relationships, nor inferrable from them). That's the option I plan to use, but after refining the queries so they no longer find dependencies based only on optional links.
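The definition of an unfulfilled dependency can be sketched in Python as a set difference against the transitive closure of the asserted relationships. This is only an illustration of the idea, not a translation of the stored queries; the function names and the (before, after) tuple representation are assumptions.

```python
from collections import defaultdict


def implied_pairs(asserted):
    """Return every (before, after) pair asserted directly or inferrable by chaining."""
    successors = defaultdict(set)
    for before, after in asserted:
        successors[before].add(after)
    pairs = set()
    for start in list(successors):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        pairs |= {(start, node) for node in seen}
    return pairs


def unfulfilled(found, asserted):
    """Dependencies found from pattern content that the asserted ones do not cover."""
    return set(found) - implied_pairs(asserted)


asserted = {("A", "B"), ("B", "C")}
found = {("A", "C"), ("C", "D")}
missing = unfulfilled(found, asserted)  # A -> C is inferrable; C -> D is not
```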

mike1813 commented 13 hours ago

@samuelsenior : bit more information.

I updated the CP dependency detection queries in the MS Access DB so they no longer report a dependency when a link created in one pattern would match a link in another pattern that goes to or from an optional node.

Then I made the seven changes listed above as 'the solution'. Next I generated an assumed sequence based on the predecessor and successor relationships (not the old hasPriority values), and extracted dependencies using the dependency detection queries (i.e., based on CP content). Then I used two stored queries to get asserted dependencies that correspond to dependencies that are no longer found, and used the 'Display Unfulfilled Inferred CP Dependencies' button to get dependencies found that could not be deduced from the predecessor and successor relationships. The relative positions P1 and P2 in the sequence are now based on the asserted dependencies, not the hasPriority values.

The first thing I found was a couple of asserted dependency relationships that are no longer needed:

Before the changes I was getting about 100 unfulfilled dependencies, but after these changes I found over 400. Many of these are redundant, in the sense that they can be deduced from each other in conjunction with asserted CP dependencies. I inspected them in groups based on the types of assets or links that cause each dependency.

A lot of issues stem from CNS+RAN+BS-S+l, but in many cases this seems to be because it is overspecified. The goal is to infer the location of an asserted Base Station if the user failed to specify it. This may happen if the user feels intuitively that the location is obvious because the RAN is only accessible in one Space.

At present, CNS+RAN+BS-S+l includes the Cellular Network of which the RAN is part, and its backbone network. In practice, parts of that may also be inferred so having the backbone in the underlying matching pattern is asking for trouble. Dependencies will arise between this pattern and anything that creates part of an IP network that could act as a backbone.

To achieve the inference goal without creating unnecessary dependencies, we need a 2-pattern sequence:

This would eliminate most dependencies with patterns that create relationships that could match links to/from the CellularNetwork and CoreNetwork nodes in CNS+RAN+BS-S+l:

This leaves four patterns that create relationships that would still match links in the new sequence RANafS2+mS, RAN-mS+NS-S+l.

DCpLS+RWS creates a relationship saying a LogicalSubnet provided by a Data Centre is accessibleFrom the Data Centre, and a providedBy relationship to the Data Centre's Router, which should not be a Base Station. These match RAN-accessibleFrom-Space in RANafS2+mS and RAN-mS+NS-S+l, and RAN-providedBy-Host(Gateway) in RAN-mS+NS-S+l. This was resolved before the previous changes because DCpLS+RWS follows DC-R+RW and DC-C+C which followed CNS+RAN+BS-S+l, but that is no longer the case because DC-R+RW and DC-C+C no longer follow PH-S+W.

There are some issues with DCpLS+RWS anyway, arising from the fact that it applies to any LogicalSubnet. This is not appropriate, because a Data Centre can't reasonably provide every type of network. We should restrict it to only L3 subnets (i.e., IP subnets connecting more than one Host). However, that affects backward compatibility so it should be done separately (issue #185).

In the meantime, the simplest option is to

Finally, CtHhPH+hPH, PodHhPH+hPH and their indirect successor PHH+hPH-1 create relationships that would match Host(Gateway)-hasPhysicalHost-BaseStation(PhysicalHost) which will still be in RAN-mS+NS-S+l. These used to follow CNS+RAN+BS-S+l because they follow (indirectly) from DC-R+RW. DC-R+RW used to follow CNS+RAN+BS-S+l, but now doesn't because it no longer follows PH-S+W.

The simplest option is to:

Once again, to figure this stuff out I made extensive use of sequence visualisation to find sequences that have dependencies with CNS+RAN+BS-S+l. Obviously the idea of replacing CNS+RAN+BS-S+l came from thinking about whether those dependencies make sense, which led to the realisation that some don't make sense but arise because that pattern is overspecific.