Spyderisk / system-modeller

Spyderisk web service and web client

Unit tests for JenaQuerierDB are needed #141

Open mike1813 opened 8 months ago

mike1813 commented 8 months ago

What is needed

At present, unit tests exist for the SystemModelQuerier and SystemModelUpdater classes, which read and write directly to the Jena triple store by constructing and executing SPARQL queries. However, there are no unit tests for the JenaQuerierDB, which interacts with the triple store via the Jena API, uses a set of deserialization classes for system (and domain) model entities, and supports caching of these objects to avoid the overhead of frequent access to the triple store.

In almost all situations, JenaQuerierDB instances are used with the caching feature enabled. This makes it quite difficult to test the triple store access functions. One can create an entity (e.g., a Threat), represented as a deserialized ThreatDB object, then use a JenaQuerierDB instance to store it, but the JenaQuerierDB object just adds the ThreatDB object to a cache, and returns the same object when requested in a subsequent read access.

What this means is that updates are applied to the cached entity objects, e.g., to set the likelihood of a threat one uses a method on the ThreatDB object. Since this object is in the cache, all the JenaQuerierDB object does is add the ThreatDB object to the list of new and modified objects in the cache. There is a separate sync() method that serializes new and modified objects in the cache back to the triple store, so really we just need to test that mechanism to check serialization and deserialization functionality.
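To make the cache behaviour concrete, here is a minimal sketch of the write-back pattern described above. It is not the JenaQuerierDB implementation: EntitySketch, StoreSketch and CachingQuerierSketch are hypothetical stand-ins, and the "triple store" is just an in-memory map holding copies of entities.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an entity class such as ThreatDB.
class EntitySketch {
    final String uri;
    String likelihood; // null means "not set"

    EntitySketch(String uri, String likelihood) {
        this.uri = uri;
        this.likelihood = likelihood;
    }

    EntitySketch copy() {
        return new EntitySketch(uri, likelihood);
    }
}

// Hypothetical stand-in for the triple store: keeps serialized copies by URI.
class StoreSketch {
    private final Map<String, EntitySketch> rows = new HashMap<>();

    void write(EntitySketch e) {
        rows.put(e.uri, e.copy());
    }

    EntitySketch read(String uri) {
        EntitySketch e = rows.get(uri);
        return e == null ? null : e.copy();
    }
}

// Hypothetical write-back querier: store() only touches the cache,
// sync() flushes new and modified entities to the backing store.
class CachingQuerierSketch {
    private final StoreSketch store;
    private final Map<String, EntitySketch> cache = new HashMap<>();
    private final Set<String> dirty = new LinkedHashSet<>();

    CachingQuerierSketch(StoreSketch store) {
        this.store = store;
    }

    void store(EntitySketch e) {
        cache.put(e.uri, e);
        dirty.add(e.uri);
    }

    EntitySketch get(String uri) {
        EntitySketch cached = cache.get(uri);
        if (cached != null) {
            return cached; // same object instance that was stored
        }
        EntitySketch loaded = store.read(uri);
        if (loaded != null) {
            cache.put(uri, loaded);
        }
        return loaded;
    }

    // Writes every dirty entity and returns how many were written.
    int sync() {
        int written = 0;
        for (String uri : dirty) {
            store.write(cache.get(uri));
            written++;
        }
        dirty.clear();
        return written;
    }
}
```

A test against this kind of design only exercises serialization if it calls sync() and then reads through a second querier with an empty cache; reading through the same instance would just return the cached object.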

There is some subtlety regarding the use of different system model graphs. The asserted graph contains user/client asserted assets and relationships, and is in practice updated via the client API which uses the SystemModelUpdater class. The inferred graph contains other entities (including some assets and relationships) added by the validator, plus likelihood and risk levels and causation relationships from the risk calculator. In some situations, an entity may be split between the two graphs:

When an entity or set of entities is read using a JenaQuerierDB object, the argument list includes strings referring to the required graphs. An EntityDB object is returned for each requested entity that has any properties specified in those graph(s). EntityDB member variables corresponding to properties not defined in the requested graph(s) will be null, except the entity URI and type which are always set in the returned object. If one graph is specified, one gets all entities from that graph and some from the other graph. If both graphs are used, the JenaQuerierDB object still returns one object with variables set based on properties from either graph.
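The intended read semantics can be sketched as follows. This is an illustration under assumptions, not the real API: readEntity, the nested-map store layout and the property names are all hypothetical, and the rule that earlier graphs in the list take precedence is an assumption taken from the later comment that the asserted graph should win.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GraphMergeSketch {
    // store: graph name -> (entity URI -> (property name -> value)).
    // Returns null if the entity has no properties in any requested graph.
    // Otherwise returns one merged property map; a property missing from
    // all requested graphs is simply absent, so get() on it yields null.
    static Map<String, String> readEntity(
            String uri,
            String type,
            Map<String, Map<String, Map<String, String>>> store,
            List<String> graphsInPrecedenceOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        boolean found = false;
        for (String graphName : graphsInPrecedenceOrder) {
            Map<String, Map<String, String>> graph = store.get(graphName);
            if (graph == null) {
                continue;
            }
            Map<String, String> props = graph.get(uri);
            if (props == null) {
                continue;
            }
            found = true;
            for (Map.Entry<String, String> p : props.entrySet()) {
                // Assumed rule: the first graph to define a property wins.
                merged.putIfAbsent(p.getKey(), p.getValue());
            }
        }
        if (!found) {
            return null;
        }
        // URI and type are always set on a returned entity.
        merged.put("uri", uri);
        merged.put("type", type);
        return merged;
    }
}
```

With this shape, asking for only the asserted graph returns an object whose inferred-only properties are null, while asking for both graphs returns a single object populated from either.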

Proposed unit tests

Taking these aspects into account, it is proposed that the following tests should be used:

These tests check that the methods designed to modify risk calculation inputs work as expected, which means they can be used to update models prior to risk calculation, when (a) values must be altered to ensure population triplets are consistent, or (b) risk calculations are used in 'what if' scenarios, e.g., by the Control Strategy Recommender algorithm.

The last test also checks that JenaQuerierDB.sync() writes data to the triple store as expected.

Then further tests could be added to check that the JenaQuerierDB initialisation methods work as expected. To do this, it is proposed to add an argument to the Validator.validate() method, controlling whether results are serialized to the triple store using the sync() method at the end. This would emulate the approach used in the RiskCalculator.calculateRiskLevels() method. The following tests could then be used:

In each of these tests, the output model should be checked against results obtained by manually running the model. Ideally, this should be done by comparing the whole model, e.g., by using a canonical serialization as JSON files, and using diff between the resulting files. A less stringent but probably still sufficiently sensitive test would be to compare the likelihood levels for specific Misbehaviour Sets after the risk calculation against known values from a manual analysis of the same system model.
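As a rough illustration of the whole-model comparison, the sketch below canonicalizes a model by sorting its triples and then reports the differences. It swaps the proposed canonical JSON serialization for one sorted line per triple, purely to keep the example short. CanonicalDiffSketch and its methods are hypothetical, and blank nodes (which would need stable relabelling first) are not handled.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

class CanonicalDiffSketch {
    // Turns {subject, predicate, object} triples into sorted, de-duplicated
    // lines, so that two equal models always serialize identically.
    static List<String> canonicalize(Collection<String[]> triples) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String[] t : triples) {
            sorted.add(t[0] + " " + t[1] + " " + t[2]);
        }
        return new ArrayList<>(sorted);
    }

    // Lines prefixed "- " are missing from actual; "+ " are unexpected extras.
    static List<String> diff(List<String> expected, List<String> actual) {
        List<String> out = new ArrayList<>();
        for (String line : expected) {
            if (!actual.contains(line)) {
                out.add("- " + line);
            }
        }
        for (String line : actual) {
            if (!expected.contains(line)) {
                out.add("+ " + line);
            }
        }
        return out;
    }
}
```

A unit test would then assert that the diff between the reference model and the freshly computed one is empty, and print the diff lines on failure so the offending triples are visible.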

These tests check that the JenaQuerierDB initialisation methods work as expected, and that there is no interference between those methods and the cache synchronization method.

mike1813 commented 8 months ago

First step now committed and pushed to GitHub. This just loads a model and checks what is returned when some inferred entities are loaded via a single JenaQuerierDB object, where some of those entities have properties in the asserted graph.

What emerged is that if caching is switched off, the merging of properties (with the asserted graph taking precedence) doesn't work correctly. It looks like you just get a property value from the last graph to be loaded, suggesting that the merge is done 'in cache'.

Need to discuss this with @scp93ch. It may be we should change the spec rather than try to fix this...