Spyderisk / system-modeller

Spyderisk web service and web client

Deficiency in cache implementation #78

Open mike1813 opened 1 year ago

mike1813 commented 1 year ago

Because the performance of read/write operations on the generic triple store is quite slow, the JenaQuerierDB class used to access the triple store from the Validator and Risk Calculator uses a cache (implemented by class EntityCache). If enabled, entities are stored in the cache once loaded from the triple store. Updates are made in the cache, which is only written back to the triple store when the cache is synchronised.
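
For illustration, here is a minimal sketch of this kind of read-through / write-back cache. The names SimpleEntityCache, TripleStore and Entity are hypothetical stand-ins; the real EntityCache and JenaQuerierDB are considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a read-through / write-back entity cache (hypothetical
// names; not the actual EntityCache implementation).
public class SimpleEntityCache<Entity> {

    public interface TripleStore<E> {
        E load(String uri);          // read an entity from the triple store
        void save(String uri, E e);  // write an entity back to the triple store
    }

    private final TripleStore<Entity> store;
    private final Map<String, Entity> cache = new HashMap<>();
    private final Map<String, Entity> dirty = new HashMap<>();

    public SimpleEntityCache(TripleStore<Entity> store) {
        this.store = store;
    }

    // Read-through: the triple store is only hit on a cache miss.
    public Entity get(String uri) {
        return cache.computeIfAbsent(uri, store::load);
    }

    // Updates go to the cache only, and the entry is marked dirty.
    public void put(String uri, Entity entity) {
        cache.put(uri, entity);
        dirty.put(uri, entity);
    }

    // Write-back: dirty entries only reach the triple store on sync().
    public void sync() {
        dirty.forEach(store::save);
        dirty.clear();
    }
}
```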

However, the introduction of population support made it necessary for the validator to modify the asserted graph where inputs from clients are inconsistent across an asset population. This is handled by a small number of JenaQuerierDB methods, used at the start of validation.

Moreover, the Control Strategy recommendation algorithm now has a requirement for an iterative loop in which control selection or coverage levels are altered and the risk calculation repeated afterwards. This should really be done using a single Risk Calculator object, since nothing else changes between iterations and reloading everything from the triple store would take significant time on each iteration.
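
A sketch of the intended reuse is shown below; the types and method names are illustrative assumptions, not the actual Risk Calculator API.

```java
import java.util.List;

// Illustrative-only types standing in for system-modeller classes.
public class RecommenderLoopSketch {

    interface RiskCalculator {
        void initialise();        // one-off load of model data and asserted inputs
        double calculateRisk();   // reuses the already-loaded data
    }

    interface Candidate {
        void apply();             // alter control selections / coverage levels
        void revert();            // restore the previous selections
    }

    // One RiskCalculator instance is created and initialised once; each
    // candidate change is applied, evaluated and reverted without reloading
    // anything from the triple store.
    static void evaluate(RiskCalculator calc, List<Candidate> candidates) {
        calc.initialise();
        for (Candidate c : candidates) {
            c.apply();
            double risk = calc.calculateRisk();
            System.out.println(c + " -> risk " + risk);
            c.revert();
        }
    }
}
```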

Reusing the Risk Calculator in this way presents a problem, because it currently checks the asserted graph to find user inputs regarding controls. It would be possible to modify the asserted graph using the same JenaQuerierDB methods, but they were not designed to maintain coherency in the cache. The Risk Calculator would use the cached versions and fail to pick up the changes.

One possible solution would be to fix the asserted graph update methods so they either invalidate the cache (forcing a reload from the triple store) or also apply their updates to the cache (if enabled).
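
A hedged sketch of the two options, using the standard Jena Model API; 'Cache' is a hypothetical stand-in for EntityCache, and the method name and signature are illustrative rather than the real JenaQuerierDB API.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

// Sketch only: shows where cache invalidation or a mirrored cache update
// would sit relative to the triple store update.
public class AssertedGraphUpdateSketch {

    interface Cache {
        boolean isEnabled();
        void invalidate(String uri);                          // option 1
        void applyChange(String uri, Property p, String v);   // option 2
    }

    void setAssertedProperty(Model asserted, Cache cache,
                             String entityUri, Property p, String value) {
        Resource r = asserted.getResource(entityUri);
        r.removeAll(p).addProperty(p, value);    // update the asserted graph

        if (cache.isEnabled()) {
            // Option 1: drop the cached entity, forcing a reload on next read.
            cache.invalidate(entityUri);
            // Option 2 (alternative): mirror the change in the cached entity.
            // cache.applyChange(entityUri, p, value);
        }
    }
}
```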

Arguably, the CSG recommender algorithm should not be changing the asserted graph anyway - although it could lock clients out from write access to the system model while the calculation is in progress, and restore the original settings afterwards. To make that work, the Risk Calculator itself would need to change so inputs from the asserted graph are handled by a separate initialisation step and working data is only saved in the inferred graph.

mike1813 commented 1 year ago

@scp93ch and @panositi : we should discuss the options here.

scp93ch commented 1 year ago

We agreed that the recommender algorithm will need to lock access to the system model regardless of the approach used in the risk calculator. At a minimum that would be via a blocking call.

The recommender algorithm tries adding and removing ControlSets (forming ControlStrategies) as it traverses down and up a recursive search tree. It is therefore inherently able to keep track of what it has added and to remove what it adds, so that at the end it produces a recommendation but no change to the graph. In this sense, it doesn't matter whether the recommender algorithm is adding and removing ControlSets from the asserted or inferred graphs.
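
An illustrative sketch of that add-then-revert pattern is shown below; the Model interface and its methods are placeholders, not system-modeller code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Sketch of a recursive search that always undoes what it adds.
public class RecursiveSearchSketch {

    interface Model {
        void add(String controlSetUri);
        void remove(String controlSetUri);
        double risk();
    }

    private final Deque<String> applied = new ArrayDeque<>();
    private double bestRisk = Double.MAX_VALUE;
    private List<String> bestSelection = List.of();

    void search(Model model, List<String> remaining) {
        double risk = model.risk();
        if (risk < bestRisk) {
            bestRisk = risk;
            bestSelection = List.copyOf(applied);   // record, don't keep graph changes
        }
        for (int i = 0; i < remaining.size(); i++) {
            String cs = remaining.get(i);
            model.add(cs);                          // descend: add a ControlSet
            applied.push(cs);
            search(model, remaining.subList(i + 1, remaining.size()));
            applied.pop();
            model.remove(cs);                       // ascend: undo exactly what was added
        }
        // Every add is matched by a remove, so once the top-level call returns
        // the graph is unchanged and only bestSelection survives.
    }
}
```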

Philosophically, it is better for it not to add ControlSets to the asserted graph, as they really are not asserted. They are not really inferred either, so there is an argument that we might want yet another graph, but putting them into the inferred graph is a happy medium.

Having the recommender algorithm add ControlSets to the inferred graph will give us a big performance boost as well, since the JenaQuerier will not need to be re-executed for every ControlSet change and risk calculation.

We therefore want to move the interactions between the risk calculator and the asserted graph into an initialisation step and only change the inferred graph while recursing through the recommendations.

mike1813 commented 9 months ago

@scp93ch : I think there is some misunderstanding here. The methods in JenaQuerierDB that update the asserted graph are only used by the validator. The risk calculator does not write to the asserted graph - it only reads from it, specifically to get the status of controls, plus coverage levels and other assumed TW levels as asserted by the client/user. The risk calculator assumes the asserted graph is definitive (since it may have been updated since the last validation/risk calculation), and only ignores asserted graph values where they are inconsistent across population triplets.

That's the reason why we have a problem. If the asserted graph contains a control selection status property, the risk calculator uses it. Changing control selections in the inferred graph then has no effect, so to ensure that the recommender algorithm's changes are used, it must make them in the asserted graph.
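
A hedged sketch of that precedence rule, using the standard Jena API and the "system" / "system-inf" graph names that appear later in this thread; the property handling is illustrative, not the actual Risk Calculator code.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;

// Sketch: an asserted value, if present, wins over the inferred one.
public class ControlStatusLookupSketch {

    static boolean isProposed(Dataset dataset, String controlSetUri, Property proposed) {
        Resource asserted = dataset.getNamedModel("system").getResource(controlSetUri);
        Statement s = asserted.getProperty(proposed);
        if (s != null) {
            // The asserted graph is definitive: a different value written only
            // to the inferred graph would be ignored here.
            return s.getBoolean();
        }
        Resource inferred = dataset.getNamedModel("system-inf").getResource(controlSetUri);
        Statement inf = inferred.getProperty(proposed);
        return inf != null && inf.getBoolean();
    }
}
```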

@panositi, @kenmeacham and I discussed this today. We concluded the recommender algorithm should use those JenaQuerierDB methods added for the validator. However, I should first modify the methods so they don't destroy cache consistency. This is not an issue for the validator (because it creates ControlSets but never reads them back again), but will be in the risk calculator and the recommender algorithm if not fixed.

panositi commented 9 months ago

The recommendation algorithm currently tries proposing certain sets of control sets and tests how the risk calculation is affected; it then revokes the control set changes, so the model is reverted to its initial state. Iterations of the algorithm can be recursive, but each iteration also keeps track of the changed control sets and reverts them in the right order.

There is no need to have permanent model changes during the recommendation operation; even if the operation fails, it should catch that error and revert the CS changes.

mike1813 commented 9 months ago

@panositi : I just added the cache update to those methods in the JenaQuerierDB class. I also added methods that take a String URI argument as the specification of which CS (or in one case TWAS) should be updated. I didn't bother with the LevelDB object argument because the natural starting point for that is a level value, so you need to do some conversion first anyway.

The methods now update just one triple in the specified graph, but this is now done inside a try-catch-finally block to ensure the end of the method is only reached if the update was successful. At the end, the method tries to load the corresponding object with the specified URI from the specified graph. That should always succeed, because we just added a triple for it to that graph, but just in case I check that it is non-null. However, if the object was already in the cache, it won't have been updated, so the method applies the same change to that object and saves it, ensuring that the changed object will be stored in the cache.
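
A hedged reconstruction of that pattern (simplified to try/finally here); the Cache and ControlSet interfaces and the method name are illustrative assumptions, not the actual JenaQuerierDB code.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

// Sketch: update one triple in the specified graph, then keep the cached
// copy (if any) consistent with it.
public class CacheCoherentUpdateSketch {

    interface Cache<E> {
        boolean isEnabled();
        E getIfCached(String uri);      // null if the entity is not in the cache
        void save(String uri, E entity);
    }

    interface ControlSet {
        void setProposed(boolean proposed);
    }

    void setControlSetProposed(Model graph, Cache<ControlSet> cache,
                               String csUri, Property proposedProp, boolean proposed) {
        boolean updated = false;
        try {
            // Update a single triple in the specified graph.
            Resource r = graph.getResource(csUri);
            r.removeAll(proposedProp).addLiteral(proposedProp, proposed);
            updated = true;
        } finally {
            // Only touch the cache if the triple store update succeeded.
            if (updated && cache.isEnabled()) {
                ControlSet cs = cache.getIfCached(csUri);
                if (cs != null) {
                    // The cached copy won't have seen the change, so apply the
                    // same change there and save it back to the cache.
                    cs.setProposed(proposed);
                    cache.save(csUri, cs);
                }
            }
        }
    }
}
```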

As long as you don't sync on the asserted graph, you should end up with only one triple there for the control set proposed status.

A possible refinement would be to delete this triple rather than change it if you specify that the control is not selected. I need to go now, so I didn't have time to do that.

I also didn't have time to create unit tests or anything so nice. The only test I ran was a simple validation + risk calculation (one of my domain model regression test cases). The results were OK, so neither my latest domain model changes nor this system modeller change has broken the basic functionality of system modeller. However, the validator writes control sets but does not read them, so that isn't really testing this change in system modeller. I think only the CSG recommender algorithm does that, so I am leaving you to do the tests and let me know if you have any problems.

I committed the change on branch 67 and pushed to GitHub, so you should be able to merge into your version and try using it.

mike1813 commented 9 months ago

Looks like my changes to those asserted graph update methods are now working, although it is difficult to be sure without a more controlled test case.

There is also a problem with the current implementation. It alters a property in the triple store, then just duplicates the change in the cache, if caching is enabled. What it should do if caching is enabled is to update only the cache, leaving the 'sync' method to handle serialization to the triple store (the calling method should decide if that is required, and if so, invoke 'sync' separately).

The problem with that is that the 'sync' method doesn't work correctly in all situations, because it was designed to work only for the inferred graph, serving the validator and risk calculator at a time when neither could have any reason to touch other graphs. Essentially, the cache is designed to be 'read only' for the asserted graph, and if one calls 'sync' for the asserted graph, it creates extra triples that are duplicates from the inferred graph for any inferred entity. These are then assumed to be sacrosanct (input from clients) when really they are not, so they may be preserved and inappropriately respected later on.

The best fix I can see is to make the 'sync' method sensitive to which graph it is writing back to the triple store, and make it leave out anything that shouldn't be in the triple store. We already use a similar trick for asset relationships, which are stored both as a CardinalityConstraint and as a property of the source asset. Where the relationship is asserted but the source asset is inferred, this leads to a similar issue whereby we need to serialize a triple expressing a property without the rest of the resource.

Doing it this way would mean we need to maintain the 'sync' method whenever we change what should be in each graph. Such a change is rare - typically less than once per year - so not a big overhead in effort, but it would be necessary to keep track...
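
A hedged sketch of what a graph-sensitive write-back might look like; the Store interface, the dirty-entry map and the per-graph predicate are assumptions for illustration, not the existing sync() code.

```java
import java.util.Map;
import java.util.function.Predicate;

// Sketch: sync() consults a per-graph rule and leaves out anything that
// should not be written back to that graph.
public class GraphSensitiveSyncSketch<E> {

    interface Store<E> {
        void save(String graphUri, String entityUri, E entity);
    }

    private final Store<E> store;
    // Per-graph rule deciding whether an entity belongs in that graph at all,
    // e.g. only client-asserted entities may be written to the asserted graph.
    private final Map<String, Predicate<E>> belongsInGraph;

    GraphSensitiveSyncSketch(Store<E> store, Map<String, Predicate<E>> belongsInGraph) {
        this.store = store;
        this.belongsInGraph = belongsInGraph;
    }

    void sync(String graphUri, Map<String, E> dirtyEntities) {
        Predicate<E> allowed = belongsInGraph.getOrDefault(graphUri, e -> true);
        dirtyEntities.forEach((uri, entity) -> {
            if (allowed.test(entity)) {
                store.save(graphUri, uri, entity);
            }
            // Entities that fail the test (e.g. inferred entities when writing
            // back the asserted graph) are simply left out, avoiding the
            // duplicate triples described above.
        });
    }
}
```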

@scp93ch : what do you think?

mike1813 commented 9 months ago

Discussion between @scp93ch and @mike1813 concluded that however it is done, these asserted graph update methods should work in the same way as other methods. This should be done in a branch off branch 67. A unit test should also be created to test these methods.

mike1813 commented 9 months ago

Plan of action:

The changes to JenaQuerierDB.sync() will be based on the following assumptions.

The asserted graph ("system") should contain

The u/i graph ("system-ui") only contains display information used in the web U/I, and is maintained by the system modeller client. The JenaQuerierDB class can query and cache information from this graph, but it is not currently used for this purpose.

The inferred graph ("system-inf") should contain

The MS, TWAS and CS entities are 'special' because they are created with default impact level, assumed TW level and control coverage level properties in the inferred graph. These are overridden by properties added to the asserted graph by the system modeller client during initialisation of the risk calculator.

For MS, the default impact levels are then restored in the inferred graph, because they are used by the web U/I when revoking a previously asserted override value. This ensures that if the domain model default levels change, the new defaults will be applied at the next risk calculation, and if an asserted override is revoked, the U/I can display the default value from the last risk calculation.
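
A hedged sketch of the MS impact level handling just described, using the Jena API and the "system" / "system-inf" graph names; the property and default-level arguments are illustrative placeholders.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;

// Sketch: an asserted override wins, and the default is (re)written to the
// inferred graph so the U/I can still show it if the override is revoked.
public class MisbehaviourImpactSketch {

    static String effectiveImpactLevel(Dataset ds, String msUri,
                                       Property hasImpactLevel, String defaultLevelUri) {
        Statement override = ds.getNamedModel("system")
                               .getResource(msUri)
                               .getProperty(hasImpactLevel);

        // Restore the default in the inferred graph, so that if an asserted
        // override is later revoked the U/I can still show the default from
        // the last risk calculation.
        Model inf = ds.getNamedModel("system-inf");
        Resource inferredMs = inf.getResource(msUri);
        inferredMs.removeAll(hasImpactLevel)
                  .addProperty(hasImpactLevel, inf.createResource(defaultLevelUri));

        // An asserted override, if present, takes precedence over the default.
        return (override != null) ? override.getResource().getURI() : defaultLevelUri;
    }
}
```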

We have not taken the same approach with default assumed TW levels in TWAS entities, nor default coverage levels in CS entities. This is partly due to the fact that assumed TW and coverage levels are related across entities in the same population triplet, which means that in some circumstances default values in the inferred graph are overridden by the risk calculator to ensure they are consistent over the population triplet.

It was felt that in those circumstances the client should display the modified values, not the unmodified default values, although this does mean the U/I has no access to the default values and so cannot update the display should an asserted override value be revoked. Evidently, there is a trade-off here, so it is possible this aspect will be modified in future.

For JenaQuerierDB.sync(), the only issue is how to store these properties in the asserted and inferred system model graphs. As long as the Risk Calculator determines what values they should have, no special treatment should be needed in the sync() method.