Spyderisk / system-modeller

Spyderisk web service and web client
Other
3 stars 4 forks source link

Unit test testCurrentOrFutureRiskCalculation #75

Open mike1813 opened 1 year ago

mike1813 commented 1 year ago

Unit test RiskLevelCalculatorTester.testCurrentOrFutureRiskCalculation was disabled some time ago when it was found that the test failed, but the same sequence run manually did not. It was assumed that the test framework introduced some dependency between the two risk calculation steps (one current risk, one future risk), but the cause could not be found.

Subsquently it seemed more likely that the problem was caused by the use of 'artificial' controls added to the domain model to regulate some threats that should not be included in a current risk calculation. The test framework did not know that these should be enabled or disabled before re-running the risk calculation. This may have been the cause of the original problem. See https://github.com/Spyderisk/domain-network/issues/28.

Meanwhile, some risk calculation steps have been moved to the Jena querier class so they can be used in the validator to initialise control sets and trustworthiness attribute sets when first generated. To support this, some risk calculation initialisation has also been moved into the querier, which may cause problems if a risk calculation is re-run in the same thread using the same querier. This could not be a cause of the original unit test failure, but may mean the unit test will still fail if used with an up-to-date domain model in which artificial control strategies have been removed.

This issue has become more important now we are seeking to integrate a control strategy recommendation algorithm into the system-modeller code. This will be using a risk calculator inside an iterative search procedure, leading to repeated risk calculation runs in a single thread. We should now find out whether the unit test works if fed an up-to-date test case with no artificial controls, and if not, determine whether the cause may affect the envisaged control strategy recommendation algorithm.

mike1813 commented 1 year ago

Original problem investigated.

It seems that the test framework can't handle more than one domain model with the same KB graph URI. I don't understand how, but when we updated the test to use the latest domain model, the Risk Calculator thought the TWA 'domain#UserTW' was linked with two different Misbehaviours (which is not allowed):

The only way the second of these could happen is if the triple store contains elements of an older domain model. The test class did register 8 or 9 domain models, several of which use the same domain graph URI. It looks like when these are registered, they are all loaded into the triple store, so when the Risk Calculator starts up, using a JenaQuerierDB connected to the triple store, what it gets from the JenaQuerierDB is a merger of several domain models.

The reason this doesn't happen in the system-modeller service is because when a new domain model is uploaded using the same graph URI as an existing one, the previous domain model is deleted first. The problem only seems to affect the test framework.

mike1813 commented 1 year ago

@scp93ch : we should discuss this and decide what to do.

I believe the problem is caused because when domain (and possibly system) models are registered by a test class, they are loaded into the triple store. When each unit test method is executed, it selects the relevant domain and system models, which creates the stack of models used in the test. The problem is that if you load two domain models to the same Graph URI, you end up with the union of the two models. This will produce incorrect results, and can lead to inconsistencies that may cause a crash (as in this case).

I believe we should do the following:

If this test now works, and any previously working test still works (possibly after fixing any faults uncovered by this change), then it means my suspicion is correct. It also means we can work around the issue by ensuring that we never use the same domainGraph URI for two domain models in the same unit test class.

That being the case, we could close this issue. The only problem with that is we may experience problems later if people forget that they mustn't add further domain models with the same graph URI.

To completely solve the problem, we would need to alter the way test inputs are handled, such that they are only loaded into the triple store when selected for use by the test class (rather than when they are registered), and dropped afterwards. This may make the unit test sequence even slower, but would mean domain and system models can be added without worrying about it.

mike1813 commented 1 year ago

Pushed a change to branch 75, in which the other domain models are commented in RiskLevelCalculatorTester, showing that the testCurrentOrFutureRiskCalculation test then works as expected.

Other test methods that use the commented domain models will now fail, of course. You can only run that one test.

Not a solution, but demonstrates that the management of domain models is the cause of the problem.

scp93ch commented 1 year ago

To clarify slightly, we have many domain models loaded, and this may include multiple versions of domain models which share the same "DomainGraph" URI. Each system model that is loaded refers to the DomainGraph URI of the domain model it depends on.

The file for each domain model loaded needs to be adjusted so that it has a unique DomainGraph URI. The DomainGraph URI is, for example, http://it-innovation.soton.ac.uk/ontologies/trustworthiness/domain-network, and appears on every line of a domain model NQ file as the 4th element of each quad. It might make sense to just append the domain model version string to the DomainGraph URI. The version string is found in the quad:

<http://it-innovation.soton.ac.uk/ontologies/trustworthiness/domain> <http://www.w3.org/2002/07/owl#versionInfo> "6a3-2-1" <http://it-innovation.soton.ac.uk/ontologies/trust
worthiness/domain-network> .

Each system model then also needs adjusting to refer to the new URI for its domain model. A system model only uses the DomainGraph URI once, e.g.:

<http://it-innovation.soton.ac.uk/system/5d791add884bd304cdff2856> <http://it-innovation.soton.ac.uk/ontologies/trustworthiness/core#domainGraph> <http://it-innovation.soton.ac.uk/ontologies/trustworthiness/domain-network> <http://it-innovation.soton.ac.uk/system/5d791add884bd304cdff2856> .
scp93ch commented 1 year ago

It may also make sense (for clarity) to create a sub-folder containing each domain model and the system models that depend on it.

panositi commented 1 year ago

testCurrentOrFutureRiskCalculation is using a unique set of domain and system graphs (domain-fogprotect). The original test fails after the 2nd risk calculation (FUTURE) because the targeted misbehaviour set value is expected to produce low likelihood, while the returned likelihood value (medium) does not change between the two risk calculations.

In order to fix the test we either have to change the targeted misbehaviour expected likelihood value, or find another misbehaviour set that its likelihood changes between CURRENT and FUTURE risk calculations, or use a different set of domain and system graphs with known misbehavour likelihoods, e.g. data-flow model.

mike1813 commented 1 year ago

testCurrentOrFutureRiskCalculation is using a unique set of domain and system graphs (domain-fogprotect). The original test fails after the 2nd risk calculation (FUTURE) because the targeted misbehaviour set value is expected to produce low likelihood, while the returned likelihood value (medium) does not change between the two risk calculations.

In order to fix the test we either have to change the targeted misbehaviour expected likelihood value, or find another misbehaviour set that its likelihood changes between CURRENT and FUTURE risk calculations, or use a different set of domain and system graphs with known misbehavour likelihoods, e.g. data-flow model.

The failure of the test assertion is not the underlying cause, so just changing the expected values will not fix the problem. It might mean the test reports success, but the test would still be faulty.

The true cause of the problem here is a combination of two things:

  1. The domain model is old enough that 'artificial' CurrentRiskCalculation controls must be selected/deselected depending on the type of risk calculation, and the test method doesn't do this. The best way to fix this is by moving to an up-to-date test case.
  2. Domain models with the same graph URI are merged together when registered in the test class initialisation. That would almost certainly make the Risk Calculator produce incorrect output and may cause it to crash. The fastest way to fix that is to make sure no two domain models used in the tests have the same graph URI.

In branch https://github.com/Spyderisk/system-modeller/tree/75-unit-test-testcurrentorfutureriskcalculation, I modified the test class so this method uses an up-to-date test case. This addresses the first problem, but it added another domain model with the same graph URI as one of the older tests, and the resulting merged domain model was so bad that the Risk Calculator crashed. I fixed this by commenting out the other domain models, but this makes all the other tests crash instead.

You should copy in my version of the test method, and the test files, but not the rest of the changes. Then you should edit the test files and change the graph URI in some domain models so no two domain models have the same graph URI. Then you should edit system models that use domain models where you changed the graph URI, and alter the domainGraph reference so they still refer to the correct domain model.

panositi commented 1 year ago

I will change the domain and system graphs for that test then as you suggested. I will also try to restructure the resources folders so they reflect the structure of unit tests, e.g. resources for src/test/java/uk/ac/soton/itinnovation/security/modelvalidator/test/RiskLevelCalculatorTester.java should be located at src//test/resources/modelvalidator/RiskLevelCalculatorTester, etc.

kenmeacham commented 1 year ago

I would only use subdirectories for test files when you really need to, i.e. a lot of tests may legitimately share the same domain/system model. The restructuring will help with organising these test files where clashes occur, however the file location itself won't change the outcome - you'll need to ensure the unique URIs, as Mike suggested.

scp93ch commented 1 year ago

I agree with Ken about the sub-folders. As I said above, if we are to use sub-folders it should be to group each domain model with the 1 or more system models that use it. This arrangement is independent of the tests.

panositi commented 1 year ago

testCurrentOrFutureRiskCalculation is now fixed using DM v6a3.2.2 and SM and updated version of the data-flow model. The two models follow the data folder structure I suggested ie src/test/resources/modelvalidator/RiskLevelCalculator. The rest of the models in resources are intact, so other tests will not need to be changed.

The RiskCalculationTester class runs successfully:

--------------------------------------------------------------------
|  Result: SUCCESS (10 tests, 9 successes, 0 failures, 1 skipped)  |
--------------------------------------------------------------------

The skipped test is the testThreatFrequency and was like that.

mike1813 commented 1 year ago

Two questions:

It is possible the skipped testThreatFrequency test failed for similar reasons as testCurrentOrFutureRiskCalculation, and hence more URI changes may be needed in the RiskLevelCalculator models.

It is also possible that some of the other unit tests may have similar issues, which means the results may be unreliable. We should therefore make sure all domain models have unique graph URIs in all the test classes before we close this issue.

panositi commented 1 year ago

Both testThreatFrequency and testCurrentOrFutureRiskCalculation use DM: domain-network-v6a3-2-2-unfiltered.nq.gz and SM: system-dataflow-test-singles-validated.

Other tests in RiskLevelCalculator still use the initial DMs and SMs. All DMs and SMs are checked and modified accordingly to use unique URIs.

---------------------------------------------------------------------
|  Result: SUCCESS (10 tests, 10 successes, 0 failures, 0 skipped)  |
---------------------------------------------------------------------