Spyderisk / system-modeller

Spyderisk web service and web client

Improve recommendations system #140

Open · scp93ch opened this issue 5 months ago

scp93ch commented 5 months ago

There are various aspects of the recommendation algorithm (#67) that can be improved. This issue is to log them.

scp93ch commented 5 months ago

The shortest-path method used in the Java implementation of the attack graph is older than the one used in the Python attack graph tool.

In the Python tool, the distance calculation ignores secondary threats. I think this gives a more useful measure and should mean that more relevant primary threats are targeted.
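As a rough illustration of the difference, here is a minimal sketch of a distance calculation that ignores secondary threats, using a 0-1 BFS: stepping through a secondary threat costs nothing, stepping through any other node costs one. All class, method and field names below are hypothetical; this is not the actual Spyderisk Java or Python code.

```java
import java.util.*;

// Illustrative sketch only: hop distances over a directed attack graph where
// secondary threats do not add to the distance. Not the Spyderisk data model.
final class AttackGraphDistance {

    record Node(String id, boolean secondaryThreat) {}

    private final Map<String, Node> nodes;
    private final Map<String, List<String>> edges; // node id -> successor ids

    AttackGraphDistance(Map<String, Node> nodes, Map<String, List<String>> edges) {
        this.nodes = nodes;
        this.edges = edges;
    }

    /** 0-1 BFS: secondary threats cost 0, every other node costs 1. */
    Map<String, Integer> distancesFrom(String sourceId) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> deque = new ArrayDeque<>();
        dist.put(sourceId, 0);
        deque.addFirst(sourceId);

        while (!deque.isEmpty()) {
            String current = deque.pollFirst();
            int d = dist.get(current);
            for (String next : edges.getOrDefault(current, List.of())) {
                int cost = nodes.get(next).secondaryThreat() ? 0 : 1;
                if (d + cost < dist.getOrDefault(next, Integer.MAX_VALUE)) {
                    dist.put(next, d + cost);
                    // Zero-cost steps go to the front so nodes are expanded in distance order.
                    if (cost == 0) deque.addFirst(next); else deque.addLast(next);
                }
            }
        }
        return dist;
    }
}
```

With this weighting, a path that only passes through secondary threats is no "longer" than its primary-threat backbone, which is what should make the measure point at the more relevant primary threats.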

scp93ch commented 5 months ago

Currently the algorithm works depth-first when trying CSG options. We might want to work breadth-first instead and discard some of the options once the risk vector has been calculated.

However, this might not be a good idea: we would probably discard branches where the risk initially increases due to a side-effect that can be fixed further into the branch.
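For illustration only (not the current algorithm, and collapsing the risk vector to a single number for brevity), a breadth-first variant could be beam-style: keep the few most promising CSG combinations at each level instead of hard-discarding every branch whose risk rises, which softens the side-effect concern above. All types and names are hypothetical.

```java
import java.util.*;
import java.util.function.Function;

// Illustrative sketch only (not the current Spyderisk algorithm): a beam-style
// breadth-first exploration of CSG combinations. Risk is a single double here
// purely for brevity; all types are hypothetical.
final class BeamCsgSearch {

    record Candidate(Set<String> appliedCsgs, double risk) {}

    static List<Candidate> search(Set<String> allCsgs,
                                  Function<Set<String>, Double> riskOf,
                                  int beamWidth,
                                  int maxDepth) {
        List<Candidate> frontier = List.of(new Candidate(Set.of(), riskOf.apply(Set.of())));
        List<Candidate> results = new ArrayList<>(frontier);

        for (int depth = 0; depth < maxDepth; depth++) {
            List<Candidate> next = new ArrayList<>();
            for (Candidate c : frontier) {
                for (String csg : allCsgs) {
                    if (c.appliedCsgs().contains(csg)) continue;
                    Set<String> extended = new HashSet<>(c.appliedCsgs());
                    extended.add(csg);
                    next.add(new Candidate(extended, riskOf.apply(extended)));
                }
            }
            // Keep only the beamWidth lowest-risk candidates rather than discarding
            // everything whose risk went up: a temporary rise can survive the cut.
            next.sort(Comparator.comparingDouble(Candidate::risk));
            frontier = next.subList(0, Math.min(beamWidth, next.size()));
            results.addAll(frontier);
        }
        results.sort(Comparator.comparingDouble(Candidate::risk));
        return results;
    }
}
```

The beam width would be the tuning knob: a wider beam keeps more of the branches where risk temporarily increases before a later CSG fixes the side-effect.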

scp93ch commented 4 months ago

Some commentary on how to potentially remove CSGs with side effects can be found in #137.

scp93ch commented 4 months ago

The threat graph algorithm will not currently find all viable paths. See #142.

scp93ch commented 4 months ago

If the current risk level was very high and the acceptable risk level was medium, but the best the recommendation algorithm could find was a way to bring the risk down to high, we might still want to offer that recommendation to the user.

More generally, we might want to return any recommendation that at least reduces the risk level. On the client side, the ones that result in the lowest global risk would be sorted to the top.
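A minimal sketch of that policy, assuming risk levels can be ordered as integers (names hypothetical): keep any recommendation whose resulting risk is below the current level, sorted so the lowest global risk comes first.

```java
import java.util.*;

// Illustrative sketch only: offer every recommendation that improves on the
// current risk level, best (lowest resulting risk) first. Types are hypothetical.
final class RecommendationFilter {

    record Recommendation(List<String> csgs, int resultingRiskLevel) {}

    static List<Recommendation> usable(List<Recommendation> candidates, int currentRiskLevel) {
        return candidates.stream()
                .filter(r -> r.resultingRiskLevel() < currentRiskLevel) // any reduction counts
                .sorted(Comparator.comparingInt(Recommendation::resultingRiskLevel))
                .toList();
    }
}
```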

scp93ch commented 4 months ago

> The shortest-path method used in the Java implementation of the attack graph is older than the one used in the Python attack graph tool.
>
> In the Python tool, the distance calculation ignores secondary threats. I think this gives a more useful measure and should mean that more relevant primary threats are targeted.

One thing that has been noted is that, because we only compute CSG logical expressions for the "shortest path" through the threat graph, a control option found on that path can sometimes be bypassed by a longer threat path and therefore have little or no effect. If we improve the shortest-path calculation (as now done in the Python tool) then this should happen less often and the search will be faster. It is not clear how much of an effect this will have.

scp93ch commented 4 months ago

I believe we are currently only examining CSGs for threats on the attack graph, not the larger threat graph. The effect of this is that we do not include CSGs that block normal-operation threats, and in particular we do not propose FWblock CSGs that block the "In Service" normal-op threat for interfaces. Adding a FWblock is commonly a useful thing to do (and also hard to find, as it is an inferred asset). We should investigate how to allow the recommendations to include such CSGs.
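A minimal sketch of the widened candidate selection this suggests (types and flags are hypothetical, not the Spyderisk model): take threats from the full threat graph, including normal-operation threats, rather than only those on the attack graph.

```java
import java.util.*;
import java.util.stream.Collectors;

// Illustrative sketch only: widen candidate threats from "on the attack graph"
// to "on the attack graph, or a normal-operation threat". Hypothetical types.
final class CandidateThreats {

    record Threat(String id, boolean onAttackGraph, boolean normalOperation) {}

    static Set<Threat> candidates(Collection<Threat> threatGraph, boolean includeNormalOps) {
        return threatGraph.stream()
                .filter(t -> t.onAttackGraph() || (includeNormalOps && t.normalOperation()))
                .collect(Collectors.toSet());
    }
}
```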

scp93ch commented 4 months ago

The current method may return duplicate recommendations. This happens when it tries the same CSGs in a different order. We shouldn't be returning duplicates.

Even better, we shouldn't be finding duplicates in the first place. We should maintain a set of the aggregated CS combinations that have already been explored (aggregated down the recursion, that is). If we find we are examining a set of CS that we have tried before, we can abort that branch. Doing this would mean we don't return duplicates, and it would also speed up the computation.
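A minimal sketch of that explored-set idea (names illustrative): store each aggregated combination of controls as an order-independent key, and abandon any branch that reproduces a combination already seen.

```java
import java.util.*;

// Illustrative sketch only: remember aggregated control combinations already
// explored so the same combination tried in a different order is not revisited.
final class ExploredCombinations {

    // Each key is the sorted set of ids in a combination, so ordering does not matter.
    private final Set<SortedSet<String>> explored = new HashSet<>();

    /** Returns true if this combination is new (and records it); false means prune the branch. */
    boolean markIfNew(Collection<String> aggregatedIds) {
        return explored.add(new TreeSet<>(aggregatedIds));
    }
}
```

Called at each level of the recursion with the union of everything applied so far, a false result would both prevent duplicate recommendations and prune the repeated work.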

kenmeacham commented 4 months ago

One other sub-issue is how to deal with long-running recommendations jobs. As we know, it can take ages to run the recommendations and there is a chance it may never complete. We therefore need some kind of mechanism for cancelling a job. @panositi is doing some initial investigations, but this may be complex and will require changes to the service and the UI. Probably this should also be spawned as a separate issue, which might enable the current work to be merged into dev?

scp93ch commented 4 months ago

Recommendations are currently persisted forever in a server-side MongoDB. We need to either not persist them at all (it is not clear why we do), or make sure they are removed (e.g. when the risk calculation is run).
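For illustration, two possible cleanup approaches using the MongoDB Java driver, assuming (hypothetically) that stored recommendation documents carry a creation date and a model id: a TTL index so documents expire automatically, or an explicit delete when the risk calculation is re-run.

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

import java.util.concurrent.TimeUnit;

// Illustrative sketch only: collection and field names ("createdAt", "modelId")
// are assumptions, not the actual system-modeller schema.
final class RecommendationCleanup {

    /** Let MongoDB expire recommendation documents automatically via a TTL index. */
    static void ensureTtlIndex(MongoCollection<Document> recommendations) {
        recommendations.createIndex(
                Indexes.ascending("createdAt"),
                new IndexOptions().expireAfter(7L, TimeUnit.DAYS));
    }

    /** Or remove them explicitly, e.g. whenever the risk calculation is re-run for a model. */
    static void deleteForModel(MongoCollection<Document> recommendations, String modelId) {
        recommendations.deleteMany(new Document("modelId", modelId));
    }
}
```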

scp93ch commented 4 months ago

> One other sub-issue is how to deal with long-running recommendations jobs. As we know, it can take ages to run the recommendations and there is a chance it may never complete. We therefore need some kind of mechanism for cancelling a job. @panositi is doing some initial investigations, but this may be complex and will require changes to the service and the UI. Probably this should also be spawned as a separate issue, which might enable the current work to be merged into dev?

See comment here: https://github.com/Spyderisk/system-modeller/issues/67#issuecomment-1988944904

kenmeacham commented 3 months ago

> One other sub-issue is how to deal with long-running recommendations jobs. As we know, it can take ages to run the recommendations and there is a chance it may never complete. We therefore need some kind of mechanism for cancelling a job. @panositi is doing some initial investigations, but this may be complex and will require changes to the service and the UI. Probably this should also be spawned as a separate issue, which might enable the current work to be merged into dev?
>
> See comment here: #67 (comment)

Long-running jobs have now been addressed via issue #145, so this requirement can be considered closed. The server can now time out the job, or the user can cancel it.

kenmeacham commented 3 months ago

One side-effect of the recommendations timeout (#145) is that the user is not informed when it happens. Any recommendations generated so far are returned, so for the user the effect is the same as if the recommendations had completed successfully within the allowed time.

So we probably need some kind of mechanism to inform the user when the server has timed out.
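One possible shape for this (a sketch, not the existing response format): return an explicit status alongside the recommendations so the client can warn the user when the result is partial.

```java
import java.util.List;

// Illustrative sketch only: an explicit status lets the client distinguish a
// timed-out partial result from a normal completion. Not the existing API.
final class RecommendationsResult {

    enum Status { COMPLETED, TIMED_OUT, CANCELLED }

    private final Status status;
    private final List<String> recommendationIds;

    RecommendationsResult(Status status, List<String> recommendationIds) {
        this.status = status;
        this.recommendationIds = recommendationIds;
    }

    /** The client warns the user whenever this is not COMPLETED. */
    Status status() { return status; }

    List<String> recommendationIds() { return recommendationIds; }
}
```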