alan-turing-institute / data-safe-haven

https://data-safe-haven.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Investigate auditing, monitoring and protecting solutions for Safe Haven #790

Closed kevinxufs closed 3 years ago

kevinxufs commented 4 years ago

:scroll: Description

This issue will cover all issues related to auditing, monitoring and protecting. Currently in the NHS Cloud security principles there are nine rows that cover this sort of issue:

:strawberry: Desired behaviour

We should be looking into the following:

Some of the above issues are non-technical policy changes (e.g. 2, 7, 8) to do with our own due diligence. We should expect these to require deciding on a policy and then writing it up as evidence.

The others will require an exploration of the above security and monitoring options and should engage with Ian in this discussion.

kevinxufs commented 4 years ago

@jemrobinson @JimMadge @martintoreilly (for reference)

Following our discussions today and last week, here's my assessment of this issue:

We originally started with around 9 points based on the NHS cloud security principles, which were then converted into a number of issues. Having gone through all of these, we can now see that there is a lot of work to be done.

Part of this work is implementation: for example, we may just need to implement something like Azure Sentinel or an equivalent solution. Other parts are more difficult, such as deciding how to use audit / monitoring data to inform our security decisions. In the case of user access logs, it is not clear how we would use the information, since both successful and unsuccessful access attempts could be good or bad from our perspective.

Given the large amount of work involved here (and the long expected timelines for achieving accreditation) we discussed the possibility of first making larger infrastructure changes. Investing in these infrastructure changes would make further development easier, including logging / monitoring changes. In particular, without changing our current infrastructure to something like Ansible, it would be very difficult to have any kind of automated inventory management system (one of the NHS requirements).

Here is an overview of our planned approaches for our various monitoring / logging issues, and how this may be affected by our architectural changes (ARCH):

GPG13 (see #781 for full list of issues)

Implement inventory management.

Without some kind of infrastructure as code, this will be extremely difficult as we would have to develop our own kind of inventory management system.

ARCH: This would be trivial to do if we switch our underlying architecture to use something like Terraform or Ansible
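To illustrate why infrastructure as code makes inventory management easier: once the expected machines are declared in one place, an inventory check reduces to comparing that declaration with what is actually deployed. A minimal sketch, assuming both lists are already available as sets of names. The machine names and the `inventory_drift` helper are hypothetical and are not taken from the Safe Haven codebase, Terraform or Ansible.

```python
def inventory_drift(declared: set, observed: set) -> dict:
    """Compare the declared inventory with the machines actually observed."""
    return {
        # Declared in code but not found running.
        "missing": sorted(declared - observed),
        # Running but not declared anywhere in code.
        "unmanaged": sorted(observed - declared),
    }


# Hypothetical names: `declared` would come from the infrastructure-as-code
# definition and `observed` from a query against the cloud provider.
declared = {"dc-01", "rds-gateway", "srd-ubuntu-01"}
observed = {"dc-01", "rds-gateway", "srd-ubuntu-01", "test-vm-tmp"}

drift = inventory_drift(declared, observed)
print(drift)
```

Without a declared source of truth there is nothing to compare against, which is the difficulty described above.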

Patching and vulnerability management.

This can be fixed for our Windows VMs by simply enabling windows updates. We currently have a group policy that seems to be blocking this, but that can be changed.

This will be more difficult with our Linux VMs. We could enable automatic updates for Ubuntu 18.04, but there is a bit of nuance here, as it is difficult to separate security updates from regular updates.

ARCH: Switching architecture to something like Ansible / Terraform has a minor benefit in that it will be much easier to make manual interventions on Linux machines.
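As a rough illustration of the security / regular split on Ubuntu: security fixes are published in a `-security` pocket, and `apt list --upgradable` shows which pockets each pending package comes from. The sketch below assumes that output format and partitions package names accordingly. The `split_upgrades` function is hypothetical and the sample lines are made up for illustration, not captured from one of our machines.

```python
def split_upgrades(apt_output: str) -> tuple[list[str], list[str]]:
    """Partition `apt list --upgradable` output into (security, regular) package names."""
    security, regular = [], []
    for line in apt_output.splitlines():
        # Package lines look like "name/pocket1,pocket2 version arch [upgradable from: ...]".
        if "/" not in line or "[upgradable from:" not in line:
            continue  # skip the "Listing..." header and blank lines
        name, rest = line.split("/", 1)
        pockets = rest.split(" ", 1)[0].split(",")
        if any(pocket.endswith("-security") for pocket in pockets):
            security.append(name)
        else:
            regular.append(name)
    return security, regular


# Made-up sample in the assumed format.
SAMPLE = """Listing...
openssl/bionic-updates,bionic-security 1.1.1-1ubuntu2.1~18.04.6 amd64 [upgradable from: 1.1.1-1ubuntu2.1~18.04.5]
htop/bionic-updates 2.1.0-3ubuntu0.1 amd64 [upgradable from: 2.1.0-3]
"""

security, regular = split_upgrades(SAMPLE)
print("security:", security)
print("regular:", regular)
```

This only captures part of the nuance: a later version published only in `-updates` can still carry an earlier security fix, so a pocket-based split is an approximation rather than a complete answer.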

Incident management process

We should check this with IT to see what they are doing

Use audit data as part of protective monitoring

In general we should be using some kind of logging system to generate our audit data. We can leave it open at this point as to what this logging system may be. This approach is primarily targeted at user sessions.

The main challenge is determining what to do with our logging data. In particular, it is not clear what kind of logs are 'good' and 'bad'. For example, it isn't clear whether a successful login is a good log. Normally it would be, but if someone managed to hack in to our systems then a successful login would presumably be bad for us. We want to be informed when something bad happens, but it's difficult to describe rules for what is bad.
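To make the 'a successful login can be bad' point concrete, here is a sketch of one possible rule: flag a successful login that follows several failed attempts by the same user within a short window. The rule, the threshold, the window and the `LoginEvent` structure are all assumptions chosen for illustration; they are not an agreed policy and not the format of any particular logging system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LoginEvent:
    user: str
    time: datetime
    success: bool


def success_after_failures(events, threshold=3, window=timedelta(minutes=10)):
    """Return successful logins preceded by at least `threshold` failed
    attempts for the same user within `window`."""
    flagged = []
    recent_failures = {}  # user -> times of failed attempts still inside the window
    for event in sorted(events, key=lambda e: e.time):
        failures = [
            t for t in recent_failures.get(event.user, [])
            if event.time - t <= window
        ]
        if event.success:
            if len(failures) >= threshold:
                flagged.append(event)
            failures = []  # a success resets the count for this user
        else:
            failures.append(event.time)
        recent_failures[event.user] = failures
    return flagged


# Made-up events for illustration.
start = datetime(2021, 1, 1, 9, 0)
events = [
    LoginEvent("alice", start, True),
    LoginEvent("bob", start + timedelta(minutes=1), False),
    LoginEvent("bob", start + timedelta(minutes=2), False),
    LoginEvent("bob", start + timedelta(minutes=3), False),
    LoginEvent("bob", start + timedelta(minutes=4), True),
]
flagged = success_after_failures(events)
for event in flagged:
    print(f"suspicious: {event.user} at {event.time}")
```

Here the success itself is unremarkable; it is the surrounding context that makes it worth a look, which is why a rule has to consider sequences of events rather than single log lines.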

Our initial thoughts for some suspicious things are:

Next Steps

I think there are two key things.

Change infrastructure

First we need to establish whether we are going to invest development time now in making these architectural changes. Doing so would provide us with great benefits in the long term, and is pretty much a necessity for doing inventory management effectively. On the other hand, it would take a while to get it running, and in the meantime we would not really be responding to any of these issues.

My suggestion (given that we expect security accreditation to take a while, and that we are not necessarily the critical path) is that this is worth doing.

Prioritise issues

We now have a huge number of issues tagged as monitoring and / or nhs cloud security. Once we've committed to either changing infrastructure or not, we should then establish the dependencies between the different issues we have (e.g. we should probably establish how to generate logs before we think about a log life cycle policy), and then prioritise them accordingly. Many of the issues are related.
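Once dependencies are written down, a workable order can be derived mechanically with a topological sort. A small sketch using Python's standard-library `graphlib` (Python 3.9+); the task names are hypothetical placeholders rather than real issue titles, with only the 'generate logs before log life cycle policy' relationship taken from the example above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names; each key maps to the set of tasks it depends on.
depends_on = {
    "choose log destination": set(),
    "generate logs": {"choose log destination"},
    "log life cycle policy": {"generate logs"},
    "alert on suspicious logins": {"generate logs"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
for position, task in enumerate(order, start=1):
    print(position, task)
```

Tasks with no ordering constraint between them (here the last two) can come out in either order, so priority among those still needs a human decision.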

In this prioritisation process we should think a bit about the stage at which we need to decide on an exact solution / implementation.

kevinxufs commented 4 years ago

monitoring flow

See diagram for dependencies.

Note #808 is independent and is sorting out machine times.

#819 is the first thing we would need to do, as otherwise there would not be anywhere to send the logs to.

rwinstanley1 commented 3 years ago

@JimMadge I know you are looking into the auditing and monitoring solutions for the DSH. Is this issue something that you would like to keep open?

It was assigned to me as part of DSPT. I think the work you are currently doing covers us under DSPT, but it's a question of whether any of these are things we want to consider for wider functionality.

JimMadge commented 3 years ago

@rwinstanley1 Do you think DSPT supersedes this? If so I'm happy to close this and the related issues.

I think the DSPT issues on monitoring/logging have good detail of our plans going forward.

rwinstanley1 commented 3 years ago

@JimMadge yes, my instinct is that the DSPT monitoring / logging work supersedes this and further work would be dependent on the outcomes from that.

I'll close this and related issues if you're happy with that!