alan-turing-institute / data-safe-haven

https://data-safe-haven.readthedocs.io
BSD 3-Clause "New" or "Revised" License
57 stars 14 forks source link

Potentially reoccurring problem with users unable to reset their passwords online #1467

Closed jemrobinson closed 1 year ago

jemrobinson commented 1 year ago

:white_check_mark: Checklist

:computer: System information

:no_entry_sign: Describe the problem

On a production SHM deployment, users were unable to reset their passwords using the self-service password reset flow at aka.ms/sspr.

image

This was caused by the SHM firewall rule called AllowExternalAzureADConnectPasswordReset blocking, the servicebus FQDNs; where requests being made via the password reset (which people have to use to create an account in the SHM) had FQDNs that fell outside the maintained allowlist in deployment/safe_haven_management_environment/network_rules/shm-firewall-rules.json.

We think that the change in the FQDNs used by the password reset was caused by Microsoft changing this without warning.

This problem was fixed in release-v4.0.4 by https://github.com/alan-turing-institute/data-safe-haven/pull/1466 - but it's possible that we'll need to update the firewall allowlist again in future. Is there a way to have a more robust solution?

TODO:

:steam_locomotive: Workarounds or solutions

One proposed solution is to use wildcards in the Firewall allowlist, however care will need to be taken to ensure doing this doesn't constitute a security breach.

e.g. change the allowlist to:

*passwordreset.microsoftonline.com,*.servicebus.windows.net
edwardchalstrey1 commented 1 year ago

@jemrobinson In order to close this issue, what needs to be done? I think either of these:

  1. If we conclude there is no security risk after all, then PR with the allowlist with the proposed wildcard changes above
  2. Have some documentation that explains the need to periodically (and also when the error occurs) update the list via doing something similar to https://github.com/alan-turing-institute/data-safe-haven/pull/1466
    • We might want to include your script in the repo that generated the list of FQDNs
jemrobinson commented 1 year ago

I'd like to see whether *-sb.servicebus.windows.net works. If so, this would restrict the rule to only work for Microsoft-managed service bus endpoints which I think would be acceptable. If not, I agree that we should document the need to update these endpoints (or possibly just make this a GitHub action that updates the rule if necessary).

craddm commented 1 year ago

Can confirm that Azure Firewall does recognise *-sb.servicebus.windows.net as a valid rule. I tested it on Blue - removing the big long list of *.servicebus.windows.net URLs made the self-service password reset fail as described above, and adding back *-sb.servicebus.windows.net made it work again.

JimMadge commented 1 year ago

@jemrobinson is the glob (what is the name for a glob-like object for domains?) *-sb.servicebus.windows.net protected or reserved by Microsoft?

jemrobinson commented 1 year ago

@jemrobinson is the glob (what is the name for a glob-like object for domains?) *-sb.servicebus.windows.net protected or reserved by Microsoft?

I think it's reserved by Microsoft. This is based on my reading of the docs here:

The namespace name should adhere to the following naming conventions:

- the name must be unique across Azure. The system immediately checks to see if the name is available.
- the name length is at least 6 and at most 50 characters.
- the name can contain only letters, numbers, hyphens “-“.
- the name must start with a letter and end with a letter or number.
- the name doesn't end with “-sb“ or “-mgmt“.

which I think mean that users can't create anything matching *-sb.servicebus.windows.net

JimMadge commented 1 year ago

That's great. In that case, I think changing to the wildcard is a very sensible. Simplify the codebase and maintain the same security. (I think there is a strong argument that we could 'downgrade' the security here as users have to access to the machines this applies to).

edwardchalstrey1 commented 1 year ago

Sounds like we're good here - @craddm do you already have a branch to PR or shall I make one?

@jemrobinson since I'm yet to do the full release test in #21 - should we merge this change into the release branch as well as develop? (since this is an SHM update, might make sense to do all in one go)

craddm commented 1 year ago

I just tested it by making the changes manually through the portal, so no PR - happy to do one, but go ahead if you'd rather

jemrobinson commented 1 year ago

I suggest the following:

JimMadge commented 1 year ago

@jemrobinson @edwardchalstrey1 A modification to that second bullet point,

merge into release then tag according to https://github.com/alan-turing-institute/data-safe-haven/blob/develop/CONTRIBUTING.md#contributing-through-github