Ouranosinc / Magpie

AuthN/AuthZ services
https://pavics-magpie.readthedocs.io
Apache License 2.0
1 stars 5 forks source link

network mode #589

Open mishaschwartz opened 1 year ago

mishaschwartz commented 1 year ago

Introduce "Network Mode" which allows other Magpie instances to act as external authentication providers using access tokens.

This allows users registered across multiple Magpie instances in a network to more easily gain access to the resources within the network, without requiring the duplication of user credentials across the network.

In response to the discussion here: https://github.com/DACCS-Climate/DACCS-executive-committee/issues/8.

This change is fully backwards compatible and is only enabled if MAGPIE_NETWORK_MODE is set (not set by default). Otherwise, magpie admins and regular users should see no changes at all.

TODO:

mishaschwartz commented 11 months ago

@fmigneault

I've been thinking about this:

It seems like the custom external login code is essentially doing what authomatic would perform, but using non-standard OAuth methodology and endpoints.

I think that thinking about this from the perspective of OAuth is actually the wrong approach. Implementing a full OAuth provider is actually overkill for our purposes, but I think that we can adopt some of the OAuth ideas to improve the functionality and security of this new feature.

The goal of this feature is simply to provide the user with an API token. This API token is analogous to OAuth's "access token" but:

We can still make the token exchange more secure by passing a client secret when one node passes the token to another.

Please let me know if you have any ideas for a different approach that I'm missing here

fmigneault commented 11 months ago

@mishaschwartz

  • we don't actually want the whole OAuth flow because that would require the user to authenticate with the authorization provider (the other node) every time they make a request to a new node (which is one thing we're trying to avoid here).

I'm not sure if I understand this part. When a UserNodeB wants to access a resource from NodeA using a token generated for (NodeB, UserNodeB) by NodeA, that token would need to be generated on behalf of that user after NodeA-NodeB handshake was accomplished. Therefore, UserNodeB would need to be logged in NodeB in order to perform a "Request NodeA Token". That token request could be the "external provider login" procedure, which authomatic from NodeA would validate client id/secrets against the originating NodeB by calling and counter-validating the /userinfo the UserNodeB claimed to be. The only distinction of this procedure from the current code to support Network Mode would be that an access token would be returned to NodeB instead of the Cookie currently used to complete log in NodeA.

Did you have another procedure in mind? I don't see how else the nodes can validate that the user requesting an access token is legitimate if NodeA cannot re-validate UserNodeB on NodeB with an appropriate login.

  • the token should only be used to access resources protected by twitcher, this is not intended to be used to log in to Magpie to access the Magpie UI for example.

Technically speaking, once Magpie can authenticate the user, it gives the same access via Twitcher. There are no distinction between these two. When logged in an "admin", you obtain access to all Twitcher services and Magpie UI pages. When non-admin, you get restricted Twitcher access based on Magpie permissions and limited access to Magpie UI to /magpie/ui/users/current. Therefore, if a token would authorize a user to access a resource via Twitcher, that same token could be used to log in Magpie.

mishaschwartz commented 11 months ago

Did you have another procedure in mind?

Yeah...

Just to clarify a few things:

When a UserNodeB wants to access a resource from NodeA using a token generated for (NodeB, UserNodeB) by NodeA

The token is generated by NodeB for UserNodeB, not by NodeA

UserNodeB would need to be logged in NodeB in order to perform a "Request NodeA Token"

The token is not for NodeA, the token is generated and validated exclusively by NodeB. The same token can be included in a request to any node (NodeA, NodeB, NodeC, NodeD) and it will still work.

fmigneault commented 11 months ago

When a UserNodeB wants to access a resource from NodeA using a token generated for (NodeB, UserNodeB) by NodeA

The token is generated by NodeB for UserNodeB, not by NodeA

This is the problem. The resources are protected by NodeA, so it is NodeA that must generate the token. It cannot be the other way around, otherwise the token cannot be secured and signed by NodeA's secret, which NodeB cannot ever know. If NodeB knew that secret, that would make the full network at risk because it would assume every node knows every other node's secrets bidirectionally.

You are missing a key step here:

The token is not for NodeA

Seems to contradict the first point: "UserNodeB would like to access a resource on NodeA".

No matter how the token/cookie is generated, when Twitcher validates with Magpie for AuthN/AuthZ of said token, it will be for accessing resources on its own instance. Twitcher/Magpie of NodeA do not care about how NodeB resources are accessed and by whom. It is the problem of this instance's Twitcher/Magpie.

The same token can be included in a request to any node (NodeA, NodeB, NodeC, NodeD) and it will still work.

That is just not possible, by concept, for security. Each node should have respective and distinct secret signing of their own tokens. If not, that bypasses the purpose of encrypting the tokens. It's just like passing around your private SSH key to anyone that want to access the protected instance, and then "trust" them not to share it around for other (malicious) purpose.

mishaschwartz commented 11 months ago

Ok, just to be clear, I understand what you're saying. You're describing a solution using Oauth, I'm describing one that doesn't use Oauth but relies on limited resource access and short token lifespans to mitigate unauthorized access.

The main advantage of my solution is that it requires fewer steps for users and node administrators to set up a system that allows them to access resources on multiple nodes.

BUT ... after reviewing your proposed solution again, I'm happy to go with your proposal and we can provide the user with client-side tools to simplify their workflow instead. I don't know what that will look like yet but I'm sure we can figure something out.

I think the most important thing is to get this working in the short time that CRIM is still available for the project.

fmigneault commented 11 months ago

The main advantage of my solution is that it requires fewer steps for users and node administrators to set up a system that allows them to access resources on multiple nodes.

So it does for someone trying to bypass security! Hence why I see it on the flip side as a disadvantage.

The main issue I can see with the proposed solution was that it uses a less secure approach by design. If it was a security method on its own, it wouldn't be as problematic, but since it inevitably ties back into the Twitcher authorization at some point, that new method would create the weakest link security-wise affecting the overall solution.

I'm not too strict about using Oauth or an alternative method, as long as the core principles around it align.

provide the user with client-side tools to simplify their workflow instead

I believe this could be made relatively transparent for the user. It is rather the Magpie nodes themselves that would need to perform a few more request validation steps to sync tokens between each other behind the scene. Once Cookie or Authorization headers are set, the user shouldn't see the difference where the resources are being accessed from.

mishaschwartz commented 11 months ago

@fmigneault

I have two new proposals for the access flow.

Both options are similar to OAuth's client credentials flow with the added protection that both Nodes validate the requests/responses of each other using asymmetric public/private keys (can be implemented by passing RSA256 signed JWTs)

Option 1: UserA acquires a token to access a specific resource on a remote NodeB Option 2: UserA acquires a token to access the remote NodeB as a specific user on NodeB (anonymous or otherwise)

For both proposals, we assume the following has already been set up:

Access flow for Option 1:

Access flow for Option 2:

Let me know if you have any questions or an opinion about which method you'd prefer.

Some security implications to consider:

fmigneault commented 11 months ago

Option 1:

NodeB check's if userA is authorized to access the resource in question

How does NoteB perfom that check? Since there are no "UserB" with that option, there is no permission defined for /thredds/datasets/example.ncml. There is also no possible resolution of group permissions (directly on the resource or by resource hierarchy resolution logic), since no "UserB" can be defined as a member of a given group. The "effective permission" is resolved using users and their group memberships currently, but not groups on their own.

Option 2:

NodeB matches UserA with a user that already exists on NodeB (can be one a specific user or the "NodeA anonymous user")

That could work, but we need to figure out how we will define the user to avoid potential name conflicts.

One thinng to consider is that, if UserA is not admin on NodeB (shouldn't be, otherwise they wouldn't need the token), they cannot list users on NodeB. Therefore, they have limited capabilities to "associate" their UserA to some "NodeBUserA". Trusted NodeB by NodeA could do this association on behalf of UserA, but again, we must consider how to handle name conflicts.


In both options, the rest of the procedure seem adequate.

I tend to lean toward Option 2 more, because that involves less (no) modifications in the "effective permission" logic. There's also less API/UI to develop to support "admin revoking a resource token", since it can use the current serivce/permission interface. After resolving the JWT token into some auth/cookie, Magpie/Twitcher would see the resolved "NodeBUserA" as any other local NodeB user (and applicable groups if need be).

In Option 2, a user only needs one token to access multiple resources

This is also a compelling argument for that option. It allows the user to directly interact with all its resources without having to manage individual tokens per endpoint, which would be very complicated if they try to do so themselves (eg: when accessing data in a notebook).

Extra Considerations

Keys of remote nodes would have to be stored in DB using an extra hashing pass using the secret of the local instance. This way tokens will not be stored in "plain text".

mishaschwartz commented 11 months ago

How does NoteB perfom that check?

In the way that we discussed above and is currently implemented in this PR:

NOTE: If we're going with Option 2 anyway (see below) this doesn't matter anymore

That could work, but we need to figure out how we will define the user to avoid potential name conflicts.

As discussed here (https://github.com/Ouranosinc/Magpie/pull/589#discussion_r1328740823): remote users will be stored in a separate table and have a foreign key reference to the users table to avoid name conflicts.

Therefore, they have limited capabilities to "associate" their UserA to some "NodeBUserA"

Only an admin on NodeB should be creating those associations at the moment. It could be something that an administrator would have to set up when creating a new account.

We could allow users to create their own associations between nodes using a version of Oauth's implicit flow:

That might be a different PR though.

I tend to lean toward Option 2 more, because that involves less (no) modifications in the "effective permission" logic

Ok that's a good enough reason to choose this option

Keys of remote nodes would have to be stored in DB using an extra hashing pass using the secret of the local instance. This way tokens will not be stored in "plain text".

Great idea lets do that

fmigneault commented 11 months ago

Ok. Sounds good for admin creating the associations for now, and having a separate user table. The user self-association is also a good idea, but indeed better to keep it for a future PR.

Now, considering that separate user table, how would the admin make that association? There is a need for API endpoint(s) (at least), and potentially a new UI page, to allow them to add/list/remove remote users and the associations between local/remote users.

mishaschwartz commented 11 months ago

There is a need for API endpoint(s) (at least), and potentially a new UI page, to allow them to add/list/remove remote users and the associations between local/remote users.

Yes for sure. The plan is to start with the API endpoints and then add the UI additions as a second step.

The plan for the UI is to add a new field to the form on /ui/users/{username}/default to add/remove a remote username and node name (username is a text field, node name is a drop-down list).

Then we can add a new page to list all users with associated remote user names (admin view only). We could also display this information on /ui/users if you'd prefer to not create a new page but that might get messy since a single user can have multiple associations to multiple remote users on other nodes in the network.

codecov[bot] commented 10 months ago

Codecov Report

Attention: Patch coverage is 86.46685% with 149 lines in your changes missing coverage. Please review.

Project coverage is 81.67%. Comparing base (5ee83c5) to head (ae9caf6). Report is 8 commits behind head on master.

:exclamation: Current head ae9caf6 differs from pull request most recent head 64894e3

Please upload reports for the commit 64894e3 to get more accurate results.

Files Patch % Lines
magpie/api/management/network/network_utils.py 51.04% 44 Missing and 3 partials :warning:
magpie/cli/create_private_key.py 51.35% 18 Missing :warning:
magpie/cli/purge_expired_network_tokens.py 72.72% 11 Missing and 4 partials :warning:
magpie/utils.py 11.76% 14 Missing and 1 partial :warning:
magpie/api/login/login.py 66.66% 9 Missing and 4 partials :warning:
.../api/management/network/node/network_node_views.py 92.24% 7 Missing and 2 partials :warning:
magpie/ui/utils.py 82.14% 4 Missing and 1 partial :warning:
magpie/adapter/magpieowssecurity.py 57.14% 2 Missing and 1 partial :warning:
magpie/api/management/group/group_views.py 25.00% 2 Missing and 1 partial :warning:
...anagement/network/remote_user/remote_user_views.py 96.47% 1 Missing and 2 partials :warning:
... and 9 more
Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #589 +/- ## ========================================== + Coverage 80.88% 81.67% +0.78% ========================================== Files 73 86 +13 Lines 10191 11246 +1055 Branches 1825 2023 +198 ========================================== + Hits 8243 9185 +942 - Misses 1625 1710 +85 - Partials 323 351 +28 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

mishaschwartz commented 10 months ago

@fmigneault

All api and ui changes have now been made and all existing test and code check are now passing.

I will start writing tests for the new functionality but the code and doc changes are ready for another look.