conda-incubator / conda-store

Data science environments, for collaboration. ✨
https://conda.store
BSD 3-Clause "New" or "Revised" License
144 stars 49 forks source link

Improvements to the namespace Role Mappings #491

Closed costrouc closed 11 months ago

costrouc commented 1 year ago

Motivation

For authentication currently the user to role and permissions has been managed in traitlets. Via a few important attributes

c.RBACAuthorizationBackend.role_mappings: Dict[str, set[str]] = {
            "viewer": {
                schema.Permissions.ENVIRONMENT_READ,
                schema.Permissions.NAMESPACE_READ,
             }
 }       
c.RBACAuthorizationBackend.unauthenticated_role_bindings: Dict[str, set[str]] =  {
        "default/*": {"viewer"},
 }
c.RBACAuthorizationBackend.authenticated_role_bindings: Dict[str, set[str]] = {
            "default/*": {"viewer"},
            "filesystem/*": {"viewer"},
}

class MyAuthentication(Authentication):
       async def authenticate(self, request: Request):
            return schema.AuthenticationToken(
                primary_namespace=username,
                role_bindings={
                    "*/*": ["admin"],
                },
            )

c.CondaStoreServer.authentication_class = MyAuthentication

Developers were responsible for creating a function authenticate(...) which handled the user to role mapping. With this configuration it was possible to conda-store to not know anything about the users.

Proposal

I propose adding one new database table to conda-store. We leverage the fact that the username will match a namespace name which is created on the backend.

# orm.py

# update namespace table
class Namespace(Base):
     ...
     metadata = Column(JSON) # would be nice to rely on metadata so that we don't enforce the attributes that are on a user

class NamespaceRoleMapping(Base):
    id = Column(Integer, primary_key=True)
    namespace_id = ...
    entity = Column(Unicode(255), nullable=False)     # arn e.g. <namespace>/<name> like `quansight-*/*` or `quansight-devops/*`
    role = Column(Unicode(255), nullable=False)  # e.g. viewer

Then add a method to RBACAuthorizationBackend

def get_namespace_role_bindings(namespace: str):
      # query database for all entity role mappings
      return {entity: roles} # will match form in `schema.AuthenticationToken.role_bindings`

Additionally there should be rest api methods and permissions for CRUD. I am least sure about this part.

@pierrotsmnrd I will be gone until Thursday but I think this should be scoped well enough to start. I will periodically be checking github so I should be able to respond to comments.

pierrotsmnrd commented 1 year ago

First step with the DB impacts is here in branch 491-role-mapping

Then I'm kind of unsure how to progress with get_namespace_role_bindings. Whats the best way to get access to the DB from this class that doesn't have access to it ?

costrouc commented 1 year ago

I'd just supply the db session as a function argument. Would that work?

pierrotsmnrd commented 1 year ago

Ok, some parts around the authentication were still blurry but I see clearer now.

I have explored the solution you have described. Let me know if I get this correctly, because I have a slightly different approach to suggest:

  1. My understanding is that all the filter_* functions in class Authentication ( Authentication::filter_builds, Authentication::filter_environments, Authentication::filter_namespaces ), will need to rely on get_namespace_role_bindings, directly or indirectly, to filter the results from the API according to the role bindings.

  2. I noticed that all these functions receive an entity parameter, which is an AuthenticationToken that contains a role_bindings property.

  3. So couldn't we rely on entity.role_bindings instead ? The property would have to be set up when the AuthenticationToken is built. This approach would have less impacts in the codebase (as far as I understand it, but I might be missing something that prevents this approach). An extra feature would be required : updating the role bindings regularly based on the DB.

Let me know what you think :)

pierrotsmnrd commented 1 year ago

2023 07 06 - Sync with Chris :

costrouc commented 1 year ago

Great summary @pierrotsmnrd. After checking the code more I think that several methods will require the db argument.

The more I think about it the RBACAuthorization class just generally needs a db session object injected into it when being created. . It would be difficult to pass the db instance in all methods.

Might be easiest to add after https://github.com/Quansight/conda-store/blob/main/conda-store-server/conda_store_server/server/app.py#L189 the following:

self.authentication._authorization.db = conda_store.db

Probably do in a more official way :smile:

costrouc commented 1 year ago

This issue is partially implmented @pierrotsmnrd. Now we need to add api methods mentioned in this issue.

costrouc commented 1 year ago

I think this issue is fully resolved now that we have merged the PR #508. Will re-open if issues occur.

costrouc commented 1 year ago

Reopening since we need to make get namespace return the api endpoint data.

costrouc commented 1 year ago

This is an important issue for the role mappings in the database to be done properly but is not a priority for JATIC work.

trallard commented 1 year ago

@nkaretnikov assigning this to you

costrouc commented 1 year ago

This is a complex issue that I think is worth talking about over a call with @pierrotsmnrd and I. I'll put the current status here.

I think we slightly missed how this should be implemented.

The entire use case:

I chris would have namespace-role-mapping::create and namespace-role-mapping::delete permissions on a particular namespace say quansight. This means I should have enough permissions to "grant" user pierre a role on the quansight.

We have a database table orm.NamespaceRoleMapping which is used to track these permissions. I think the entity field should be instead be something like target_namespace_id. Meaning user namespace_id has role role on specific namespace target_namespace_id. I don't think that entity helps us here and @pierrotsmnrd sorry for the wrong direction here.

Also I have doubts on us having PUT /namespace/{namespace}/ as the route for updating role_mappings. The more I think about it I think I'd prefer something like an explicit domain e.g. PUT /namespace/{namespace}/role_mappings/.

So I think that most of the hard work is already done but we need to tweak the REST api and fixup how we represent things in the DB. @pierrotsmnrd do you have thoughts? Also we should schedule a call.

nkaretnikov commented 1 year ago

Working on this. I found the discussion here pretty difficult to follow, so I decided to focus on this bit, which is actionable:

I chris would have namespace-role-mapping::create and namespace-role-mapping::delete permissions on a particular namespace say quansight. This means I should have enough permissions to "grant" user pierre a role on the quansight.

It was tricky to understand how this would even look in the UI, so I started from that direction.

I have a UI mockup here: https://github.com/nkaretnikov/conda-store/commits/role-mappings-491

It looks like this:

Screen Shot 2023-09-16 at 20 56 55 Screen Shot 2023-09-16 at 21 01 16

Dropdowns update when you click on them. Delete shows a popup warning before executing. The style is similar to how we manage builds, but I use a table here to keep it aligned. We could add stuff like text field auto-complete later.

We could move it somewhere else, but currently:

The main thing I need to figure out is how role assignment will look like, such that I could actually list some users in this table above. Eve is just a hardcoded placeholder at the moment. IIUC, we don't store usernames, it's all token-based (JWT). I'll update other APIs as needed.

trallard commented 1 year ago

Actually this is the wrong way around we first need to define how the API should look like, followed by the workflow and the UI is the very last thing that needs doing. The UI simply exposed the workflow to the user. This issue should be on hold until we get a chance to brainstorm/agree on the underlying design decisions (API/workflow).

nkaretnikov commented 1 year ago

@trallard There's a huge gap between the discussion above (about entity being a wrong abstraction or the DB attribute idea) and what Chris wrote about being able to share namespace permissions. It wasn't clear to me how those things help (or not) reaching that end goal. I wanted to have a mockup to test real-world UX, to see which handlers are needed for those buttons to work, or what info is needed in the DB to be shown in the UI.

Now I understand constraints better, the UI is a bonus. To me, this highlighted a need to have access to usernames, which we cannot get easily IIUC. You need a readable way to distinguish users, in case you want to update their permissions later or remove them.

OK, I stop work on this and will schedule a discussion.

kcpevey commented 1 year ago

Meeting has been scheduled to discuss this. Blocked until the meeting happens

kcpevey commented 1 year ago

@trallard I wont be in the meeting but I think as we think through these roles, we need to identify some usecases or user profiles:

As a team manager, I create and manage a set of 5 environments for my team to work in. The team can view and use these environments, but they don't have access to edit the environments. Any changes to the environments are requested and vetted by me, then once I update the environment, I notify the team of the changes. However, I am NOT a sys admin and I dont control anything on the backend (or have admin rights on nebari)

As an individual developer, I create and manage my own environments. Sometimes I need to share a notebook with my team. To do this, I create a shared environment in our user group and make sure the notebook can run on that environment. I dont mind if my teammate edits the environment.

As a sys admin, I'm responsible for the backend. I have much the same usecase as the team manager, but I DO have admin rights.

As an intern, I'm very new to conda and environments. I don't need access to edit ANY environments outside of my personal namespace.

Based on what I currently know about the role mappings, this raises a couple of issues:

disclaimer: there are so many issues on this topic, I apologize if I've posted this in the wrong place.

nkaretnikov commented 1 year ago

Note: you don't need to read this message unless you want to provide early feedback on the model.

After talking to people and reading the code, I present my current plan below. I might still make changes to it if I run into implementation issues, but I hope there won't be any.

This is motivated by how we want sharing to work, as well as what we want to have in the UI.

UI mock up (this shows access to current namespace):
alice | admin  [update][delete]   # role here is a dropdown, value is saved when clicking update
bob   | viewer [update][delete]
                       [delete all]

<namespace> <role> [add]          # gives access to this namespace for namespace 'namespace' with role 'role'
This explains how the above UI can be implemented using routes listed below:
# (table entries) = GET    /namespace/{namespace}/roles
# [delete all]    = DELETE /namespace/{namespace}/roles

# [add]        = POST   /namespace/{namespace}/role, params: other_namespace, role
# (GET is not used here, but can be used for partial UI updates - redraw one row)
# [update]     = PUT    /namespace/{namespace}/role, params: other_namespace, role
# [delete]     = DELETE /namespace/{namespace}/role, params: other_namespace
Routes:
These allow reading and deleting everything for convenience:
CREATE                   - no, handled one-by-one by CREATE on /role
READ   {namespace}/roles - lists all role mappings   (used to show all roles in the UI)
UPDATE                   - no, handled one-by-one by UPDATE on /role
DELETE {namespace}/roles - deletes all role mappings (used to delete all roles in the UI)
These always return 1 entry or fail.

To detect potential API errors, I've opted for the strict CRUD model. This helps avoiding additional work in some cases, which would turn everything into one big update method. That is, an error is raised when there's a mismatch. For example, when CREATE attempts to create data that already exists in the DB.

CREATE {namespace}/role, params: other_namespace, role - insert into the DB, error if duplicate (enforced by uniqueness constraint on the table)
READ   {namespace}/role, params: other_namespace       - get role of other_namespace, error if not found
- can be skipped in the UI, but useful for testing
UPDATE {namespace}/role, params: other_namespace, role - search for entry matching (namespace, other_namespace) and update its role, error if not found
DELETE {namespace}/role, params: other_namespace       - delete entry matching (namespace, other_namespace) (return deleted row), error if not found
Table
class NamespaceRoleMapping(Base):
    """Mapping between roles and namespaces"""

    __tablename__ = "namespace_role_mapping"

    id = Column(Integer, primary_key=True)
    # Provides access to this namespace
    namespace_id = Column(Integer, ForeignKey("namespace.id"), nullable=False)
    namespace = relationship(Namespace, back_populates="role_mappings")

    # ... for other namespace
    other_namespace_id = Column(Integer, ForeignKey("namespace.id"), nullable=False)

    # ... with this role, like 'viewer'
    role = Column(Unicode(255), nullable=False)

    @validates("role")
    def validate_role(self, key, role):
        if role not in ["admin", "viewer", "developer"]:
            raise ValueError(f"invalid entity={role}")

        return role

    __table_args__ = (
        # Ensures no duplicates can be added with this combination of fields
        UniqueConstraint('namespace_id', 'other_namespace_id', 'role', name='_uc'),
    )
Comments

I've kept 'namespace' here to keep this table related to Namespace since there it has:

    role_mappings = relationship("NamespaceRoleMapping", back_populates="namespace")

entity is removed from the table.

Also, the current model is stricter and not as flexible as before, but it is less confusing and allows to implement namespace sharing.

In this model, when I have namespace foo and environments there, I give the same level of access to all those environments, depending on role assigned to other_namespace. Need different permissions for some of these environments? Create a new namespace.

We could do it by environment instead, but it would complicate things in the UI and DB. Because then people would ask to share multiple environments together, too. So I think it's simpler to just do it by namespace from the start, and that's it.

Important: this is not compatible with the previous DB model, so my plan is to drop namespace mappings when migrating, which will require users to re-add other namespaces (for sharing, I mean). But I don't think we even use this feature yet, do we?

Also, we have versioned API, but I'm just modifying v1. I don't think we've made any claims about stability of this one yet. I'm also not sure if anyone uses these APIs either.

trallard commented 1 year ago

Because this touches the UI and considering that @smeragoel is our UI/UX expert I'd like her to review and provide feedback on the UX workflows and UI for this.

This will require also identifying where in the UI these settings should be presented and how.

nkaretnikov commented 1 year ago

Status update for PR #607:

costrouc commented 1 year ago

@nkaretnikov has opened a draft PR and needs my review.

nkaretnikov commented 1 year ago

@smeragoel and I had a call about the design part of this. I also talked to @costrouc after that.

nkaretnikov commented 11 months ago

Status update:

smeragoel commented 11 months ago

I’ll get a first draft if the designs ready for review by the end of this week!

smeragoel commented 11 months ago

This is a first draft of the designs required for the workflow described above.

1. Namespace Page

image

  1. This is a completely new page that contains details about a particular namespace.

  2. It is accessed by clicking on the namespace name (Kim Pevey's Namespace) on the left panel. This introduces a new behaviour. Right now, clicking on Namespace Name expands the dropdown on the left panel to show all the environments. According to the proposed design, clicking on it will expand the dropdown and show the namespace page also.

  3. Right now, this screen only contains information about the contained environments and user access, but we can also add other metadata also.

    2. Edit Namespace

    image

  4. This is where you would adjust permissions for the namespace.

3. Creating a new namespace

image

  1. I am having some trouble with this. While the flow is pretty straightforward, I am unsure of the positioning of the Create New Namespace button. Here are some design considerations for the button:

    • It should be in the left panel since it is always on screen and the user can access it from any screen.
    • The button should say Namespace to avoid confusion with the New Environment button.
    • The button should ideally be above Shared Namespaces heading to avoid signalling that you can only created shared namespaces.
    1. With these in mind, I positioned the button as such. I am not 100% happy with the placement, but I think having some fresh eyes on it would help.

image

  1. This is how the create new namespace page would look like, after clicking on the button above. This has very basic fields right now, name and user access, but it can also contain other properties/metadata.
kcpevey commented 11 months ago

@smeragoel these look great! Thank you!

New Namespace button - I think the placement looks good. The label is kind of long for a button, but I agree with you that its necessary to include the word "namespace" so I dont think anything can be done about it.

What is the checkbox next to the trashcan for? - Ohhh now I see that line is what it looks like if you change the role. On the New Namespace page, the changing role highlighting and checkmark doesn't make sense though, right? Since everything will be new?

One thing we might have to think through is adding Users to the new Namespace.

These questions are mixed backend and frontend and I realize they may initiate more work so I want to say that I'm happen to consider these ideas as enhancements once we get the POC out.

pavithraes commented 11 months ago

From meeting today, it'll be intuitive to rename the "developer" role to "editor"

nkaretnikov commented 11 months ago

From meeting today, it'll be intuitive to rename the "developer" role to "editor"

After making this change locally, I realized it introduces too many changes. It requires updating the default role mappings, the docs, the DB validation functions, and the tests (to support both of these roles). The current role mappings PR is already too big. This will be done separately. New issue: #675.

nkaretnikov commented 11 months ago

From the meeting today:

nkaretnikov commented 11 months ago

Migrating data from v1 to v2 role mappings will be done in a separate PR because it can cause issues during migration, which might require manual intervention. So a standalone function will be provided for those who need it. #681

nkaretnikov commented 11 months ago

Status update: asked Chris and Chuck to review PR #607. Also, allocated time for a meeting to answer questions to speed up review process.