apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

[SIP-131] Superset Security Model Redesign #28377

Open mistercrunch opened 2 months ago

mistercrunch commented 2 months ago

Motivation

This SIP proposes a better security model for Superset, aimed at simplifying and strengthening the management of permissions across the platform. Our goal is to transition from the current model, which is heavily tied to the Flask App Builder (FAB) framework and its assumptions and limitations, to a more intuitive and scalable system. This new model will reduce complexity for administrators, align permissions with user expectations and the actual architecture of Superset, and enhance performance by streamlining permission checks. It will introduce a clearer, structured approach to defining resources and actions, incorporate a flexible, attribute-based access control (ABAC) system, and lay the groundwork for decoupling from FAB, thus paving the way for a more robust and extensible security framework.

Looking back…

First, let’s analyze the current security model and how it evolved.

Using Flask AppBuilder (FAB) as a foundational framework, we inherited its dynamic security model, which automatically generates one permission for every view method (essentially any method that returns a web response), including those inherited from its base classes. In environments where FAB manages only a few models, this approach is manageable. However, in Superset, which supports a large number of models and custom methods, this model expands to hundreds of permissions, many of which do not meaningfully correspond to the mental model of users or administrators.

In 2020, we undertook efforts to rationalize the number of permissions. @dpgaspar enhanced the BaseView in FAB to "map" a class name to a permission using class_permission_name and method_permission_name, allowing FAB users to associate any given view with a specific permission.

We applied this mapping extensively across Superset, reducing the number of atomic permissions from about 500 to around 100. This reduction was achieved through a series of PRs, such as https://github.com/apache/superset/pull/12012, and through the mapping logic coded in https://github.com/apache/superset/blob/master/superset/constants.py#L122-L170. Note that these PRs were cautious in reassigning permissions to roles through database migrations.
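
The effect of this kind of mapping can be illustrated with a small, self-contained sketch. The view names, method names, and both dictionaries below are hypothetical; they only mimic the idea of mapping class and method names to shared permission names and are not copied from `constants.py`.

```python
# Illustrative only: view names, method names, and both maps are hypothetical.
VIEWS = {
    "ChartRestApi": ["get", "get_list", "post", "put", "delete", "bulk_delete", "export"],
    "DashboardRestApi": ["get", "get_list", "post", "put", "delete", "bulk_delete", "export"],
}

# Collapse many view methods onto a few permission names.
METHOD_PERMISSION_MAP = {
    "get": "read",
    "get_list": "read",
    "export": "read",
    "post": "write",
    "put": "write",
    "delete": "write",
    "bulk_delete": "write",
}

# Collapse view class names onto a shared resource name.
CLASS_PERMISSION_MAP = {"ChartRestApi": "Chart", "DashboardRestApi": "Dashboard"}


def unmapped_permissions(views):
    """One permission per (view class, method), as in the unmapped model."""
    return {f"can_{method} on {view}" for view, methods in views.items() for method in methods}


def mapped_permissions(views):
    """Permissions that remain after mapping class and method names."""
    return {
        f"can_{METHOD_PERMISSION_MAP[method]} on {CLASS_PERMISSION_MAP[view]}"
        for view, methods in views.items()
        for method in methods
    }
```

In this toy setup, two views with seven methods each yield fourteen unmapped permissions but only four mapped ones, which is the same kind of reduction described above at a much smaller scale.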

Issues with the Current Model

Goals

This SIP aims to overhaul Superset’s security model to achieve the following key objectives:

  1. Establish a Sensible and Scalable Permission System: Create a security framework where every permission adheres to a clear and logical pattern, easily understood and managed by users and administrators. This system should simplify permissions without sacrificing the granularity needed for precise access control.
  2. Prevent Permission Sprawl: Implement a robust governance strategy that mandates attaching future methods to a well-defined permission-naming scheme. This will involve strict guidelines for developing new features or modifying existing ones, ensuring that they conform to the established permission architecture.
  3. Facilitate Decoupling from Flask AppBuilder (FAB): Develop strategies to reduce dependency on FAB, paving the way for a centralized policy enforcement mechanism that better aligns with Superset’s specific requirements. This decoupling will allow for greater flexibility and the adoption of more advanced security practices that are currently constrained by FAB’s architecture.
  4. Improve System Performance and User Experience: Enhance the performance of the security model by optimizing how permissions are checked and reducing database query loads. Simultaneously, revamp the UI for role management to make it more intuitive and user-friendly, avoiding the pitfalls of the current system which mixes numerous types of permissions in a confusing and unwieldy manner.
  5. Enable Advanced Security Features: Introduce advanced security capabilities such as attribute-based access control (ABAC) with a flexible, domain-specific language (DSL) that supports dynamic permission queries. This will allow for more sophisticated, context-sensitive security policies that can adapt to complex organizational needs.
  6. Provide a path forward: For everyone migrating to the new model, we want to find a clear path that either guarantees backwards compatibility and/or forces them to make decisions where required.

Resources, Actions and Subjects


Let’s move away from the current approach, which dynamically creates an individual permission for each [view class_name] and [view.method_name], and towards a stricter set of clearly defined resources and actions.

Resources

We will define a finite set of resources that align with our information architecture and application structure. These resources represent logical groupings within Apache Superset, reflecting both the UI and underlying data structures. Here are the primary resources identified:

Core entities

Data access entities

Resource ABAC Selectors - A Simple Yet Evolutive DSL

To effectively manage permissions across these resources, we introduce a simple yet powerful Domain-Specific Language (DSL) for defining attribute-based access controls (ABAC). This DSL allows administrators to specify and enforce security policies directly related to the attributes of resources. Here are the key features of our ABAC DSL:

Features:

Set Operations:

This DSL is designed to be easy to use, read, and integrate into our existing systems while being robust enough to handle complex permission scenarios. The use of familiar logical operators and condition structures ensures that policies are both transparent and maintainable.

About serving lists efficiently…

Another important property of this DSL is that it can be translated into SQLAlchemy .where() clauses, which in turn are translated to SQL. This matters because we often need to serve a list of resources to a user, and we need to execute these filtering clauses at the database level (as in “show me a list of 50 charts that this user has access to” and paginate through the results). The expressions above therefore need to be expressible in SQLAlchemy and translatable to SQL.
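
As a rough illustration of database-level filtering, here is a minimal sketch. To stay self-contained it uses the standard-library `sqlite3` module and hand-built parameterized SQL instead of SQLAlchemy; the tuple-based selector representation, the `charts` table, and the function names are all assumptions made for this example, not part of the proposal.

```python
import sqlite3

# Hypothetical, minimal selector representation: (attribute, operator, value).
# Only a small allow-list of attributes is exposed to selectors.
ALLOWED_ATTRIBUTES = {"id", "published"}


def selector_to_where(selector):
    """Translate one selector tuple into a parameterized SQL fragment."""
    attribute, operator, value = selector
    if attribute not in ALLOWED_ATTRIBUTES:
        raise ValueError(f"attribute not exposed to selectors: {attribute}")
    if operator == "in":
        placeholders = ", ".join("?" for _ in value)
        return f"{attribute} IN ({placeholders})", list(value)
    if operator == "equals":
        return f"{attribute} = ?", [value]
    raise ValueError(f"unsupported operator: {operator}")


def list_accessible(conn, selectors, limit, offset):
    """Return one page of chart ids matching ALL selectors, filtered in the database."""
    clauses, params = [], []
    for selector in selectors:
        clause, clause_params = selector_to_where(selector)
        clauses.append(clause)
        params.extend(clause_params)
    where = " AND ".join(clauses) or "1 = 1"
    sql = f"SELECT id FROM charts WHERE {where} ORDER BY id LIMIT ? OFFSET ?"
    return [row[0] for row in conn.execute(sql, params + [limit, offset])]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charts (id INTEGER PRIMARY KEY, published INTEGER)")
conn.executemany(
    "INSERT INTO charts (id, published) VALUES (?, ?)",
    [(i, i % 2) for i in range(1, 11)],
)
selectors = [("id", "in", [1, 2, 3, 4, 5, 6]), ("published", "equals", 1)]
first_page = list_accessible(conn, selectors, limit=2, offset=0)
second_page = list_accessible(conn, selectors, limit=2, offset=2)
```

Because the filter and the pagination both run inside the query, the application never has to load every chart and filter in Python.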

Limiting related attributes

While it’s easy to think about highly specific rules and selectors (as in Dashboard.obscure_property.groups.json.Xor(...)), we’ll want to clearly state and limit which attributes get exposed, keeping to the very few that are needed to craft rules. For instance, while id.in() is powerful and important, allowing people to create ABAC rules based on obscure attributes that could be mutated over time seems hard to support. Fewer selectors are better. Version 1 may just have id.in(), for instance, and slowly evolve to support more attributes and operators.

A few intricate-yet-relevant attributes include the concept of “ownership” (is the user one of the owners of the object) and a private vs. published status. The former may require specific logic, since it involves a many-to-many relationship evaluated against the specific, active user. Maybe a simple magic function like Dashboard.@is_owner would work, as opposed to something like Dashboard.owners.id.includes(@current_user.id).

Data Access Resource

Data Access-related resources in Superset represent the hierarchical and external nature of database elements accessible within the platform. These resources include Databases, Catalogs, Schemas, Relations (tables and views), Columns and Rows. Unlike other entities in Superset, the management of these resources is not governed by typical CRUD operations due to their external management and inherent structural dependencies.

Key Characteristics:

To solve for this, I’d like to bring in a special resource called DataDomain that would encompass all of this under a single resource that can be used to specify a set of objects, or domain, as opposed to having a 4-5 level hierarchy of objects.

DataDomain becomes a special selector to target a set of tables, a schema, or a whole database. Many DataDomains can be combined in a permission, for example `DataDomain(db=1, schema_match='core.*')`.
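
One possible shape for such a selector is sketched below. Only `db` and `schema_match` appear in the example above; the `table_match` field, the `covers` method, the regular-expression semantics, and the "any domain matches" combination rule are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataDomain:
    """Hypothetical selector for a set of tables, a schema, or a whole database."""

    db: int
    schema_match: Optional[str] = None  # regex; None means "any schema"
    table_match: Optional[str] = None  # regex; None means "any table" (assumed field)

    def covers(self, db: int, schema: str, table: str) -> bool:
        """Return True when the given table falls inside this domain."""
        if db != self.db:
            return False
        if self.schema_match is not None and not re.fullmatch(self.schema_match, schema):
            return False
        if self.table_match is not None and not re.fullmatch(self.table_match, table):
            return False
        return True


def any_covers(domains, db, schema, table):
    """Several DataDomains combined in one permission: any match grants coverage."""
    return any(domain.covers(db, schema, table) for domain in domains)


domains = [DataDomain(db=1, schema_match="core.*"), DataDomain(db=2)]
```

With these two domains, a table in schema `core_sales` of database 1 is covered, any table in database 2 is covered, and a table in schema `finance` of database 1 is not.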

Actions

To ensure a clear and scalable security model, we introduce a hierarchical naming convention for actions using colon-separated words, which supports pattern matching and fine-grained access control. This structure allows us to define actions in a way that makes them intuitive and consistent across different parts of the system.

Hierarchical Structure

The action naming convention is designed to be hierarchical, facilitating both broad and precise permission settings. Actions are segmented into levels, allowing policy-makers to specify permissions at various granularities using wildcards. For example, a write:.* pattern would grant all write-related permissions on a resource, while a more specific pattern like write:delete:.* would apply only to deletion operations (write:delete:one, write:delete:bulk, …).
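
A minimal sketch of this wildcard matching, assuming patterns are regular expressions that must match the whole action string. The function names are hypothetical, and `read:list` and `write:update` are made-up actions added only to show non-matching cases.

```python
import re


def action_matches(pattern: str, action: str) -> bool:
    """Treat the pattern as a regular expression that must match the whole action."""
    return re.fullmatch(pattern, action) is not None


def matching_actions(pattern: str, actions: list) -> list:
    """Return the actions covered by a pattern, preserving input order."""
    return [action for action in actions if action_matches(pattern, action)]


ACTIONS = [
    "read:list",  # hypothetical action name
    "write:delete:one",
    "write:delete:bulk",
    "write:update",  # hypothetical action name
]

all_writes = matching_actions("write:.*", ACTIONS)
deletes_only = matching_actions("write:delete:.*", ACTIONS)
```

Under full-match semantics, `write:.*` does not match a bare `write`, so a policy that wants both would need to say so explicitly.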

Example Actions:

Structuring the Action Hierarchy

The hierarchy is constructed from a curated dictionary of terms that are clear and relevant to our operations. Each term is carefully chosen to ensure it aligns with common actions within Apache Superset, yet is flexible enough to accommodate unique operational requirements.

The top levels of the hierarchy are strict. Deeper levels open up to allow finer-grained actions and a larger vocabulary of terms.

Strict Top-Level Actions:

Strict Second level, for read

Strict second level for mutations under write:

Other Commonly used verbs in actions:

Flexible Terminology for Specific Actions: For actions specific to certain functions or data formats, we use more flexible terminology as the last element of our action strings (e.g., csv, excel). This allows for the easy introduction of more atomic actions that target very specific features where required.

Pattern Matching:

While related-attribute matching makes sense for Resources (as specified above), a simpler string-matching approach should work for actions.

This approach to defining actions enhances the flexibility and clarity of our security model. By using a structured hierarchy and clear terminology, we ensure that permissions are both manageable and transparent, allowing administrators to effectively control access across various parts of Apache Superset.

Notes:

Subject

In the Superset security model redesign, a "Subject" represents an entity directly associated with a user or actor interacting with the system. Subjects play a crucial role in determining access permissions and enforcing security policies. Here's a breakdown of key aspects related to Subjects:

  1. Special Resource Entity: In the context of the security model, a Subject is treated as a special type of "Resource." While traditional resources represent data entities or system components, Subjects specifically pertain to users or actors within the system.
  2. Subject Selectors: Similar to other resources, Subjects are associated with "subject selectors." These selectors define attributes or properties of the user entity and are used to determine access rights and permissions. For example, a subject selector might specify user roles, groups, or individual user identifiers.
  3. Association with Permissions: Subjects are directly associated with permissions in Policy. When defining access controls, administrators specify which actions users (Subjects) are permitted to perform on specific resources. This association allows for granular control over user access and ensures that permissions are accurately enforced.

Permission

In this framework, a Permission is simply a combination of:

# JSON SCHEMA
Permission:
  type: object
  properties:
    name:
      type: string
    description:
      type: string
    resources:
      type: array
      items:
        type: string
    actions:
      type: array
      items:
        type: string
# DRAFT of what a superset_policies.yml could look like

permissions:
  admin_permissions:
    description: Can do EVERYTHING on all RESOURCES
    permissions:
      - resources:
        - .*
        actions:
        - write:.*
        - read:.*
  viewer_permissions:
    description: Can browse and view most non-system, published objects
    permissions:
      - resources:
        - Chart
        - Dashboard
        - Query
        - SavedQuery
        - ...
        actions:
        - read:.*
      # Can't see/view unpublished or private dashboards
      - resources:
        - Dashboard.published.equals(False)
        actions:
        - "!read:.*"  # quoted so YAML does not parse "!" as a tag

  creator_owner_permissions:
    description: |
      Can create new objects, and has full power on things they created/own.
      NOTE: making a Viewer person an owner wouldn't bypass the policy as
      it does today
    permissions:
      - resources:
        - Chart
        - Dashboard
        - Query
        - SavedQuery
        - ...
        actions:
        - write:post # so that they can CREATE new objects
      # Full power on resources that they own
      - resources:
        - .*[owns]
        actions:
        - write:.*
  full_data_reader_permissions:
    permissions:
      - resources:
        - DataDomain:.*
        actions:
        - read
  FinanceDataReader:
    permissions:
      - resources:
        - DataDomain:schema=finance
        actions:
        - read
  FullMinusFinanceDataReader:
    permissions:
      - resources:
        - DataDomain:schema=finance
        actions:
        - "!read"

Note that a simple PermissionSet object could be useful for the frontend to use as a quick reference to a set of permissions that cover a certain amount of surface of the UI, so the UI can simply ask for dashboard_related_permission_set and get the atomic permissions as a set in the response.
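
The negated patterns in the draft above (`!read:.*`) imply some evaluation order between allow and deny rules. The sketch below assumes one possible rule, "any matching deny overrides every matching allow", which the proposal does not specify. The rule representation, the `is_allowed` function, the `read:get` action, and the `Dashboard:published` / `Dashboard:unpublished` resource strings (standing in for the attribute selector `Dashboard.published.equals(False)`) are all hypothetical.

```python
import re


def is_allowed(rules, resource: str, action: str) -> bool:
    """Evaluate (resource_pattern, action_pattern) pairs against one request.

    A leading "!" on the action pattern marks a deny rule. Assumption: any
    matching deny rule overrides all matching allow rules.
    """
    allowed = False
    for resource_pattern, action_pattern in rules:
        deny = action_pattern.startswith("!")
        pattern = action_pattern[1:] if deny else action_pattern
        if re.fullmatch(resource_pattern, resource) and re.fullmatch(pattern, action):
            if deny:
                return False
            allowed = True
    return allowed


# Simplified from the viewer_permissions draft.
viewer_rules = [
    ("Chart", "read:.*"),
    ("Dashboard.*", "read:.*"),
    ("Dashboard:unpublished", "!read:.*"),
]
```

Requests that match no rule at all are denied, so the model in this sketch is default-deny.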

Policy (combining resource, actions and subjects)

Building upon permissions, a Policy is:


# JSON schema
Policy:
  type: object
  properties:
    name:
      type: string
    description:
      type: string
    permissions:
      type: array
      items:
        $ref: "#/Permission"
    subject_selectors:
      type: array
      items:
        type: string
# DRAFT of what a superset_roles.yml could look like
Roles:
  Admin:
    - policies:
      - Admin
    - groups:
      - ldap.sysops
    - users:
      - joe@preset.io
  Alpha:
    - policies:
      - CreatorOwner
      - Viewer
      - FullDataReader
    - groups:
      # maybe upon importing / syncing ldap group with custom logic,
      # we prefix those to be referenced in places like this roles.yml file
      - ldap.data_scientists
      - ldap.data_engineers
    - users:
      - joe@preset.io
  FinanceTeam:
    - policies:
      - CreatorOwner
      - Viewer
      - FinanceDataReader
    - groups:
      - ldap.finance_team
  BronzeUsers:
    - policies:
      - CreatorOwner
      - Viewer
      - FullMinusFinanceDataReader
    - groups:
      - ldap.finance_team

User Groups and Other User Attributes

In our ongoing efforts, we're introducing a new entity: Groups. These groups serve as straightforward collections of users, with membership becoming a pivotal element of subject selectors. As the framework evolves, we can extend support to more intricate subject selectors. For instance, we could implement selectors based on the domain of a user's email address.
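
As an illustration of group membership acting as a subject selector, the sketch below resolves a user's roles and policies from a structure that mirrors part of the superset_roles.yml draft. The `Role` dataclass, the function names, and the `ann@example.com` user are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    policies: list
    groups: set = field(default_factory=set)
    users: set = field(default_factory=set)


# Mirrors part of the superset_roles.yml draft.
ROLES = {
    "Admin": Role(["Admin"], groups={"ldap.sysops"}, users={"joe@preset.io"}),
    "FinanceTeam": Role(
        ["CreatorOwner", "Viewer", "FinanceDataReader"], groups={"ldap.finance_team"}
    ),
}


def roles_for(username, user_groups):
    """A user holds a role if named directly or if any of their groups is listed."""
    return sorted(
        name
        for name, role in ROLES.items()
        if username in role.users or role.groups & set(user_groups)
    )


def policies_for(username, user_groups):
    """Union of the policies attached to every role the user holds."""
    policies = set()
    for name in roles_for(username, user_groups):
        policies.update(ROLES[name].policies)
    return sorted(policies)


finance_policies = policies_for("ann@example.com", ["ldap.finance_team"])
```

A selector based on the email domain, as suggested above, could be added as another predicate next to the `users` and `groups` checks.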

Static and Dynamic Objects

For some use cases, many of the objects defined above, namely Permissions, Policies, Roles, and UserGroups, are best defined fairly statically, as in they shouldn’t really change from one deployment to the next. We may want these to be largely defined as code and not alterable at runtime.

For other use cases, things need to be more dynamic, to the point where it’s reasonable to want Administrators to point and click in the UI to grant people access to certain things. Those can be altered by users in real time and should take effect immediately.

That leaves us with a mix of static and dynamic objects that can be combined in a given environment.

Some environments may choose to have a few rules defined as code, while others may have a much more dynamic set of rules defined in the UI.

Storage

OK, so some policies (static) are configs, and some are dynamic (stored in the database). In all cases, they are stored as collections of action and resource “pattern” strings. For the policy manager to have a full picture of current policies, it will have to assemble information from these configs and database tables.

Sizing / caching / Performance

In a complex environment, what would be the size of the whole “role/policy book”? Probably megabytes at most, which I believe could and should be cached in the memory of all backend services. This means that if someone updates a database-stored rule, we 1. update the db, and 2. expire the in-memory cache and force a refresh in all processes.

Alternatively or complementarily, the whole policy book could be stored in Redis.

Now if in the user session we know their roles, looking up the roles against the in-memory policy book should be super cheap.

In any case, policy and role membership should be fairly slowly changing and fit nicely in an in-memory cache.

PolicyManager should be extremely fast at processing assertions. That should be a massive improvement over before, when we did a lot of lookups of specific perms in the database against a fairly large list.
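
One way to get the "expire and refresh in all processes" behavior is a version stamp that every process compares against before serving from memory. The sketch below assumes such a cheap version counter exists in the database or Redis; `PolicyBookCache`, `load_version`, `load_book`, and the toy `store` dictionary are hypothetical names used only for illustration.

```python
class PolicyBookCache:
    """In-process cache of the policy book, refreshed when a version stamp changes.

    Assumption: the backing store exposes a cheap version counter that is
    bumped on every policy write. `load_version` and `load_book` are
    hypothetical callables supplied by the caller.
    """

    def __init__(self, load_version, load_book):
        self._load_version = load_version
        self._load_book = load_book
        self._version = None
        self._book = None

    def get(self):
        current = self._load_version()
        if current != self._version:
            # Stale or empty: reload the full book once, then serve from memory.
            self._book = self._load_book()
            self._version = current
        return self._book


# Toy backing store standing in for the database.
store = {"version": 1, "book": {"Viewer": ["read:.*"]}}
loads = []  # records the version seen on every full reload


def load_version():
    return store["version"]


def load_book():
    loads.append(store["version"])
    return dict(store["book"])


cache = PolicyBookCache(load_version, load_book)
cache.get()
cache.get()  # served from memory, no reload
store["book"]["Admin"] = ["write:.*", "read:.*"]
store["version"] += 1  # a policy write bumps the version
refreshed = cache.get()
```

The trade-off in this design is one small version lookup per check (or per request) in exchange for never serving a stale policy book after a write.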

The Playbook

At a high level, and given the complexity of the project, we recommend introducing the new security model alongside the existing one and running both in parallel for some time before deprecating the old model. Why?

1. Inject Resource and Action semantics

For every single view/method in Superset:

2. Introduce a PolicyManager

Sketching what SecurityManager could look like

from __future__ import annotations  # Permission, Constraint and sa are not defined in this sketch

class SecurityManager:
    def view_decorator(self, permissions: list[Permission]):
        """Specify and enforce permission requirements for a view."""

    def check_permissions(self, permissions: list[Permission], user) -> bool:
        """Check that a user has a set of permissions."""

    def apply_resource_filter(self, query: sa.Query, user) -> sa.Query:
        """Given a resource query, append `.filter()` to limit it to the user's access rights."""

    def provide_constraints(self, user, resource_type) -> list[Constraint]:
        """Given a resource type, return the list of constraints for a given user."""

We probably need some sort of PolicyManagerViewMixin, or maybe the policy manager injects itself into SupersetBaseView. Either way, we need constructs that ensure permission semantics at a deep level: perhaps force-mapping methods to permission semantics, preventing unmapped methods, and auto-assigning things based on convention. The decorator above is ideal because it is explicit, but there are a lot of views we inherit from FAB, which therefore need to be mapped to an action/resource after the fact.

Conflict resolution: the policy manager knows how to bubble up conflicting rules and/or how to resolve/log conflicts.

Auditing/logging: it should be possible to log every assertion (user X asked for access to resources Y to perform action A). We probably need some sort of PolicyLogger hook.
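
A minimal sketch of what such a logging hook could look like, assuming the permission check is a plain callable taking a user, a resource, and an action. `Assertion`, `AuditedChecker`, and the toy read-only check are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Assertion:
    user: str
    resource: str
    action: str
    granted: bool


class AuditedChecker:
    """Wraps a permission check and reports every assertion to registered hooks."""

    def __init__(self, check: Callable[[str, str, str], bool]):
        self._check = check
        self._hooks: List[Callable[[Assertion], None]] = []

    def add_hook(self, hook: Callable[[Assertion], None]) -> None:
        self._hooks.append(hook)

    def check(self, user: str, resource: str, action: str) -> bool:
        granted = self._check(user, resource, action)
        record = Assertion(user, resource, action, granted)
        for hook in self._hooks:
            hook(record)
        return granted


audit_log: List[Assertion] = []
# Toy check: only read actions are granted.
checker = AuditedChecker(lambda user, resource, action: action.startswith("read:"))
checker.add_hook(audit_log.append)
checker.check("alice", "Dashboard", "read:get")
checker.check("alice", "Dashboard", "write:delete:one")
```

Both granted and denied assertions reach the hook, so the same mechanism could feed an audit trail or a conflict report.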

3. Introduce a feature flag

Wherever permissions are checked, introduce logic for a feature flag to decide whether the old way or the new way should be executed. Maybe there’s a DEBUG mode where we execute both, look for differences, and alert where disagreements are found.
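
The flag and the DEBUG comparison could be sketched roughly as follows. The `mode` values, the function name, and the choice to keep the legacy answer authoritative while comparing are assumptions made for this example, not decisions stated in the proposal.

```python
import logging

logger = logging.getLogger("security.shadow")


def check_access(user, resource, action, *, legacy_check, new_check, mode="legacy"):
    """Route a permission check through the old model, the new model, or both.

    `mode` is a hypothetical flag value: "legacy", "new", or "debug". In
    "debug" mode both models run, disagreements are logged, and the legacy
    answer stays authoritative.
    """
    if mode == "new":
        return new_check(user, resource, action)
    legacy_result = legacy_check(user, resource, action)
    if mode == "debug":
        new_result = new_check(user, resource, action)
        if new_result != legacy_result:
            logger.warning(
                "permission mismatch user=%s resource=%s action=%s legacy=%s new=%s",
                user, resource, action, legacy_result, new_result,
            )
    return legacy_result
```

Running in "debug" mode for a while would surface every case where the two models disagree before the new one is allowed to make real decisions.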

4. Build a UI

Here’s a Google Sheet containing an extraction of the current permissions in Superset: https://docs.google.com/spreadsheets/d/13CQQX5MhhSH99ZsnHlSmSyZeMUcQ6TWhoLAuMI4hAGs/edit#gid=680477257

mistercrunch commented 2 months ago

@dosu can you help summarize this proposal into a few paragraphs?

dosubot[bot] commented 2 months ago

The proposal for the Superset Security Model redesign focuses on creating a more intuitive, scalable, and manageable security framework. Key components include:

  1. Introduction of a Structured Permission System: This aims to simplify the management of permissions, making them more aligned with user expectations and the actual architecture of Superset.

  2. Implementation of a DSL for ABAC: A domain-specific language for attribute-based access control is proposed to allow for more granular and flexible access control policies, enhancing security measures like row-level security (RLS) and guest access.

  3. Reducing Dependency on Flask App Builder (FAB): The redesign seeks to manage permissions, roles, and access checks within Superset more directly, moving away from reliance on FAB's models and methods.

  4. Optimization of Permission Checks: By implementing efficient methods for permission checks, the proposal aims to improve performance and manageability of access control to various resources within Superset.

  5. Adoption of a Hierarchical Naming Convention for Actions: This will help streamline the management of permissions and roles, making the system more intuitive and easier to navigate.

  6. Establishment of a Governance Strategy: A strategy for creating, managing, and syncing roles and permissions is proposed to ensure that access control is both robust and flexible.

  7. Development of a PolicyManager and UI: To facilitate the efficient management of policies and roles, including tools for creating, editing, and assigning roles and permissions, as well as visualizing access control policies.

  8. Enhanced Security for Embedded and Guest Users: The proposal includes measures to ensure that embedded dashboards and guest users have restricted access based on predefined policies, improving overall security.

This summary encapsulates the main goals and strategies of the proposed Superset Security Model redesign, focusing on improving the system's intuitiveness, scalability, performance, and manageability.

#### Details
**Sources**
- [[SIP-19] Simplify Superset's set of permissions](https://github.com/apache/superset/discussions/18383)
- [[WiP][SIP-125] Proposal for Enhanced Data Access Permissions](https://github.com/apache/superset/issues/28002)
- [[SIP-126] Fine-grained access control to Superset entities](https://github.com/apache/superset/issues/28021)
- [[SIP-89] Integrate role, permission, user APIs on Flask-AppBuilder into Apache Superset](https://github.com/apache/superset/issues/21050)
- [[SIP-73] Proposal for Improving the security stucture and move to resource based model](https://github.com/apache/superset/issues/14806)
- [superset/security/manager.py](https://github.com/apache/superset/blob/master/superset/security/manager.py)
- [docs/docs/security/security.mdx](https://github.com/apache/superset/blob/master/docs/docs/security/security.mdx)
- [superset/explore/utils.py](https://github.com/apache/superset/blob/master/superset/explore/utils.py)
- [superset-frontend/src/features/rls/RowLevelSecurityModal.tsx](https://github.com/apache/superset/blob/master/superset-frontend/src/features/rls/RowLevelSecurityModal.tsx)
- [superset/migrations/versions/2024-02-07_17-13_87d38ad83218_migrate_can_view_and_drill_permission.py](https://github.com/apache/superset/blob/master/superset/migrations/versions/2024-02-07_17-13_87d38ad83218_migrate_can_view_and_drill_permission.py)
- [superset/migrations/shared/security_converge.py](https://github.com/apache/superset/blob/master/superset/migrations/shared/security_converge.py)

**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

mistercrunch commented 2 months ago

@dosu can you do some research and suggest similar DSLs or frameworks we may want to either use or take inspiration from for our DSL and ABAC selectors?

dosubot[bot] commented 2 months ago

For the DSL and ABAC selectors in the context of Apache Superset's security model redesign, it's beneficial to look into Superset's existing Row Level Security (RLS) feature. This feature already incorporates elements of a domain-specific language (DSL) for attribute-based access control (ABAC), demonstrating the application of complex permission scenarios through conditions like equality, sets, logical operators, and efficient translation into SQLAlchemy .where() clauses. The RLS tests and implementation provide a practical example of how RLS filters can be defined with conditions and applied to SQL queries, showcasing the support for the required features in your ABAC DSL proposal.

The current RLS feature in Superset, as demonstrated in the integration tests, supports defining filters with various conditions, applying these filters based on user roles and dataset-specific rules, and translating these rules efficiently into SQL queries. This aligns with your goals for a DSL that supports equality, sets, negative flips, logical operators, and can be translated into efficient database queries.

Improving upon the current ABAC implementation could involve addressing any existing limitations, such as the complexity of maintaining the system, performance concerns with dynamic permission checks, limited attribute support, and integration challenges with external systems. Enhancing these aspects could lead to a more robust, efficient, and user-friendly ABAC system within Superset.

In summary, leveraging insights from Superset's RLS feature and addressing its limitations could provide a solid foundation for developing a new DSL and ABAC selectors that meet your objectives for a more intuitive, scalable, and manageable security framework.


mistercrunch commented 1 month ago

Unsorted NOTEs from 5/14/24:

rusackas commented 3 weeks ago

@mistercrunch are we ready for a [DISCUSS] thread?

mistercrunch commented 3 weeks ago

yes!