SovereignCloudStack / issues

This repository is used for issues that are cross-repository or not bound to a specific repository.
https://github.com/orgs/SovereignCloudStack/projects/6

Define list of useful SCS-standardized roles in OpenStack #396

Open garloff opened 10 months ago

garloff commented 10 months ago

As an IaaS user of SCS-compatible clouds, I want a number of standard roles to be available on every SCS-compatible IaaS environment which serve my typical needs and which are the same (especially from a security & privacy analysis point of view) on all these clouds.

These could be an "admin" role (not available to users, just operators), "domain-manager", "project-member", "read-only", "auditor". These would be global (all services) inside a project (or domain for domain-manager). Maybe the read-only vs. auditor distinction is not useful ... Ensure that this is a hierarchy that can easily be understood and analysed from a security point of view. Work has been done upstream (RBAC work in Yoga) on this.

Tasks:

Definition of Ready:

Definition of Done:

reqa commented 8 months ago

As Markus said, there's also the upstream manager role (ETA 2024), but that is on project scope and would co-exist in the future in addition to the domain-manager.

As an upshot for me, I think we need to have the dimension of scope for this role matrix.

markus-hentsch commented 8 months ago

Upstream work

I gathered the following upstream resources in regards to role definitions.

Keystone

Roles: admin, reader, member, service

https://docs.openstack.org/keystone/latest/admin/service-api-protection.html

Barbican

Roles: admin, creator, observer, audit

https://docs.openstack.org/barbican/train/admin/access_control.html

Other

"Consistent and Secure RBAC" track

https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html

Policy concepts in Nova

State in Nova: https://docs.openstack.org/nova/latest/configuration/policy-concepts.html


Side note: policy config 101

https://docs.openstack.org/cinder/ussuri/configuration/block-storage/policy-config-HOWTO.html

markus-hentsch commented 8 months ago

Barbican

Roles: admin, creator, observer, audit

https://docs.openstack.org/barbican/train/admin/access_control.html

Important update from the Keystone PTG session from 2023-10-25: These Barbican-specific roles are superseded by the "Consistent and Secure RBAC" track and its roles according to the Keystone team.

This makes work on this standard substantially easier, because upstream will also deploy a consistent set of roles across all services which we can build upon.

markus-hentsch commented 7 months ago

I'm unfortunately unable to finish the Role Standard in its current state. The reason being that the behavior between the OpenStack services regarding the new RBAC defaults^1 can differ greatly. I'm unable to formulate a consistent policy configuration that works across all of the components for the intended goal.

Since there is no defined list of supported and allowed OpenStack services in SCS yet, the standard cannot rely on delivering and maintaining full policy configuration templates - there are just too many services. Also, writing SCS-specific and gapless policy templates for each and every possible OpenStack service would put an unfeasible development, testing and maintenance burden on SCS.

Instead I opted to describe general guidelines in the standard draft on how to work with the OpenStack defaults in order to provide a consistent role model across SCS infrastructures, i.e. not modifying the default permission sets of OpenStack and properly preserving them when making adjustments as the CSP. The standard originally intended to incorporate the stable parts of the RBAC rework^1, leaving the rest for future inclusion. Isolating the "stable parts" seems to be difficult due to inconsistent behavior across OpenStack services.

The standard intended to provide a way to generate default policy files using OpenStack tools, define them as the gold standard and then describe how to make individual adjustments if necessary. However, due to the ongoing RBAC rework^1 two oslo.policy configuration options greatly influence how and which policies are used: enforce_new_defaults and enforce_scope. Furthermore, depending on these oslo.policy options, default policy templates generated with OpenStack tools (oslopolicy-policy-generator) sometimes do not reflect the actual default policies that are in effect when no policy.yaml file is present. This makes it incredibly hard to establish a process and guidelines for consistent defaults a CSP can safely refer to in any case.
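As a rough illustration of where these two options live: both belong to the [oslo_policy] section of a service's configuration file. The sketch below uses only the Python standard library to report what a given file sets explicitly. The sample configuration text and the helper name read_policy_flags are made up for illustration, and an option that is absent falls back to the service's in-code default, which this sketch cannot see.

```python
import configparser

# Hypothetical excerpt of a service configuration file (illustration only).
SAMPLE_CONF = """
[DEFAULT]
debug = true

[oslo_policy]
enforce_new_defaults = true
"""


def read_policy_flags(conf_text):
    """Return the explicitly configured values of the two oslo.policy options.

    None means the option is not set in the file, so the service's own
    in-code default applies (unknown to this sketch).
    """
    # Disable interpolation so '%' characters in real config values are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    flags = {}
    for option in ("enforce_new_defaults", "enforce_scope"):
        if parser.has_option("oslo_policy", option):
            flags[option] = parser.getboolean("oslo_policy", option)
        else:
            flags[option] = None
    return flags


print(read_policy_flags(SAMPLE_CONF))
```

Running this against the configuration files of several services gives a quick overview of which ones override the options explicitly, but it says nothing about defaults that are set in the service's source code.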

Below are just a few small examples of the various effects these settings can have and how they conflict across services. Finding a configuration and strict guidelines for the Role Standard that ensure consistent behavior and compatibility across the whole SCS would involve a lot more research and testing.

General:

Barbican:

Keystone:

josephineSei commented 5 months ago

I am currently reading into the secure RBAC work on upstream. So far I am concerned about the scopes especially:

Even though system scope is currently postponed (or even cancelled?), the project scope may change the behavior of the cloud. And the system scope will either break a lot of operator workflows or need an incredibly large amount of time to be implemented.

The standard should definitely describe this. But we may also need an upgrade path for CSPs and users, if we change the standard in this point.

josephineSei commented 5 months ago

Standardizing the roles in OpenStack is an important topic. Reading through the current upstream progress, the problems @markus-hentsch mentioned still exist and will most likely remain for 2024.1. The progress tracking shows that Cinder does not yet fully obey the new personas (project-member and project-reader)^1.

The lack of a widespread consensus across the OpenStack projects about those personas speaks against using the new policies right now.

But as Markus also mentioned, there are side effects when enabling the new policies (e.g. it breaks the domain manager role as described in the same-named standard). As the old policies do not seem to be deprecated right now^2, we should be safe to use them for another year.

There are two more things to consider:

  1. The newly introduced service role is used by the services themselves to communicate with other services. IMHO we should enable it as soon as a group of OpenStack services that SCS deems complete have implemented it and are able to deal with this new role.
  2. While setting a standard for roles is important, another necessary thing is the assignment of roles. E.g. we should not permit the admin role to be assigned to anyone outside of a group of operators from the CSP. Likewise, the service role should not be assigned to human beings.

I would like to add those two parts to the standard.
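The two assignment rules could be checked mechanically along these lines. This is only a sketch: the account names, the two allow-lists and the helper find_violations are hypothetical, and a real check would have to obtain the assignments from Keystone first.

```python
# Hypothetical allow-lists; a CSP would maintain these itself.
OPERATOR_USERS = {"ops-alice"}            # human operators of the CSP
SERVICE_ACCOUNTS = {"nova", "cinder"}     # technical service users


def find_violations(assignments):
    """Return (user, role, reason) tuples for assignments that break the rules.

    `assignments` is an iterable of (user_name, role_name) pairs.
    """
    violations = []
    for user, role in assignments:
        if role == "admin" and user not in OPERATOR_USERS:
            violations.append((user, role, "admin outside operator group"))
        elif role == "service" and user not in SERVICE_ACCOUNTS:
            violations.append((user, role, "service role on non-service account"))
    return violations


sample = [
    ("ops-alice", "admin"),
    ("customer-bob", "admin"),
    ("nova", "service"),
    ("customer-bob", "service"),
    ("customer-bob", "member"),
]
for violation in find_violations(sample):
    print(violation)
```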

josephineSei commented 4 months ago

We discussed the "network rbac create" option, which by default policy lets non-admin users share security groups and networks with other projects. Now I am thinking about adding a phrase to this standard to integrate some SCS-defined policy options. It should point to a website or document where we gather all potential security enhancements in policy that we would like to see in an SCS deployment.

So all this standard would need to do is point at this document and state that those changes should be integrated.

@bitkeks would that be a possible way to go? If yes, the question remains where the adjusted policy options will be available.

bitkeks commented 4 months ago

@bitkeks would that be a possible way to go?

That works for me! Either apply the policy patch directly in IaaS layer and explain it in the docs, or recommend the patch in the docs. Somewhere where configuration and/or security is the main focus. Release notes as well, if the patch is applied onto the release artifacts.

reqa commented 3 months ago

Currently pending on #528

markus-hentsch commented 2 months ago

Currently pending on #528

Now that we are close to finalizing the list of mandatory and supported services in #528, I looked at each applicable service and gathered which roles are currently established based on documentation and source code of the services.

List of roles

Default^1:

Barbican^3:

(note: Barbican's special roles are planned to be superseded by the RBAC rework^4 and replaced by the standard reader, member etc.; however, the special roles are currently still included as deprecated^5. Using the reworked roles requires enforce_new_defaults=True in Barbican.)

Ceilometer/Swift:

Octavia^5 (default configuration):

EDIT (2024-05-14): according to Octavia's documentation, Octavia ships optional alternative policy files, which scrap the Octavia-specific roles and align it with the default reader, member, admin scheme:

Heat:

Services that don't introduce additional roles:

markus-hentsch commented 2 months ago

heat_stack_user [...] important note: Keystone's Application Credential API might make it possible to shed this role from a user and open up malicious activity - this needs to be investigated!

I added Heat to my DevStack in order to check this potential loophole.

OpenStack Heat special role behavior

I created a basic Heat template to deploy a Keystone user in order to verify the role behavior.

Heat template:

heat_template_version: 2015-04-30

resources:
  heat_created_project:
    type: OS::Keystone::Project
    properties:
      description: "A project created by Heat"
      domain: "Default"
      enabled: true
      name: "heat_created_project"
  heat_created_user:
    type: OS::Keystone::User
    properties:
      description: "A user created by Heat"
      domain: "Default"
      enabled: true
      name: "heat_created_user"
      password: "notsosecret"
  role_assignment:
    type: OS::Keystone::UserRoleAssignment
    properties:
      user: { get_resource: heat_created_user }
      roles:
      - project: { get_resource: heat_created_project }
        role: member

Then deploying the template as admin:

source openrc admin admin
openstack stack create -t heat_template.yml demo-heat-stack

According to both documentation and implementation^1, Heat should add the heat_stack_user role regardless of any other role assignments in the template. However, I fail to observe this behavior:

openstack role assignment list --names --user heat_created_user
+--------+---------------------------+-------+---------------------------+--------+--------+-----------+
| Role   | User                      | Group | Project                   | Domain | System | Inherited |
+--------+---------------------------+-------+---------------------------+--------+--------+-----------+
| member | heat_created_user@Default |       | heat_created_project@Defa |        |        | False     |
|        |                           |       | ult                       |        |        |           |
+--------+---------------------------+-------+---------------------------+--------+--------+-----------+

... it only creates the role assignment as per template. Furthermore, even though logging is set to debug level I fail to observe any log messages from the implementation^1.

Something is amiss here. I will look into this more and update this comment accordingly.

Update (2024-05-03):

I misinterpreted the documentation and code. The heat_stack_user role is not added to users created using the OS::Keystone::User resource, which is an explicit user creation via a Heat template. Instead, this role is only attached to Keystone users created by Heat implicitly for other resources that require a Keystone user for some operation.

One example is the user_data_format property of the OS::Nova::Server resource. If it is set to the string value SOFTWARE_CONFIG, a user is created by Heat automatically for API access to retrieve config data.

resources:
  sample_server:
    type: OS::Nova::Server
    properties:
      flavor: "m1.tiny"
      image: "cirros-0.6.2-x86_64-disk"
      name: "heat-created-server"
      user_data: "bogus"
      user_data_format: "SOFTWARE_CONFIG"

This results in a technical user account which only possesses the heat_stack_user role within the Heat-stack-specific project:

+-----------------+----------------------+-------+-----------------------+--------+--------+-----------+
| Role            | User                 | Group | Project               | Domain | System | Inherited |
+-----------------+----------------------+-------+-----------------------+--------+--------+-----------+
| heat_stack_user | server-stack-sample_ |       | 80ccc0ce195c4a608b8ea |        |        | False     |
|                 | server-              |       | be5c903b6ae-87d7a182- |        |        |           |
|                 | 7vepqwax4nfp@heat    |       | a19f-478c-8f2d-       |        |        |           |
|                 |                      |       | 80be769@heat          |        |        |           |
+-----------------+----------------------+-------+-----------------------+--------+--------+-----------+

Since this user account does not possess a regular role (e.g. member) additionally, any potential role shedding through Keystone's Application Credentials API is not possible and not a problem.

Conclusion

The heat_stack_user role is not used for users created by the OS::Keystone::User Heat resources. It is only attached to users not explicitly specified in the template but created implicitly by Heat as a requirement to realize other resources of the template. Such technical users are only assigned the heat_stack_user role within the stack-specific project (Heat creates one for each Heat stack). They do not possess any other roles and are not part of any other project outside of the stack-specific one - role shedding is not a threat in this case.
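The condition that makes role shedding a non-issue (a heat_stack_user account holding no other role) can be expressed as a small check. The following is a sketch over plain (user, role) pairs; the function name and the sample data are invented for illustration and do not come from Heat or Keystone.

```python
from collections import defaultdict


def users_with_extra_roles(assignments, special_role="heat_stack_user"):
    """Map each user holding `special_role` to the other roles they also hold.

    Users that hold only `special_role` are omitted, so an empty result
    means no account combines the special role with a regular one.
    """
    roles_by_user = defaultdict(set)
    for user, role in assignments:
        roles_by_user[user].add(role)
    return {
        user: sorted(roles - {special_role})
        for user, roles in roles_by_user.items()
        if special_role in roles and len(roles) > 1
    }


sample = [
    ("stack-technical-user", "heat_stack_user"),   # expected case: only role
    ("heat_created_user", "member"),                # explicit user, no special role
    ("suspicious-user", "heat_stack_user"),
    ("suspicious-user", "member"),                  # would allow role shedding
]
print(users_with_extra_roles(sample))
```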

markus-hentsch commented 2 months ago

Considerations for the standardization

Due to the large amount of services and roles involved by now, I think the standard should try to keep it as simple as possible:

  1. do not diverge from upstream naming conventions unless absolutely necessary
  2. offer simple & easy default configuration, try taking oslopolicy-policy-generator outputs and code defaults into consideration
    • implement any changes as "diffs" to the generated/implemented defaults, avoid creating hundreds of lines of policy files (maintenance hell!)
  3. document an overview over all resulting roles, their purpose and usage scope in SCS clouds
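The "diff" idea from point 2 can be sketched as follows: given the full set of default rules and a customized set, only the entries that differ need to be kept in a policy file. The rule names and check strings below are placeholders, not actual policies of any OpenStack service.

```python
def policy_overrides(defaults, customized):
    """Return only the rules whose check string differs from the default.

    Both arguments map rule names to check strings. Rules that are new in
    `customized` count as overrides as well.
    """
    return {
        name: check
        for name, check in customized.items()
        if defaults.get(name) != check
    }


# Placeholder rules for illustration only.
defaults = {
    "example:list_things": "role:reader",
    "example:create_thing": "role:member",
    "example:delete_thing": "role:member",
}
customized = dict(defaults)
customized["example:delete_thing"] = "role:admin"

print(policy_overrides(defaults, customized))
```

Keeping only such a reduced mapping in the policy file leaves every other rule at its in-code default, which avoids maintaining hundreds of lines per service.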

As a result, I need to find a solution to the dissonance between implemented and generated defaults (i.e. oslopolicy-policy-generator) observed in https://github.com/SovereignCloudStack/issues/issues/396#issuecomment-1852491416 first in order to properly address point number 2.

Update (2024-05-07):

Here are some observations about oslopolicy-sample-generator and oslopolicy-policy-generator I'm noticing while trying to figure out a consistent and reliable way to generate policy files that match the in-code policies 100%:

markus-hentsch commented 2 months ago

Update to the above: I was finally able to cleanly differentiate between in-code policy definitions and overrides by mimicking the crucial part of oslopolicy-policy-generator^1 while changing the behavior to not discard in-code defaults that have overrides but actually output them separately:

from oslo_policy import generator, policy

DEPRECATED_RULES = False
NAMESPACE = "keystone"

def ruledefault_to_yaml_entry(ruledefault):
    return generator._format_rule_default_yaml(
        ruledefault, include_help=False, comment_rule=False,
        add_deprecated_rules=DEPRECATED_RULES
    )

enforcer = generator._get_enforcer(NAMESPACE)
enforcer.load_rules()

file_rules = [policy.RuleDefault(name, default.check_str)
              for name, default in enforcer.file_rules.items()]
registered_rules = [policy.RuleDefault(name, default.check_str)
                    for name, default in enforcer.registered_rules.items()]

# Write both rule sets to separate files; the context managers make sure
# the files are flushed and closed.
with open("file_rules.yaml", "w") as file_rules_out:
    file_rules_out.writelines(
        ruledefault_to_yaml_entry(df) for df in file_rules
    )
with open("registered_rules.yaml", "w") as registered_rules_out:
    registered_rules_out.writelines(
        ruledefault_to_yaml_entry(df) for df in registered_rules
    )

The resulting registered_rules.yaml will contain the complete set of unmodified in-code policy defaults. It uses oslo.policy's functions directly without too much custom shenanigans, so it should be pretty portable and could be the base for a small SCS tooling that can help generating universal defaults.

Now that I have the untainted defaults, I will check whether the discrepancies observed in https://github.com/SovereignCloudStack/issues/issues/396#issuecomment-1852491416 can be demystified and solved.

markus-hentsch commented 1 month ago

Solving the policy mismatch mystery

I finally found the root cause of the observed unexpected default API policy behavior described in https://github.com/SovereignCloudStack/issues/issues/396#issuecomment-1852491416

The documentation of the oslopolicy-policy-generator^1 states (bold emphasis added by me):

The oslopolicy-policy-generator command can be used to generate a policy file that shows the effective policy in use.

... and this statement seems to be wrong.

As I uncovered below, it actually ignores deprecated policy rules even if they are still actively in use. This makes its output diverge from the respective API's actual configuration: it produces a set of policy rules that does not match the API's observed behavior. This is the case for most policy rules related to the ongoing RBAC rework^2 where not everything is the default yet and a lot of deprecated rules are still in place.

Barbican on DevStack without policy file:

file /etc/barbican/policy.yaml
    # /etc/barbican/policy.yaml: cannot open `/etc/barbican/policy.yaml' (No such file or directory)

source openrc admin admin
openstack secret list
    # (no output)
echo $?
    # 0

Authenticating with the admin role allows listing secrets.

Barbican on DevStack with oslo.policy-generated policy file:

/opt/stack/data/venv/bin/oslopolicy-policy-generator \
    --namespace barbican --output-file /etc/barbican/policy.yaml
file /etc/barbican/policy.yaml
    # /etc/barbican/policy.yaml: ASCII text

sudo systemctl restart devstack@barbican-svc.service

source openrc admin admin
openstack secret list
    # 4xx Client error: Forbidden: Secret(s) retrieval attempt not allowed - please review your user/project privileges
    # Forbidden: Secret(s) retrieval attempt not allowed - please review your user/project privileges
echo $?
    # 1

Since the new policy defaults strictly require the scoped authentication with the member role, the access is denied to the user logged in solely with the system-scoped admin role.

Proper policy generation

Using the code from https://github.com/SovereignCloudStack/issues/issues/396#issuecomment-2100910579, the behavior can be adjusted to include deprecated rules instead:

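# Imports (generator, policy) and NAMESPACE are the same as in the script
# from the earlier comment.
use_deprecated = True

enforcer = generator._get_enforcer(NAMESPACE)
enforcer.load_rules()

registered_rules = []
for name, default in enforcer.registered_rules.items():
    # Prefer the deprecated check string where one exists, because that is
    # what is still in effect while the new defaults are not enforced.
    if use_deprecated and default.deprecated_rule:
        rule = default.deprecated_rule.check_str
    else:
        rule = default.check_str
    registered_rules.append(policy.RuleDefault(name, rule))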

Using this code I can generate a policy file from scratch that now seems to match the actual API behavior when the API is not using any policy file.
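To make the selection logic easier to follow without a running service, here is a toy model of it that uses stand-in classes rather than oslo.policy itself. The class names, the rule name and the check strings are invented for illustration; only the branch on deprecated_rule mirrors the snippet above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeDeprecatedRule:
    """Stand-in for a deprecated rule; only carries the old check string."""
    check_str: str


@dataclass
class FakeRuleDefault:
    """Stand-in for a registered rule default with an optional old variant."""
    name: str
    check_str: str
    deprecated_rule: Optional[FakeDeprecatedRule] = None


def select_check_str(rule, use_deprecated):
    """Pick the deprecated check string if requested and available."""
    if use_deprecated and rule.deprecated_rule:
        return rule.deprecated_rule.check_str
    return rule.check_str


# Invented example: a rule whose new default is stricter than the old one.
reworked = FakeRuleDefault(
    name="example:list_things",
    check_str="role:member and project_id:%(project_id)s",
    deprecated_rule=FakeDeprecatedRule("role:admin or role:member"),
)
untouched = FakeRuleDefault(name="example:show_thing", check_str="role:reader")

for rule in (reworked, untouched):
    print(rule.name, "->", select_check_str(rule, use_deprecated=True))
```

With use_deprecated=False the same loop would print the new check strings, which corresponds to what the stock generator was observed to emit.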

markus-hentsch commented 1 month ago

I implemented a small toolkit for batch-generating default API policy files for all (currently) SCS-mandatory and -supported APIs accordingly here: https://gist.github.com/markus-hentsch/54adc0bd5bc7c5799199bf11bf1b8abf

markus-hentsch commented 1 month ago

Upon closer inspection Octavia seems to support alternative policy defaults that omit the Octavia-specific roles. I've updated https://github.com/SovereignCloudStack/issues/issues/396#issuecomment-2090773890 accordingly.

If there's no critical loss of functionality or other downsides, it would be beneficial to the simplicity of the standard to align Octavia with the other roles, discarding the Octavia-specific ones. This would however make SCS behave differently than default Octavia in terms of role assignment processes.

markus-hentsch commented 1 month ago

I closed the existing PR and started writing a new standard draft from scratch in SovereignCloudStack/standards#590 based on all the new findings documented above.

Important note about role reworks and compatibility

As described in https://github.com/SovereignCloudStack/issues/issues/396#issuecomment-2090773890 there are a bunch of special roles in both Barbican and Octavia. As of now, both services support alternative configurations to make use of the generic reader, member and admin roles instead with the following restrictions:

That means that for both Barbican and Octavia to get rid of the service-specific role set, the enforce_new_defaults option must be enabled. However, this currently breaks compatibility with orchestration services such as Heat^4. Since Heat is part of the SCS supported services now, we cannot break compatibility with it.

I've documented this in the design considerations sections of the new standard draft.

It seems that for now we have to keep the service-specific roles of both Octavia and Barbican which sadly bloats our list of roles.

markus-hentsch commented 1 month ago

For the standard to be useful we need some way to verify conformance to it.

Possible Conformance Tests

Without admin rights and from the outside:

Problems with that approach:


With admin rights or from the inside:

markus-hentsch commented 1 month ago

While implementing conformance tests I found a way to make sure that a test is executed with only the member role using limited role inheritance via application credentials. While implementing the corresponding Key Manager API tests I found and reported a bug in OpenStack SDK: https://bugs.launchpad.net/openstacksdk/+bug/2066045
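For readers unfamiliar with the mechanism: Keystone's Identity API v3 accepts an optional list of roles when an application credential is created (POST /v3/users/{user_id}/application_credentials), and the credential then carries only those roles. The sketch below merely builds such a request body. The helper name and the credential name are hypothetical, and this is not necessarily how the conformance tests are implemented.

```python
import json


def build_app_credential_request(name, role_names):
    """Build the JSON body for creating a role-restricted application credential.

    The roles must be a subset of the roles the creating user already holds
    on the project; Keystone rejects anything else.
    """
    return {
        "application_credential": {
            "name": name,
            "roles": [{"name": role} for role in role_names],
        }
    }


body = build_app_credential_request("member-only-test", ["member"])
print(json.dumps(body, indent=2))
```

A test authenticated with the resulting credential would act with the member role only, even if the creating user holds additional roles in the project.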

markus-hentsch commented 1 month ago

Important statement/update^1 from upstream (Nova):

closed this for nova as we dont plan to allow scoping admin ever.

the way to "fix this" currently based on the secure and consistent rbac community goal is i belive the manager role.

For Nova that means a project-scoped admin token should be able to list and delete any server in any project.

markus-hentsch commented 1 week ago

Current adoption of enforce_scope and enforce_new_defaults

Thankfully, Josephine noticed that my assumption about none of the OpenStack services adopting the new RBAC defaults was incorrect because although oslo.policy still defaults them to disabled, some services override these defaults and already enable them.

I looked at the current source code repositories (2024.2) of all services that are currently considered either mandatory or supported as per #587 and classified their adoption of the new oslo.policy options (enforce_scope/enforce_new_defaults):

| Service    | Uses new options? | Comment                                                                   |
|------------|-------------------|---------------------------------------------------------------------------|
| Nova       | Fully             | Link                                                                      |
| Neutron    | Fully             | Link                                                                      |
| Glance     | Fully             | Link                                                                      |
| Barbican   | Fully             | Link                                                                      |
| Ironic     | Fully             | Link                                                                      |
| Manila     | Fully             | Link                                                                      |
| Heat       | Yes*              | Link, *Options were known to break Heat, current state unknown            |
| Cinder     | No, but supported | Link                                                                      |
| Keystone   | No, but supported | Link                                                                      |
| Placement  | No, but supported | Link                                                                      |
| Octavia    | No, but supported | Link                                                                      |
| Designate  | No, but supported | Link                                                                      |
| Swift      | No                | No mentions in code                                                       |
| CloudKitty | No                | No mentions in code                                                       |
| Masakari   | No                | No mentions in code                                                       |
| Gnocchi    | No                | No mentions in code, not part of opendev anymore, adoption late or unlikely |
| Ceilometer | Not Applicable    | Has no API                                                                |

Note that oslo.policy still defaults to disabling those options currently. This means that any service that does not explicitly force-enable those will not use them per default.

I already corrected my erroneous statements in the standard PR, but I will need to revisit the whole standard with regard to deciding for or against the new options and the resulting role models.

markus-hentsch commented 1 week ago

Current state of Secure RBAC regarding Heat and roadmap

Adding to the above I checked the actual state of Heat compatibility and roadmap plans in regards to Secure RBAC (enforce_scope/enforce_new_defaults).

Backstory

The "Consistent and Secure Default RBAC" TC upstream page^1 says:

It breaks OpenStack existing NFV use case and orchestration tooling:

When the deployment project started consuming the nova new policy defaults with scope enabled, we got to know that the scope enable will break heat (orchestration tooling), Tacker (NFV deployment) users or any operators script interacting all the OpenStack interfaces with administrators user.

Heat ‘create stack’ API uses the user credentials (admin) to create project and system-level resources in backend services. For example, it creates project users in keystone (system level resource), flavors in nova (system level resource), servers in nova (project level resource), and networks in neutron (project level resource). If we enable the scope checking in services, then the user calling heat ‘create stack’ APIs which are scoped to either project (existing way) or system (if we change that) will not be able to call the system and project scoped APIs on the service side.

This mailing list message from 2022 summarized the problem pretty well and described the possible solutions.

A Zed release Etherpad discussed a few solutions back in April 2022:

  • Proposed solution
    • Heat accepts stack API with system scope
      • This means a stack with system resources would require system admin role => Need to check with services relying on Heat
    • Heat assigns a project-scope role to the requester during processing stack operation, and uses this project scope credential to manage project resources
    • (from TC discussion): Heat starts accepting a new header carrying an extra token (say SYSTEM_TOKEN) and uses that to create/interact with system-level resources, like creating a flavor.

From the above information, it is still unclear what the current state of Heat is regarding this problem now in 2024.

Current State (as of 2024.2 in development)

DevStack
Heat

The following template in Heat works as admin even with enforce_scope/enforce_new_defaults:

heat_template_version: 2015-10-15

resources:
  # Keystone user, a system-level resource
  heat_created_user:
    type: OS::Keystone::User
    properties:
      description: "A user created by Heat"
      domain: "Default"
      enabled: true
      name: "heat_created_user"
      password: "notsosecret"
  # Keystone project, a system-level resource
  heat_created_project:
    type: OS::Keystone::Project
    properties:
      name: "heat_created_project"
  # Cinder volume, a project-level resource
  heat_created_volume:
    type: OS::Cinder::Volume
    properties:
      image: "cirros-0.6.2-x86_64-disk"
      name: "heat_created_volume"
      size: 1

(tested on a DevStack based on current master checkouts from 2024-07-01)

Global Adoption
Conclusion

It initially seemed like roadblocks led to the upstream Secure RBAC implementation being on hold for the time being. Despite the upstream status page on the Secure RBAC implementation^1 still mentioning the unresolved problems and the postponing of the new RBAC scoping implementation and defaults, the main issues seem largely solved and the 2024/2025 release roadmaps^3 clearly indicate that implementation and adoption are moving forward again.

Judging from merged fixes^2 and some basic testing, the core Heat issues seem to be solved by now. This should automatically fix Tacker: "Enabling scope checking also breaks Tacker (NFV Orchestration service) deployment as they use heat ‘create stack’ to build OpenStack infrastructure."^1. With this, no critical technical regressions seem to remain.

Regarding SCS-specific implementations:

Only the Domain Manager approach as currently implemented downstream via policies as per SCS Domain Manager Standard will be affected and not work with the new Secure RBAC. When the 2025.2 release lands and removes the enforce_scope toggle, the policy-based implementation will cease to function. Hopefully, upstreaming the Domain Manager functionality will be concluded by then, which will supersede the downstream policy-based implementation.

josephineSei commented 1 week ago

Thank you for writing all of this down. This makes me wonder whether we want to have a role standard right now. It seems to me that a lot of changes are still coming up soon. Additionally, with Swift, CloudKitty and Masakari not even supporting it right now, I have another question: how do scoped tokens behave when used with non-scoped APIs? Do they just check the role? (Would make sense)

But in the end, since enforcement of the new defaults is coming in the next release and we are also trying to bring in the domain manager for the next release, maybe we should wait for that to happen before we decide here. We may discuss this in the IaaS meeting, but I think focusing on the upstream domain-scoped manager may be a better way to go first.