[Fleet] Known limitations with Fleet/Integrations in multi-space environments

kpollich commented 7 months ago

Summary

The goal of this issue is to provide the following:

A unified place to discuss the usage of Fleet/Integrations in multi-space environments
Historical and technical context around Fleet's existing limitations in multi-space environments
A rough sense of where we're going with multi-space support in Fleet
A list of references to past issues, long discussion threads, etc for further context
Answers to some Frequently Asked Questions (FAQs) about Fleet and multiple spaces

Stakeholders

@elastic/fleet
Team Lead: @kpollich
Tech Lead: @nchaulet
Product Lead: @nimarezainia

Summarizing the state of Fleet in multiple spaces today

I might say "Fleet" below, but in general this will mean "Fleet and Integrations" - they are separate apps within Kibana, though their backing data models are closely related.

Today, Elastic's documentation mentions that integrations may not function as expected across multiple spaces, but we don't explicitly lay out any known limitations or issues we've come across.

We've got a great set of detailed issues captured in https://github.com/elastic/integrations/issues/3434 that I'll distill into a few core ideas below, including relevant historical and technical context.

All of Fleet's data models are global

This idea is a generalization of the point in the issue above about agent policies, integrations, etc being visible or editable by all users in all spaces so long as they have access to Fleet.

All of Fleet's saved object types were designated with namespaceType: 'agnostic' in https://github.com/elastic/kibana/pull/64360 back in 2020. There are some great docs about what this namespaceType value does, and what options are available to Kibana plugins. To summarize:

agnostic: Global saved objects
single: Single-space saved objects, unique ID per space
multiple-isolated: Single-space, but with a globally unique ID
multiple: Multiple spaces, globally unique ID
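
To make the four options easier to compare, here is a minimal TypeScript sketch. The NamespaceType union mirrors the values listed above; the NamespaceBehavior shape, the NAMESPACE_BEHAVIOR table, and the supportsSpaceScoping helper are hypothetical names made up for illustration and are not part of Kibana's API.

```typescript
// The four namespaceType values summarized above.
type NamespaceType = 'agnostic' | 'single' | 'multiple-isolated' | 'multiple';

// Hypothetical summary shape for illustration; not part of Kibana's API.
interface NamespaceBehavior {
  visibleInAllSpaces: boolean; // the object is global, with no space scoping at all
  shareableAcrossSpaces: boolean; // one object can be assigned to several spaces
  globallyUniqueId: boolean; // the ID is unique across the whole deployment
}

// Lookup table restating the list above in code form.
const NAMESPACE_BEHAVIOR: Record<NamespaceType, NamespaceBehavior> = {
  agnostic: { visibleInAllSpaces: true, shareableAcrossSpaces: false, globallyUniqueId: true },
  single: { visibleInAllSpaces: false, shareableAcrossSpaces: false, globallyUniqueId: false },
  'multiple-isolated': { visibleInAllSpaces: false, shareableAcrossSpaces: false, globallyUniqueId: true },
  multiple: { visibleInAllSpaces: false, shareableAcrossSpaces: true, globallyUniqueId: true },
};

// A type can only back space-based access control if its objects are not global.
function supportsSpaceScoping(type: NamespaceType): boolean {
  return !NAMESPACE_BEHAVIOR[type].visibleInAllSpaces;
}
```

Under these assumptions, supportsSpaceScoping('agnostic') is false, which is the situation Fleet's saved objects are in today.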

Of these available namespace types, multiple-isolated and multiple were introduced in 7.12.0, which was released on March 23, 2021. So, during Fleet's initial implementation and subsequent push towards GA in 7.14.0 (August 3, 2021), these multi-space saved object options simply did not exist. Rather than trying to overhaul Fleet's global data models in a span of two minor releases, Fleet elected to move forward with using global saved objects for its entire data model.

This means that Fleet does not leverage Kibana spaces for any kind of RBAC, as they're intended. For example, a user in Customer Space A can access all policies, even those belonging to Customer B in some kind of multi-tenant environment. This limits Fleet's use cases and feasibility for users who leverage Kibana in a service-provider capacity, or who want to implement any kind of multi-tenancy. It also limits customers who want to enforce granular permissions and access control to parts of their infrastructure. Today, it's impossible for an "operator" of Fleet to grant access only to "security" focused agents/policies to one set of users while granting access only to "observability" focused agents/policies to another set of users.

This also means Kibana-level guidelines like this great documentation on securing access to Kibana are largely irrelevant to Fleet, which harms our users who expend effort to train and onboard themselves onto Kibana's best practices for security only to find it doesn't apply to Fleet.

Integration assets can only be installed in a single space

There are also many instances of technical issues with the global nature of Fleet's data models, especially related to integrations. For instance, Fleet's installation saved object is global, but Kibana assets (like dashboards and saved searches) are not - they're limited to a single space today. This means we often see confusion when an integration is considered installed (Fleet's saved object tracks this status field), but none of its assets are available in the current space.
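
The mismatch described above can be sketched in a few lines of TypeScript. This is only an illustration: the InstallationRecord shape, its field names, and the viewFromSpace helper are assumptions made for the example, not Fleet's actual data model.

```typescript
// Hypothetical, simplified shape; field names are assumptions for illustration.
interface InstallationRecord {
  packageName: string;
  // Stored on a global saved object, so every space sees the same value.
  installStatus: 'installed' | 'not_installed';
  // The single space that actually holds the dashboards, saved searches, etc.
  assetsSpaceId: string;
}

type IntegrationView =
  | 'not_installed'
  | 'installed_with_assets'
  | 'installed_without_assets';

// What a user in `currentSpaceId` effectively experiences.
function viewFromSpace(record: InstallationRecord, currentSpaceId: string): IntegrationView {
  if (record.installStatus !== 'installed') {
    return 'not_installed';
  }
  return record.assetsSpaceId === currentSpaceId
    ? 'installed_with_assets'
    : 'installed_without_assets';
}

// Example: an integration installed from the default space.
const nginxInstallation: InstallationRecord = {
  packageName: 'nginx',
  installStatus: 'installed',
  assetsSpaceId: 'default',
};
```

With this record, a user in the default space gets 'installed_with_assets', while a user in any other space gets 'installed_without_assets', which is the confusing state users report.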

Fleet's recommended workflow to get around the single space nature of these assets today is to use the "copy" function for dashboards and other Kibana assets. This workaround, however, has drawbacks, as integration links (e.g. the link to a hostname for the agent metrics overview dashboard) will break when the dashboard is copied to another space.

Here are some real-world issues we've seen related to integrations in multiple spaces:

There are countless other support cases in this realm that don't manifest in public issues, as well. However, we won't link to those private support cases in this public issue.

The root-level remedy for these issues with integrations is to make all Kibana assets shareable across multiple spaces, then to have Fleet support flagging integration assets as available in multiple spaces. The Application Experience team at Elastic is hard at work on https://github.com/elastic/kibana/issues/167901 which will enable all Kibana analytics assets related to dashboards to be shared across multiple spaces.

A possible stopgap solution that the Fleet team could implement while the epic above is being worked on is to allow for installing assets in additional spaces, and duplicating all assets across those spaces. These assets would still be tracked in an integration's installed_kibana references array, so they'd be considered managed and would be tracked across package updates, etc. This is tracked in https://github.com/elastic/kibana/issues/172963.

Where are we headed?

The Fleet team is actively planning and prioritizing a large scope of work to make Fleet fully space-aware, including granular permissions (think "read-only access to agent policies" and "write access to Fleet settings") and UX improvements to make space considerations top-of-mind for Fleet operators. This project is in the planning phase today, but is rapidly approaching being ready for implementation.

Almost all implementation issues for this project will be public. We value working in public heavily at Elastic, and this project will be no exception. We tend to do planning in private for cases like this, to avoid publicly setting expectations inappropriately while projects or features are not yet fully defined, and to allow us to conduct UX research, user interviews, etc to inform our planning efforts.

Once there is a concrete implementation plan for the space-aware Fleet project, this issue will be updated with links and references to those public issues.

What do we need to do to get there?

At a high level, Fleet needs to execute on the following (not the final implementation plan, just what we're thinking of right now):

Convert all Fleet saved objects to namespaceType: 'multiple' to allow them to exist in multiple spaces
Implement the concept of an "admin space" to allow Fleet operators to view all their Fleet data across all user spaces to support service provider/multi-tenant use cases
Support agent management across many spaces, including actions, diagnostics uploads, log-level changes, agent policy updates, etc
Support integration management across many spaces, including installation, updates, deploying new integration to policies ("upgrades"), and asset management
Implement granular permissions to various Fleet resources including but not limited to agent policies, integrations, and global Fleet settings (outputs, Fleet Server Hosts, proxies, etc)
Add UX elements to assist operators in assigning Fleet objects to one or more spaces
Extensive documentation updates, new tutorials, etc

We'll also need to work closely with the various teams and products at Elastic that integrate with and depend upon Fleet, including:

Elastic Defend
Synthetics
APM
Cloud Security Posture Management
All teams that maintain one or more integrations

This is just a high-level summary of where we are so far with our planning. There are many technical details to define and risks to assess currently. We are working hard to move this project towards the implementation phase. Thank you for your patience as we take this on!

mbudge commented 7 months ago

All the work for Fleet and Kibana spaces sounds awesome.

My one concern is we will have deployed Elastic-Agent to 1000s of endpoints by the time multi-space support is released.

I think we've set up 20-30 policies in 2-3 Kibana spaces while we've been deploying Elastic-Agent. I don't want integrations to become locked to a space when they become space aware, as it will become difficult to maintain.

Can you please make sure we can move Fleet policies/integrations to 1 Kibana space without having to re-enroll Elastic-Agent when this becomes available?

Thanks

user-987654321 commented 6 months ago

I've been waiting for this feature!!! Thanks

kpollich commented 4 months ago

Hi all, I wanted to provide an update on where we're at with the work to improve the multi-space story around Fleet and Integrations in Kibana.

The team has drafted an internal RFC and gotten broad approval from all the teams at Elastic who consume Fleet's APIs (e.g. Elastic Defend, Synthetics, APM, Cloud Security Posture Management, and others). So we've begun to make progress on a two-phased approach to resolving these multi-space issues.

Phase 1

The first part of our effort here is to introduce some granularity to Fleet's permission model. Today, users are left with a binary choice of "all or nothing" for their users/roles with access to Fleet. This means that users can't grant partial or read-only access to the Fleet plugin. This is a particularly large deal breaker for our users who operate in "service provider" environments where they grant Kibana/Fleet access to their own customers. In order to make multi-space support a meaningful feature for these use cases, we needed to add more granularity to Fleet's permissions as well. This was a much more approachable chunk of work, and can be delivered separately from the broader space-affinity migration that needs to happen, so we opted to land this first.
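
As a rough illustration of what moving from "all or nothing" to granular access means, here is a minimal TypeScript sketch. The resource names, access levels, and the isAllowed helper are hypothetical and chosen only for this example; they do not describe the privileges Fleet actually ships.

```typescript
// Hypothetical permission model; resource and level names are assumptions.
type FleetResource = 'agents' | 'agent_policies' | 'settings';
type AccessLevel = 'none' | 'read' | 'all';

// One access level per resource instead of a single on/off switch for Fleet.
type FleetPrivileges = Record<FleetResource, AccessLevel>;

// Higher rank implies everything a lower rank allows.
const LEVEL_RANK: Record<AccessLevel, number> = { none: 0, read: 1, all: 2 };

function isAllowed(
  privileges: FleetPrivileges,
  resource: FleetResource,
  required: AccessLevel
): boolean {
  return LEVEL_RANK[privileges[resource]] >= LEVEL_RANK[required];
}

// Example role: read-only access to agent policies and nothing else.
const policyViewer: FleetPrivileges = {
  agents: 'none',
  agent_policies: 'read',
  settings: 'none',
};
```

In this sketch, isAllowed(policyViewer, 'agent_policies', 'read') is true, while the same role is refused write access to policies and any access to settings.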

Big thanks to @nchaulet for his work on this initial phase. Please see a selection of the PRs related to this work below for reference:

Phase 2

This is the big one: actually migrating Fleet's saved object models to be "aware" of spaces.

The actual mechanism to make this change is, at its core, updating the namespaceType from agnostic to multiple-isolated for Fleet's core saved objects. See https://www.elastic.co/guide/en/kibana/current/sharing-saved-objects.html for some documentation on what the different namespaceType settings for saved objects mean.

There are many more considerations to make in addition to the saved object migration, for instance:

Fleet relies heavily on system indices for policy changes, actions, and action results. The documents we write to these indices need to include some kind of space identifier so we can control access to various pieces of the Fleet application appropriately
Installing integrations in multiple spaces (e.g. easily duplicating assets into multiple spaces + tracking them with integration upgrades) is a related-but-separate effort that can be done once Installation objects are made space-aware
Elasticsearch objects like ingest pipelines and index/component templates have no concept of space awareness, so we need to either change Fleet's naming schema for these pieces of data to include a space identifier or limit access in other ways. This won't be something we resolve as part of this scope, and we will likely move forward with a recommendation to prevent "fleet users" from directly editing these objects in stack management or via the ES API. Solving "stack wide RBAC" is not something the Fleet team can handle on its own.
Changing Fleet's own namespace approach to include a space identifier as a prefix to user-provided namespaces, allowing us to segregate ingested data by space. This has cascading effects throughout Fleet, and we'll need to ensure integration assets (dashboards) can account for this change
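
Two of the considerations above, tagging system-index documents with a space identifier and prefixing user-provided namespaces, can be sketched as follows. Everything here is an assumption for illustration: the FleetActionDoc shape, the space_id field name, the underscore separator, and both helper functions are hypothetical, and validation against data stream namespace naming rules is omitted.

```typescript
// Hypothetical document shape; the `space_id` field name is an assumption.
interface FleetActionDoc {
  action_id: string;
  type: string;
  space_id: string; // identifies the space the action was created in
}

// Restrict a list of action documents to those belonging to one space.
function actionsForSpace(docs: FleetActionDoc[], spaceId: string): FleetActionDoc[] {
  return docs.filter((doc) => doc.space_id === spaceId);
}

// Prefix a user-provided namespace with the space identifier so ingested
// data can be segregated by space. The "_" separator is an assumption.
function toSpaceScopedNamespace(spaceId: string, userNamespace: string): string {
  return `${spaceId}_${userNamespace}`;
}
```

For example, toSpaceScopedNamespace('security', 'prod') yields 'security_prod', so any dashboard that filters on the namespace would need to account for the prefix.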

I've got it on my plate in the coming days to get started with a proof of concept for the space migration effort, and to work closely with the Kibana core team to ensure we're applying space-aware patterns to Fleet's data model appropriately. See issue here: https://github.com/elastic/kibana/issues/180708

Once that proof of concept is done, we'll create public tracking issues for the rest of the phase 2 work I detailed above. These issues will be cross-referenced here, so please look out for those in the coming weeks.

This project is slotted for delivery sometime in Q2 of this calendar year, and we're on track for that timeline right now. We'll continue to provide updates here as that work continues, including any kind of blockers or setbacks. While we can't work 100% in public, we definitely try to do whatever we can in public to remain transparent with our open source community, and this will continue as we move forward on this large scope of work.

Thanks, all!

zx8086 commented 4 months ago

This one is heavily needed as we have separate spaces for different teams working in different domains that have similar services in AWS that we would like all to have access to the same dashboards.

nicpenning commented 4 months ago

This is great, would we be so fortunate as to know what versions some of these features will start to exist in (i.e. 8.14, 8.15), and if this phased approach will be split across 1 or many versions? Mostly curious on when we can begin testing, even with snapshot versions of the stack, to get a feel for and expectations of the feature set.

It is great how much attention to detail lives in this RBAC for Fleet adventure.

nimarezainia commented 4 months ago

@nicpenning as you can see, the bulk of the work for phase 1 has been merged. Currently we are hiding these changes behind a feature flag so that we can do our due diligence and go through many different permutations of these permissions. What you will see for Fleet are the more granular permissions as shown below:

I will reach out to you at a later point to have you look at these changes once we are satisfied (I know you were particularly asking for agent activities to abide by these roles). As far as delivery is concerned, right now we want to make sure QA has had ample time to test this, knowing that the main benefit of the feature will come in Phase 2, which is also underway. As mentioned above, the bulk of this work is being lined up for calendar Q2.

nicpenning commented 4 months ago

Excellent, thank you for elaborating on this. That screenshot says a lot! Exciting times ahead. I will continue to be patient and be at the ready as needed for testing.

kpollich commented 3 months ago

Hey all, another update from me! We're making progress on "phase two" of the Fleet space awareness project, namely tackling the actual migration of Fleet's core data models from their current agnostic space affinity to the single space affinity. I'm going to just dump a list of our implementation issues for this phase. These are tracked internally on our end in a meta issue/epic, but I want to make sure this public issue gets all the GitHub reference goodness along the way:

I'm sure there will be more issues created along the way as we encounter bumps in the road, but this should be a good start to track our overall progress in this public issue. You can see we've already closed a few of these foundational issues, and we'll continue to tackle the implementation + definition of the rest in the coming months.

nicpenning commented 3 months ago

Thank you for the update, @kpollich!

syepes commented 3 months ago

Great news that there is some progress, this is a real pain point in multi space environments. Can't wait to see this live!

mbudge commented 3 months ago

Quick question.

Will Fleet support for spaces allow Cloud Security Posture dashboard/findings to work outside the Default space?

We don't want to give IT and Security users access to the Default space because it's a mess.

We grant security analysts access to the Cloud Security Posture dashboard in the "Security" space and IT users in the "Observability" space. At the moment the Security > Cloud Security Posture dashboard links to the CIS Benchmark Findings page only work in the Default space.

kpollich commented 3 months ago

Will Fleet support for spaces allow Cloud Security Posture dashboard/findings to work outside the Default space?

Yes, the Fleet work should allow the CSPM UIs to work outside the default space, but we'll need to work with @elastic/cloud-security to make some changes to adopt the various new space-aware features around Fleet's APIs under the hood.

I've chatted with @kfirpeled over on that team a few times and I know they've explored what needs to be done a bit internally so far. Perhaps he will have something more specific to add here 🙂

nicpenning commented 1 month ago

Any updates here? I see more and more work being done, just curious on how things are shaping up and if there are any feature flags we can enable for testing.

nimarezainia commented 1 month ago

@nicpenning there's a lot of work being done in and around this. We will start sharing more info soon. If you are keen to test some of this work for yourself, please reach out. Thanks for your patience.

mbudge commented 1 month ago

Quick question

We install and manage integrations in an Engineering space.

We use a different space for security alerting called the SIEM space. Security alerts are enabled by going to the "Rules" and "Add New Rules" pages under Security.

Will the Rules and Add New Rules pages be able to see the integrations are available/installed, if this is done through the Engineering space?

Just wondering how new Fleet permissions are going to impact the security rules.

kpollich commented 2 weeks ago

Quick question

We install and manage integrations in an Engineering space.

We use a different space for security alerting called the SIEM space. Security alerts are enabled by going to the "Rules" and "Add New Rules" pages under Security.

Will the Rules and Add New Rules pages be able to see the integrations are available/installed, if this is done through the Engineering space?

Just wondering how new Fleet permissions are going to impact the security rules.

@pzl - Does this fall into the conversation we had with @nchaulet the other day about security solution work related to the Fleet space awareness push? Not sure what your take on this is.

nchaulet commented 2 weeks ago

Will the Rules and Add New Rules pages be able to see the integrations are available/installed, if this is done through the Engineering space?

As integrations contain global assets (pipelines, templates, ...), they are still global and not space specific, so in a space you will see all installed integrations globally.

kpollich commented 2 weeks ago

To provide another update, @nchaulet recently shipped a large PR https://github.com/elastic/kibana/pull/189387 that allows Fleet data to be shared across multiple spaces simultaneously. We had originally scoped this effort out using a single-space affinity model, but that broke down as we discussed the project more with real world users. This multi-space model adds more complexity, and necessitates an opt-in migration to move the data around behind the scenes, but overall lands the feature in a more usable state for more users.

We've also landed https://github.com/elastic/kibana/issues/182733 which provides an efficient and frictionless UX for moving Fleet data between spaces as an admin/operator user. Moving Fleet data to your spaces starts with the agent policy through this tool, and all "child objects" of that policy like integration policies, enrollment tokens, etc will come along with it as a single operation.
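
The "child objects come along" behavior described above can be illustrated with a short TypeScript sketch. This is not the implementation from the linked issue: the FleetObject shape, the agentPolicyId link, and the moveAgentPolicy function are hypothetical names used only to show the idea of moving a policy and its children as one operation.

```typescript
// Hypothetical, simplified model of Fleet objects; names are assumptions.
type FleetObjectKind = 'agent_policy' | 'integration_policy' | 'enrollment_token';

interface FleetObject {
  id: string;
  kind: FleetObjectKind;
  agentPolicyId?: string; // set on child objects, pointing at their parent policy
  spaceIds: string[];
}

// Reassign an agent policy and all of its child objects to the target spaces.
// Returns a new array and leaves the input untouched.
function moveAgentPolicy(
  objects: FleetObject[],
  agentPolicyId: string,
  targetSpaceIds: string[]
): FleetObject[] {
  return objects.map((obj) => {
    const isPolicy = obj.kind === 'agent_policy' && obj.id === agentPolicyId;
    const isChild = obj.kind !== 'agent_policy' && obj.agentPolicyId === agentPolicyId;
    return isPolicy || isChild ? { ...obj, spaceIds: [...targetSpaceIds] } : obj;
  });
}

// Example data: one policy with two children, plus an unrelated policy.
const objectsBefore: FleetObject[] = [
  { id: 'policy-1', kind: 'agent_policy', spaceIds: ['default'] },
  { id: 'nginx-1', kind: 'integration_policy', agentPolicyId: 'policy-1', spaceIds: ['default'] },
  { id: 'token-1', kind: 'enrollment_token', agentPolicyId: 'policy-1', spaceIds: ['default'] },
  { id: 'policy-2', kind: 'agent_policy', spaceIds: ['default'] },
];

const objectsAfter = moveAgentPolicy(objectsBefore, 'policy-1', ['security']);
```

After the call, policy-1, nginx-1, and token-1 belong to the security space, while policy-2 stays in default.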

We're rapidly nearing the part of this project where we'll be getting things production ready. We've got a few more things to close out before we're ready to get this into folks' hands more publicly, though, e.g.

Plus manual testing, addressing bugs surfaced during our internal QA, and other cleanup of course.

Thanks for following along and providing feedback/comments along the way.
