A globally distributed enterprise operating in many regions will, by definition, have data sources spread across those regions, and will naturally collect and store that data in clusters local to each region. However, when it comes to analyzing that data for Security and Observability, it relies heavily on cross-cluster technologies so that the collected data can be viewed and operated on as a whole (as though it were in a single local cluster).
Fleet users with Elastic Agents deployed in many such regions currently don't have the ability to easily manage their deployment at a global level while still reaping the benefits of having their data stored and handled locally. This issue tracks the requirements for enabling Fleet in a multi-cluster deployment. The goal is to facilitate deploying Fleet in the manner shown below:
In this deployment model:
- Elastic Agent check-ins are sent to the Management Cluster, where the `.fleet*` system indices are built. This provides Global Control via Fleet in the Management Cluster.
- Using Cross-Cluster Search (CCS), dashboards can be built from the data streams of all the remote clusters, providing Global Visibility.
- With a Local Data Plane, integrations data ingested by the Elastic Agents is stored in the local cluster, avoiding any extra cross-regional egress charges and, more importantly, abiding by local data sovereignty rules.
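To make CCS possible, each regional cluster has to be registered as a remote on the management cluster. A minimal sketch of building the `_cluster/settings` request body is below; the cluster aliases, hostnames, and the `remote_cluster_settings` helper are hypothetical examples, not part of this proposal:

```python
def remote_cluster_settings(remotes):
    """Build a persistent-settings body that registers remote clusters
    for cross-cluster search on the management cluster.

    `remotes` maps a cluster alias to its list of seed hosts."""
    settings = {}
    for alias, seeds in remotes.items():
        settings[f"cluster.remote.{alias}.seeds"] = seeds
    return {"persistent": settings}

# Hypothetical regional clusters; send this body via
# PUT _cluster/settings on the management cluster.
body = remote_cluster_settings({
    "emea": ["emea-es.example.com:9300"],
    "apac": ["apac-es.example.com:9300"],
})
```

Once registered, the aliases (`emea`, `apac`) can be used as index-pattern prefixes in searches issued from the management cluster.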
In this model, how do we perform:
(1) Agent Upgrade
- The global Fleet UI enables the user to issue the upgrade command.
- Actions are curated by the local Fleet Server and sent to the individual agents.
- Agents then reach out directly to the configured repository to fetch their artifacts.
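The upgrade step above can be sketched as a request body for Fleet's bulk agent upgrade API. The agent IDs, version, and repository URL are hypothetical, and the exact endpoint shape should be checked against the Fleet API docs for the target stack version:

```python
def bulk_upgrade_body(agent_ids, version, source_uri=None):
    """Build a body for POST /api/fleet/agents/bulk_upgrade (sketch).

    `source_uri` optionally points agents at a custom artifact
    repository, which matters when agents sit behind regional
    network boundaries."""
    body = {"agents": agent_ids, "version": version}
    if source_uri:
        body["source_uri"] = source_uri
    return body

# Hypothetical usage: upgrade two agents from a regional mirror.
body = bulk_upgrade_body(
    ["agent-emea-01", "agent-apac-01"],
    "8.13.0",
    source_uri="https://artifacts.example.com/downloads/",
)
```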
(2) Adding Integrations to the Agent Policy
- Integrations are added to the policy at the global Fleet level.
- The policy is then curated and, via the Fleet Server, distributed to all agents.
- Agents enable the input based on that integration.
- NOTE: the local cluster will not install assets and ingest pipelines; this will be a later enhancement.
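Adding an integration at the global level amounts to creating a package policy against the agent policy. A minimal sketch of the request body is below; the policy ID, package name, and helper function are hypothetical, and the field set is a subset of what the Fleet package policies API actually accepts:

```python
def package_policy_body(policy_id, pkg_name, pkg_version):
    """Build a minimal body for POST /api/fleet/package_policies (sketch).

    Attaches one integration package to an existing agent policy;
    Fleet then compiles the inputs and distributes the updated
    policy to all agents on that policy."""
    return {
        "name": f"{pkg_name}-global",
        "policy_id": policy_id,
        "package": {"name": pkg_name, "version": pkg_version},
    }

# Hypothetical usage: add the nginx integration to a global policy.
body = package_policy_body("global-agent-policy", "nginx", "1.17.0")
```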
(3) Build user dashboards
- Utilize CCS to query the data streams of interest and build user dashboards.
- Users can, for the most part, perform this step directly. Can it be optimized?
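The CCS query underneath such a dashboard targets the same data stream across every remote cluster. A sketch of composing that index pattern is below; the cluster aliases and data stream name are hypothetical:

```python
def ccs_index_pattern(clusters, datastream):
    """Join remote-cluster aliases and a data stream into a single
    CCS index pattern, e.g. "emea:logs-*,apac:logs-*"."""
    return ",".join(f"{alias}:{datastream}" for alias in clusters)

# Hypothetical usage: one search spanning both regional clusters,
# issued as GET /<pattern>/_search from the management cluster.
pattern = ccs_index_pattern(["emea", "apac"], "logs-nginx.access-*")
```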
(4) OSquery
- The query is issued via `.fleet-actions`, and the response is read from `.fleet-actions-results`.
- If these indices are available in the management cluster, the operator should be able to run an OSquery query.
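Reading the results back amounts to filtering `.fleet-actions-results` by the action that issued the query. A sketch of that search body is below; the `action_id` field name follows Fleet's action/result convention, but the exact mapping should be verified, and the helper is hypothetical:

```python
def action_results_query(action_id):
    """Build a search body for .fleet-actions-results that returns
    only the result documents for one issued action."""
    return {"query": {"term": {"action_id": action_id}}}

# Hypothetical usage: fetch results for one osquery action via
# GET /.fleet-actions-results/_search with this body.
query = action_results_query("3f1c9a2e-example-action-id")
```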
### Requirements
- [ ] User should be able to nominate which clusters are members of a multi-site deployment.
- [ ] Dataviews to be dynamically modified based on the set of clusters nominated, to make operating this type of deployment easier.
- [ ] Fleet UI to show which clusters agents are writing data to. Perhaps as a separate/new column (or customizable columns where the user adds the information they are interested in), allowing for filtering and a better UX so users can quickly identify agents in remote clusters.
- [ ] Fleet UI allows filtering based on the cluster.
- [ ] https://github.com/elastic/kibana/issues/187323
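The dataview requirement above could be served by regenerating a data view whose title spans the nominated clusters. A sketch of the Kibana data views API request body follows; the cluster set, data stream, and helper are hypothetical, and the payload shape should be checked against the data views API docs:

```python
def multi_cluster_data_view(clusters, datastream, name):
    """Build a body for POST /api/data_views/data_view (sketch) whose
    title spans the nominated remote clusters via CCS prefixes."""
    title = ",".join(f"{alias}:{datastream}" for alias in clusters)
    return {"data_view": {"title": title, "name": name}}

# Hypothetical usage: regenerate the data view when the nominated
# cluster set changes.
body = multi_cluster_data_view(["emea", "apac"], "logs-*", "Global logs")
```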
cc: @kpollich @cmacknz