[Fleet] Finalize generic support for agentless integration policies in Fleet policy editor

kpollich commented 5 months ago

Follow up on https://github.com/elastic/kibana/issues/180375

In https://github.com/elastic/kibana/issues/180375, Fleet added support for policy variables being shown/hidden based on the deployment_modes value in an integrations and ensured that policies saved for agentless integration policies are always saved with the hardcoded agent policy with ID agentless.

To finalize Fleet's generic support for integrations that opt into the agentless deployment mode, we need to do the following:

[ ] Add a agentless/agent-based selector component when deployment_modes.agentless.enabled: true is set on a given policy template
[ ] Hide/show the "Select or create agent policy" UI based on the state of the agentless selector
[ ] Always create a new agent policy when saving an integration policy in agentless mode. This newly created policy should have supports_agentless: true set.
[ ] When deleting an agentless integration policy, the agentless agent policy should also be deleted
[ ] The toggle and agentless logic should still be gated behind the agentless feature flag in addition to the integration level configuration
[ ] Ensure that even if supports_agentless: true is set, the policy editor default to the current agent-based option and the toggle is set to agent-based
[ ] Ensure that the agentless option is clearly marked as beta, and display a beta banner in the policy editor form itself to reiterate this when agentless mode is enabled
[ ] Editing an agentless integration policy should not be possible. The "edit" action should be disabled with a tooltip directing the user to instead create a new integration policy and delete the existing one.

Open questions

[ ] How should we handle integrations with multiple policy templates like AWS? Should editing the "top-level" integration policy that renders all policy templates permit setting the deployment mode on a per-policy-template level?

elasticmachine commented 5 months ago

Pinging @elastic/fleet (Team:Fleet)

olegsu commented 5 months ago

Should we support multiple installations of the same integration? e.g. will there ever be two CSPM policies, or can we expect that if a user tries to install a second instance of CSPM they won't be able to do this?

Yes, users today have multiple CSPM integrations installed (on multiple agent policies). I don't think we don't want to limit the number of integrations installed to begin with.

kpollich commented 5 months ago

Thanks, @olegsu - could we try to clarify the expected workflow then in the UI? Based on our conversations, I am understanding it as something like

User clicks to Add CSPM (or another agentless integration)
User sees policy editor in "agentless mode" with setup technology UI, no dropdown for selecting an agent policy
User configures CSPM, clicks "Save integration"
A new agent policy (e.g. agentless-1234) is created under the hood, and the integration policy is attached to this agent policy

Some questions I have for the expected state after this flow is completed:

Does the user "see" this new policy we created, or is it hidden from them?
Can the user add additional integrations to this policy in some way?
If the user clicks "Add CSPM" again, would they create another new policy (e.g. agentless-4567) to hold that instance of CSPM?

eyalkraft commented 5 months ago

@kpollich you're on point.

Does the user "see" this new policy we created, or is it hidden from them?

That's actually a product question @tinnytintin10

Can the user add additional integrations to this policy in some way?

I think not.

If the user clicks "Add CSPM" again, would they create another new policy (e.g. agentless-4567) to hold that instance of CSPM?

I think yes.

kfirpeled commented 5 months ago

On serverless, we have a few users that already use agentless, which uses the method of a preconfigured agent policy behind the scenes. Since we have a higher goal to support agentless sooner on ESS, without using the preconfigured method. It might save us time to migrate serverless to the new method later in the process. And first complete ESS.

So on fleet, we should be able to distinguish between these two implementation details.

This is still an open question, and it will be answered more effectively once we have a more detailed action plan and time estimations.

seanrathier commented 5 months ago

@kpollich I am wondering if the create a Kibana Agentless service in the Fleet plugin to create Agentless-ElasticAgents plan item for ESS is mostly a duplicate of this issue.

Assuming that we are doing what @kfirpeled mentioned and concentrating on ESS, the only difference for ESS would be that we are sending requests to the Agentless-API to create the new agentless-agent.

kpollich commented 5 months ago

I don't think this is a duplicate of that issue. This issue seeks to capture the flow in which an integration policy is created for an integration with deployment_modes.agentless.enabled: true. When this integration policy is created, we should also create a new agent policy record as the parent of the integration policy.

Whether this logic is encapsulated in the agentless service you're proposing is perhaps a relevant implementation detail, but actually calling that logic from the UI remains the unit of work captured by this issue.

See proposed workflow in this comment above: https://github.com/elastic/kibana/issues/183863#issuecomment-2122440189

The way this works today, the UI makes several sequential API calls to the /api/fleet API namespace. When writing this issue, I was assuming this would continue to be the case, and we'd have some conditional logic with different calls/parameters based on agentless/non-agentless integration requirements.

seanrathier commented 5 months ago

@kpollich please correct me if I am not summarizing our conversation correctly.

@kpollich and I huddled to discuss if there could be a duplication of effort for this and what we needed to do for the Agentless API.

We agreed that 1 through 8 of the sequence diagram below would likely have overlapping concerns, while 9 is unique for the Agentless API.

We would likely need to move most or all of the Agentless Kibana logic from the CSPM plugin to the Fleet plugin to allow other integrations to use it.

If this issue's solution is to create a new Agentless-agent for each agentless supported integration, the work effort to integrate the Agentless API (ESS) would be considerably smaller.

We were unsure if there were CSPM plugin extension hooks for 10 through 16 and I am currently looking into that.

Omolola-Akinleye commented 4 months ago

@seanrathier @kpollich The toggle will be for Agentless ESS correct? For Serverless, we have a pre-configured agentless policy once we migrate Agentless API to Serverless, then we can show the toggle.

seanrathier commented 4 months ago

@Omolola-Akinleye

The toggle will be for Agentless ESS correct? For Serverless, we have a pre-configured agentless policy once we migrate Agentless API to Serverless, we can show the toggle.

For Serverless, the toggle will still show, however, Agentless is selected by default and will only use the one pre-built Agentless policy. We need to give the user the option to use an Agent-based policy

For ESS, everything is the same as above, except when Agentless is selected it always creates a new agentless policy. You cannot select an existing Agentless policy like in Serverless.

Omolola-Akinleye commented 3 months ago

@seanrathier Ahh okay thanks for clarifying. My concerns introducing toggle in Serverless

Does the fleet team have Serverless Environment setup and running locally to test the toggle?
Broken Serverless FTRS that uses a different UI drop-down instead of toggle.
Does switching toggle on and off causing bug race conditions Agent-based vs Pre-configured Agentless.

kfirpeled commented 3 months ago

It is worth mentioning, that Agentless on Serverless will not be available before migrating it to use Agentless API and making all the flows similar to ESS.

What I miss in the ticket's DOD is in edit mode, that it cannot be switched/toggled. So agentless remains agentless and agent based remains agent-based.

@kpollich , tbh that value is part of the agent policy values. and not the integration. So if that is where this toggle will be visible, the edit mode should not concern us. Is that what you had in mind?

kpollich commented 3 months ago

What I miss in the ticket's DOD is in edit mode, that it cannot be switched/toggled. So agentless remains agentless and agent based remains agent-based.

@kpollich , tbh that value is part of the agent policy values. and not the integration. So if that is where this toggle will be visible, the edit mode should not concern us. Is that what you had in mind?

If agent-based is selected, a new agent policy + integration policy will always be created. Should this be possible at all in the edit mode for an existing integration policy? e.g. if a user clicks "edit" for their Okta policy, which is running in agent-based mode, should they be able to toggle it to agentless mode and have a new agent policy created for them under the hood?

tehilashn commented 3 months ago

@kpollich - this looks great, thank you and your team for pushing this effort!

Given that agentless is going to be in beta for the first few releases we should make sure to:

mark the agentless option a beta
keep the Agent-based option as the default one

We will be doing the same for the CSPM custom UI and have proposed a design for it - https://github.com/elastic/kibana/issues/188601. Can you please make sure to cover this here as well?

kfirpeled commented 3 months ago

Both transitions are feasible.

We haven't captured that work yet on the first stages of our implementation, we were thinking of blocking it for now so we could focus on the creation flow.

It does add some complexity to edit an integration and switch to agentless. Under the hood we need to create a new agentless agent policy when doing so. As for the existing integration, should it be removed from the agent-based policy? should it be duplicated? I'm not quite sure..

kpollich commented 3 months ago

we were thinking of blocking it for now so we could focus on the creation flow.

Let's move forward with this approach. Editing an agentless integration shouldn't be possible, and a new integration policy will need to be created. I'll capture this in the requirements above.

kpollich commented 3 months ago

Updated the description + issue title to better reflect what we're doing here. cc @juliaElastic since you'll be picking this up next sprint as well. Let me know if you have any concerns about the larger list of requirements.

juliaElastic commented 3 months ago

@kpollich A few questions here:

Does this issue meant to change the UI in stateful mode only, leaving the serverless behaviour unchanged? Do we want to show the agentless selector inly in cloud or on-prem also?
Do we have any existing packages that have deployment_modes.agentless.enabled: true set other than CSPM currently? I didn't see any other in elastic/integrations repo.
In serverless, CSPM has a dropdown to select agentless/agent-based mode when creating an integration, this logic is in a custom form for CSPM. Do I understand correctly that we want to add this dropdown on the generic Fleet add integration UI where the package supports agentless?
Do I understand correctly that for now we don't want to allow switching from agent-based to agentless mode in edit integration page?
When the agentless option is selected, do we create the agent policy with the usual Fleet API or do we intend to use the Agentless API?
Do we want to prevent adding more integration policies to an agent policy created with support_agentless: true? From the comments it seems so, but in the description it seems to be allowed.

Ensure that even if supports_agentless: true is set, the policy editor default to the current agent-based option and the toggle is set to agent-based

criamico commented 3 months ago

@juliaElastic I'll answer some of these questions as I worked on the agentless area previously.

Do we have any existing packages that have deployment_modes.agentless.enabled: true set other than CSPM currently? I didn't see any other in elastic/integrations repo.

AFAIK we don't have any. I generated a couple of packages locally with this flag to simulate this behavior. I added these packages it in this PR, but let me know if you cannot download them or if they don't work for you, I should still have them. The testing steps are in the PR description.

Do I understand correctly that we want to add this dropdown on the generic Fleet add integration UI where the package supports agentless?

Yes that's the idea, there was a lot of discussion on this but basically we want to be able to show a "reduced" version of that UI for a generic agentless integration, as the current form is custom-built for CSPM and is closely tied to its inputs.

Do we want to prevent adding more integration policies to an agent policy created with support_agentless: true?

I think that the assumption has been since the beginning that an "agentless" policy will only allow a single integration policy. I don't know if we want to put in place any explicit guards to prevent additional integrations policies from installing, but the relation should be 1:1

Also, related to this item in the description:

Always create a new agent policy when saving an integration policy in agentless mode. This newly created policy should have supports_agentless: true set.

I believe that this already happens, but we need to verify it.

I'll leave the other answers to @kpollich as I'm not sure what was decided about those.

kpollich commented 3 months ago

Thanks, @criamico - your answers above look good.

Does this issue meant to change the UI in stateful mode only, leaving the serverless behaviour unchanged? Do we want to show the agentless selector inly in cloud or on-prem also?

The UI should be changed in both stateful and serverless. Users should be able to decide whether they want agentless or agent based deployment regardless of their environment.

Do I understand correctly that for now we don't want to allow switching from agent-based to agentless mode in edit integration page?

Yes we should prevent toggling an existing policy between deployment modes.

When the agentless option is selected, do we create the agent policy with the usual Fleet API or do we intend to use the Agentless API?

We'll just use the usual Fleet API - this doesn't need to interact with the agentless API at all for now. @seanrathier if I understand correctly we might change this later such that saving a new agentless policy would create and deploy an agentless agent via the agentless API (a mouthful!)

For now, so long as we are creating an integration policy with the proper variables set that belongs to a new agent policy with supports_agentless: true, Fleet's job is complete here.

Do we want to prevent adding more integration policies to an agent policy created with support_agentless: true? From the comments it seems so, but in the description it seems to be allowed.

Yes, Cristina is correct this should not be possible. The goal is to have a single integration policy on each agent policy with supports_agentless: true. This will provide the clearest scaling model for the agentless controller to create larger agents or enroll more agents based on that integration's ingest needs.

juliaElastic commented 3 months ago

@kpollich I had a few more questions.

While implementing this, I see there is a lot of behavior now added for CSPM integration: setting agentless mode as default, calling agentless API to create agentless agent when creating agentless policy, only allow agentless selector if env is serverless/cloud with agentless api set. Do we want to keep the logic as-is for CSPM, and only implement the generic behavior for new non-CSPM packages supporting agentless? If all packages should work the same, I would have to remove a lot of logic recently added, not sure if this is what we want.

Also currently agentless policies are created as managed policies, which helps to prevent adding more integration policies. However managed policies don't allow deleting integration policies, so I think we have to either remove the managed property from the created agentless agent policies or let these integration policies be deleted even if the agent policy is managed.

How should we handle integrations with multiple policy templates like AWS? Should editing the "top-level" integration policy that renders all policy templates permit setting the deployment mode on a per-policy-template level?

I'm not sure how this could work, the agentless/agent-based selector determines whether the created agent policy will have support_agentless flag or not. I think the safest option would be to only allow the selector if all policy templates support agentless in the current integration to be added.

Editing an agentless integration policy should not be possible. The "edit" action should be disabled with a tooltip directing the user to instead create a new integration policy and delete the existing one.

I started implementing this, and realised if we disable the edit action, users won't be able to view the current config of the integration policy (only in Agent policy / View policy as yaml). Alternatively we could allow going to the Edit integration page, and only disable the submit button.

seanrathier commented 3 months ago

While implementing this, I see there is a lot of behavior now added for CSPM integration: setting agentless mode as default, calling agentless API to create agentless agent when creating agentless policy, only allow agentless selector if env is serverless/cloud with agentless api set.

@juliaElastic @kpollich, I'm chiming in here for your knowledge. Most of the work we have completed and planned for the Agentless API (ESS) has been in the Fleet plugin, we want to avoid "customizations" for CSPM and make agentless available to all integrations.

We have not been making any changes in the CSPM plugin aside from the UI changes to Setup Technology Selector, so that should be moved to the Fleet plugin whenever possible, and that might improve the performance of the page if we can avoid multiple React state renders.

juliaElastic commented 3 months ago

Does that mean that the agentless API logic should be called for all integrations, not just CSPM? And that the behavior described in this issue should be the same for all packages, in both serverless and cloud? (supposing agentless is not supported in on-prem deployments). In serverless there was a hardcoded agent policy with id agentless, are we getting rid of that too?

I can try to move out the setup technology selector out of the CSPM extension view to render with the generic logic, I added locally to the form like this, without the Advanced options accordion, it looked out of place to be hidden under the accordion.

seanrathier commented 3 months ago

Does that mean that the agentless API logic should be called for all integrations, not just CSPM?

@juliaElastic, AFAIK, eventually all integrations that have the supports_agentless property and the user chooses Agentless option in the dropdown it should make the request to the Agentless API to create an agentless-agent.

In serverless there was a hardcoded agent policy with id agentless, are we getting rid of that too?

We have a task in our plan to migrate Serverless to use the Agentless API solution which will create a new Agentless-agent, instead of using the hard-coded agentless policy.

elastic / kibana

[Fleet] Finalize generic support for agentless integration policies in Fleet policy editor #183863

Open questions