fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com
Other
3.01k stars 419 forks source link

GitOps: Remove teams #16677

Closed getvictor closed 4 months ago

getvictor commented 8 months ago

Goal

User story
As an endpoint operator using Fleet's best practice GitOps,
I want to remove the teams that aren't defined in my git repo
so that the teams in Fleet reflect the teams in my git repo.

Context

Currently, best practice GitOps doesn't remove teams that are created in the UI.

After these changes are released to prod, we can update https://github.com/fleetdm/fleet-gitops to use the new switches: https://github.com/fleetdm/fleet/issues/18692

Product

Engineering

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

Context

QA

Risk assessment

Manual testing steps

Reference YAML configs can be seen at https://github.com/fleetdm/fleet-gitops

Test the new GitOps switches:

For this part of the feature:

Test cases (using fleetctl gitops)

Testing notes

Confirmation

  1. [ ] Engineer (@getvictor): Added comment to user story confirming successful completion of QA.
  2. [ ] QA (@____): Added comment to user story confirming successful completion of QA.
noahtalerman commented 7 months ago

Heads up @getvictor, this feature request was brought to feature fest on 2024-02-15 and wasn't prioritized for the current design sprint.

I think we'll get to it after we start dogfooding.

noahtalerman commented 6 months ago

Hey @getvictor, heads up, we brought this into the upcoming design sprint (4.49).

noahtalerman commented 6 months ago

One specific setting of interest is apple_bm_default_team -- the team must exist before this setting is applied. The new flow should be able to create the team (if needed) and then apply the setting.

Hey @getvictor, I pulled this change out of the user story.

If I'm understanding correctly, currently we error if the apple_bm_default_team doesn't exist.

When we add "GitOps: Dry run before merge" (#17687) the user will see an error in this case and add the missing team or correct a typo if the team already exists.

I'm not sure we want to take this on because we don't know if the user is missing the team or made a typo.

sharon-fdm commented 6 months ago

Estimation assumes Fleetctl does not have team handling. Since Gitops actions are based on Fleetctl, we will need to add that to fleetctl.

getvictor commented 6 months ago

@noahtalerman Is removing teams just done in the GitHub action or a core functionality of fleetctl gitops command?

If core, we can accomplish this by allowing all relevant files to be passed in at once, like:

fleetctl gitops -f default.yml -f teams/team1.yml -f teams/team2.yml --delete-other-teams

The above is easier to implement and seems like a cleaner approach.

If part of the action, then the action needs to track what has been done. This can be done by writing to a file during the previous gitops commands, then reading all teams via fleetctl get teams, and deleting the difference via fleetctl delete.

noahtalerman commented 6 months ago

@getvictor, I think we should make it core.

That way, the teams behavior is consistent w/ queries, policies, and config profiles: full replace.

One benefit to doing it as part of the action is that users could tweak the behavior if they wanted to. However, they can't do this for queries, policies, and config profiles.

So I think let's be consistent and make it core to fleetctl gitops.

getvictor commented 4 months ago

@noahtalerman There is a situation that I wanted you to be aware.

We delete the teams last, after applying configs to the existing/new teams. If the user wants to reuse an enroll secret for the existing/new team, GitOps will fail. That is because the secret is still held by the team to be deleted.

noahtalerman commented 4 months ago

@getvictor why not delete teams first?

If I'm understanding correctly, here's an example workflow in which the user will trigger a failure:

Is that right? I think that's ok if we can't avoid it.

Any other items that would trigger a failure like this? Other items that must be unique across teams.

getvictor commented 4 months ago

Two reasons for deleting teams last

noahtalerman commented 4 months ago

We process teams 1 at a time, so only after we process them all do we know which ones not to delete

Makes sense.

What if the flow fails while applying team configs -- do we want the other teams to be already deleted at this point?

GitOps best practice always does a dry run though right?

@getvictor I'm onboard and trust your decision. Just trying to get a good understanding of how it works.

getvictor commented 4 months ago

@noahtalerman We should just fix dry run to catch this issue. I filed a bug: https://github.com/fleetdm/fleet/issues/19152

fleet-release commented 4 months ago

Syncing teams with Git, Reflecting changes in cloud, Fleet showing the way.