fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Send commands to delete profiles when a host is removed from the UI #11114

Closed roperzh closed 1 year ago

roperzh commented 1 year ago

Fleet version: 4.30.0

Operating system: (e.g. macOS 11.2.3)

🧑‍💻  Expected behavior

When a host is deleted from the UI, all MDM profiles are removed until the host re-enrolls in its given team.

💥  Actual behavior

Locally (on the host), the host ends up with profiles from both the team in which it was previously enrolled and its new team.

More info

To repro (from Reed)

  1. install host (no team)
  2. enroll into MDM
  3. add profile A to No Team
  4. confirm profiles are received
  5. create Team Blue and add a profile
  6. move host to Team Blue
  7. confirm profiles are accurate (they are)
  8. remove host from Fleet
  9. host reappears in Fleet as part of No Team
  10. has No Team and Team Blue profiles
  11. After moving host back to Team Blue correct profile state is achieved.

mna commented 1 year ago

My understanding of this issue is as follows:

Up until step 8, everything in the repro steps works as expected, then:

  8. remove host from Fleet

This is an actual deletion of the host in the hosts table (it calls the DELETE /api/latest/fleet/hosts/{id} endpoint), which also takes care of deleting related child entries (including, relevant in this case, host_mdm_apple_profiles).

Then, when the host re-enrolls (as part of No Team, presumably because that's what its enroll secret links it to), it gets the "No Team" profiles because, from Fleet's point of view, those are the only unsynced profiles. The profile reconciliation compares host_mdm_apple_profiles against what the host's team (or No Team) should have. Since this is a brand-new host entry for Fleet, it has no rows there: Fleet doesn't know about any profiles the host may already have installed, so it only sees that the host needs the "No Team" profiles.
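To make the reasoning above concrete, here is a minimal sketch of a set-difference reconciliation. It is not Fleet's actual code: the function name `reconcile` and the profile identifiers are hypothetical, and `recorded` merely stands in for the rows Fleet keeps in host_mdm_apple_profiles.

```go
package main

import "sort"

// reconcile is a hypothetical helper, not Fleet's implementation.
// desired:  profile identifiers the host's team (or No Team) should have.
// recorded: profile identifiers Fleet has on record for the host
//           (standing in for rows in host_mdm_apple_profiles).
// It returns the identifiers to install and to remove, each sorted.
func reconcile(desired, recorded []string) (toInstall, toRemove []string) {
	want := make(map[string]bool, len(desired))
	for _, id := range desired {
		want[id] = true
	}
	have := make(map[string]bool, len(recorded))
	for _, id := range recorded {
		have[id] = true
	}
	// Anything desired but not on record must be installed.
	for id := range want {
		if !have[id] {
			toInstall = append(toInstall, id)
		}
	}
	// Anything on record but no longer desired must be removed.
	for id := range have {
		if !want[id] {
			toRemove = append(toRemove, id)
		}
	}
	sort.Strings(toInstall)
	sort.Strings(toRemove)
	return toInstall, toRemove
}
```

With an empty `recorded` list, which is the situation right after the host row and its child rows were deleted, `toRemove` is always empty. Under this model, nothing ever asks the device to drop the Team Blue profile it still has installed.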

Given that the host needs to be removed from Fleet ASAP (I presume) when deleting it via the UI/API, the only thing we can do, I believe, is a best-effort synchronous removal of existing profiles (i.e. running the MDM RemoveProfile commands immediately, not queued in nanomdm) and hope the host is online and able to process those commands immediately.

Another option would be to queue the deletion of the host altogether, running that asynchronously and removing it from Fleet only once all cleanup tasks are done (including removal of profiles), but that's a much more involved solution (although it would be more general and possibly helpful for other things in the future). It would require a soft-deletion of some kind so that the host does not show up anymore during that time.
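As a rough illustration of the queued-deletion idea, the sketch below models a soft-deleted host that disappears from listings immediately but is only eligible for hard deletion once every profile removal has been acknowledged. All names here (`hostRecord`, `markForDeletion`, `ackRemoval`, `visibleHosts`) are hypothetical and do not correspond to Fleet's schema or API.

```go
package main

// hostRecord is a hypothetical, simplified host row used only for this sketch.
type hostRecord struct {
	ID             uint
	PendingDelete  bool     // soft-deletion flag: hidden from listings while true
	PendingRemoval []string // profile identifiers still awaiting removal on the device
}

// markForDeletion soft-deletes the host and records which profiles must be
// removed from the device before the row can be hard-deleted.
func markForDeletion(h *hostRecord, installed []string) {
	h.PendingDelete = true
	h.PendingRemoval = append([]string(nil), installed...)
}

// ackRemoval records that the device confirmed removal of one profile and
// reports whether the host can now be hard-deleted.
func ackRemoval(h *hostRecord, profileID string) bool {
	kept := h.PendingRemoval[:0]
	for _, id := range h.PendingRemoval {
		if id != profileID {
			kept = append(kept, id)
		}
	}
	h.PendingRemoval = kept
	return h.PendingDelete && len(h.PendingRemoval) == 0
}

// visibleHosts returns only the hosts that should still appear in listings.
func visibleHosts(hosts []hostRecord) []hostRecord {
	var out []hostRecord
	for _, h := range hosts {
		if !h.PendingDelete {
			out = append(out, h)
		}
	}
	return out
}
```

The trade-off described above shows up directly: the host vanishes from the UI as soon as `markForDeletion` runs, but the row lingers until an offline device eventually checks in and acknowledges each removal.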

For the bugfix, I'm tentatively gonna go for the sync removal of profiles, but the async removal of hosts is something to keep in mind if we have more similar needs (or want better guarantees of profiles removal) in the future.

/cc @roperzh

roperzh commented 1 year ago

@mna that's a great, accurate description of the issue. I want to make a note:

Given that the host needs to be removed from Fleet ASAP (I presume) when deleting it via the UI/API, the only thing we can do, I believe, is a best-effort synchronous removal of existing profiles (i.e. running the MDM RemoveProfile commands immediately, not queued in nanomdm) and hope the host is online and able to process those commands immediately.

To my knowledge, we can't do this easily. Moreover, since the rows in the nano tables are deleted when the host entry is removed, I don't think you'll get to delete any profiles in time.

I'm wondering whether it would be possible to defer this, since #9780 should take care of this edge case. From that issue:

  • Periodically get a list of all profiles on a host. Maybe we can use macadmins extension (profiles table). If not, make the necessary changes
  • Compare the list of profiles to the list of profiles that Fleet thinks should be installed.

mna commented 1 year ago

@roperzh Thanks! #9780 seems like the better way to achieve this, by regularly monitoring and ensuring the right set of profiles is applied. As it stands, I think that ticket is more to ensure that the expected profiles are present (by updating the status to "Verified" or "Failed"), but it would make sense IMO to build on this to also report (or remove directly?) those that should not be present. The tricky thing might be that there are some non-Fleet profiles we should leave alone, and probably a ton of other details, but at a high level it makes a lot of sense to me to take care of this with that ticket/approach.
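A minimal sketch of the comparison discussed here, assuming the host can report its installed profile identifiers (for example via a profiles table) and that Fleet can tell which identifiers it delivered itself. The function `profileDrift` and the `fleetManaged` set are hypothetical; how Fleet would actually distinguish its own profiles from third-party ones is left open in this thread.

```go
package main

import "sort"

// profileDrift is a hypothetical helper, not part of Fleet's codebase.
// installed:    identifiers the host reports as installed.
// expected:     identifiers Fleet thinks should be installed.
// fleetManaged: assumed set of identifiers that Fleet itself delivered.
// It returns expected profiles that are missing from the host, and
// Fleet-managed profiles on the host that are no longer expected.
// Profiles that Fleet never managed are deliberately left alone.
func profileDrift(installed, expected []string, fleetManaged map[string]bool) (missing, unexpected []string) {
	onHost := make(map[string]bool, len(installed))
	for _, id := range installed {
		onHost[id] = true
	}
	want := make(map[string]bool, len(expected))
	for _, id := range expected {
		if want[id] {
			continue // skip duplicate entries in expected
		}
		want[id] = true
		if !onHost[id] {
			missing = append(missing, id)
		}
	}
	for id := range onHost {
		if !want[id] && fleetManaged[id] {
			unexpected = append(unexpected, id)
		}
	}
	sort.Strings(missing)
	sort.Strings(unexpected)
	return missing, unexpected
}
```

Applied to the repro in this issue, a host in No Team that still carries a Team Blue profile would report that profile as `unexpected`, while an unrelated third-party profile on the same host would be ignored.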

/cc @noahtalerman if you agree we could close this ticket or link it to the other one, so that it is addressed as part of the #9780 solution.

noahtalerman commented 1 year ago

@roperzh @mna this is a great issue. Thanks for documenting the current behavior.

I think we can close this issue because the repro steps don't align with IT's expected workflows. Specifically, we'll only recommend that the IT admin delete the host after they erase it (need to document this).

Reusing a host workflow (when an employee leaves and a new employee is hired):

  1. IT admin erases the host w/o deleting the host in Fleet, turning MDM off, or removing the host from ABM

  2. Assign the host to a new team in Fleet (if necessary)

  3. New employee (end user) sets up the host

  4. Host re-enrolls to Fleet and receives the correct profiles

Retiring a host workflow (end of life):

  1. IT admin erases the host and removes the host from ABM

  2. IT admin deletes the host in Fleet

  3. Host never re-enrolls to Fleet even if someone sets up the host again

Does this make sense? More crucially, do these steps work as I described?

roperzh commented 1 year ago

@noahtalerman thanks! yes, I believe all the steps you described work without problem.

noahtalerman commented 1 year ago

Ok 👍

@xpkoala when you get the chance, can you please test these steps documented above?

If the expected behavior in the steps matches the actual behavior, can you please close this issue?

mna commented 1 year ago

I unassigned myself and assigned Reed, and moved this to Awaiting QA.

xpkoala commented 1 year ago

I was able to confirm the behavior for scenario one. Scenario two doesn't have the user connecting a host back to Fleet, so no testing is necessary for that path.

fleet-release commented 1 year ago

Hosts removed, set free, Profiles cleared, renewed path, Fleet adapts with ease.

fleet-release commented 1 year ago

Host removed, profiles Gone, yet re-enroll ensures Team's fresh harmony