fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Configuration profile not removed after host was offline for extended period of time #22121

Closed ddribeiro closed 1 month ago

ddribeiro commented 1 month ago

Fleet version: 4.56


💥  Actual behavior

A macOS host that had a configuration profile installed by Fleet was offline for an extended period of time. During this time, the configuration profile was removed from Fleet. Fleet correctly sent the RemoveProfile command, but the host was not online to receive it. The RemoveProfile command expired in APNs before the host came back online. Fleet did not re-send the RemoveProfile command, so the profile was never removed from the host.

🧑‍💻  Steps to reproduce

  1. Upload a .mobileconfig profile to Fleet and wait for your test host to receive it.
  2. With the host offline, remove the profile from Fleet.
  3. After an indeterminate amount of time, when the APNs command to remove the profile expires, bring the host back online. Observe that Fleet does not attempt to re-send the command to remove the profile.

🕯️ More info (optional)

Due to the lengthy and undefined amount of time it takes for an APNs command to expire, this one is likely going to be difficult to reproduce.

To fix this behavior, the customer had to:

  1. Create a manual label with only the affected host
  2. Re-deploy the profile to that label (even though the profile already existed on the host locally)
  3. Delete the profile from Fleet to trigger a new RemoveProfile command and remove it from the host.

Alternatively:

  1. Manually build a RemoveProfile command containing the identifier of the profile to be removed and send it to the Fleet API using the Run MDM command endpoint.
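The manual RemoveProfile workaround can be sketched as follows. This is a minimal sketch, not the customer's actual script: the `RemoveProfile` request type and its `Identifier` key come from Apple's MDM protocol, while the JSON field names `command` and `host_uuids` are assumptions about the Run MDM command endpoint's request body and should be checked against the Fleet REST API docs for your version. The function names, the profile identifier, and the host UUID are hypothetical.

```python
import base64
import json
import plistlib
import uuid


def build_remove_profile_command(profile_identifier: str) -> bytes:
    """Build an Apple MDM RemoveProfile command as an XML plist."""
    command = {
        "Command": {
            "RequestType": "RemoveProfile",
            # PayloadIdentifier of the profile still installed on the host.
            "Identifier": profile_identifier,
        },
        # Each MDM command needs its own unique UUID.
        "CommandUUID": str(uuid.uuid4()),
    }
    return plistlib.dumps(command, fmt=plistlib.FMT_XML)


def build_run_command_body(profile_identifier: str, host_uuids: list[str]) -> str:
    """Wrap the base64-encoded command in a JSON request body.

    Assumption: the endpoint accepts `command` (base64) and `host_uuids`.
    """
    raw = build_remove_profile_command(profile_identifier)
    return json.dumps(
        {
            "command": base64.b64encode(raw).decode("ascii"),
            "host_uuids": host_uuids,
        }
    )


if __name__ == "__main__":
    # Hypothetical identifier and host UUID, for illustration only.
    print(build_run_command_body("com.example.wifi", ["ABC-123"]))
```

The resulting body would then be sent with an authenticated POST to the Run MDM command endpoint. Sending is left out here because the exact path and authentication header depend on the Fleet deployment.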

JoStableford commented 1 month ago

Related to a Slack conversation

PezHub commented 1 month ago

QA Notes:

I was able to reproduce the issue by uploading a profile to a team and then quickly deleting it before it was installed on the host. Fleet sends the remove command to a host that does not have the profile installed, so the removal fails.

Note: I was able to remove it with a workaround similar to the customer's: upload the profile again, allow it to install on the host, make sure it has time to verify, and then delete it.

Video shows the bug and workaround. We will pull into our mdm board so the engineers can take a look.

PezHub commented 1 month ago

Possibly related: #21891

PezHub commented 1 month ago

We're currently removing profiles that never made it to the host in a cron job

@jahzielv Sounds like the fix you applied for #21891 should solve this, so I will close it out unless you disagree?

jahzielv commented 1 month ago

@PezHub I'm not sure, I hadn't seen this one before. I can try to reproduce this and see if the fix works here.

PezHub commented 1 month ago

Thanks, I can certainly help test. Seems like the cron job would clean this up but we'll see 🤞

PezHub commented 1 month ago

The cron job adjustment made as part of the fix for issue #21891 should resolve this as well, so closing it out.

fleet-release commented 1 month ago

Offline device roams,
Fleet's command lost in clouds.
Return, profile's gone.