zhumo closed 1 year ago
@georgekarrv @gillespi314 @roperzh this is the story you should use for estimation tomorrow. We down-scoped the work.
@georgekarrv @gillespi314 @roperzh hey, quick note here: I updated the requirements to reflect that there should be two separate error messages for a failed "verifying" profile vs. a failed "verified" profile. Hopefully that doesn't change the estimation too much.
Dropping a link to this thread regarding edge cases for future reference and as something to be considered in the context of designing the retry feature.
tl;dr In some cases when MDM is turned on for a host or a host switches teams, unfortunate timing of the osquery detail query may cause a profile to get marked as failed. This can happen if the query runs during the window after the install profile command has been acknowledged by the host (i.e. Fleet status verifying) but before the profile is fully installed on the host. Two factors mitigate the impact of this edge case. First, in practice this window should be quite narrow. Second, if it does occur, it is quite possible that the profiles are in fact installed and the device is in the desired state even though it appears as failed in Fleet (something the admin could confirm manually by running a live query).
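To make the timing window concrete, here is a minimal sketch, not Fleet's actual code. The `ProfileTimeline` type, the `status_after_detail_query` function, and the timestamps are all hypothetical, and treating a not-yet-acknowledged profile as "pending" is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProfileTimeline:
    """Hypothetical timeline for one profile on one host (seconds)."""
    acknowledged_at: float  # host acknowledged the install command ("verifying")
    installed_at: float     # profile is actually present on the host


def status_after_detail_query(profile: ProfileTimeline, query_at: float) -> str:
    """Status the profile would end up with if the detail query ran at query_at."""
    if query_at < profile.acknowledged_at:
        # Assumption: before acknowledgement nothing is checked yet.
        return "pending"
    if query_at < profile.installed_at:
        # The edge case: command acknowledged, profile not yet on the host,
        # so the query reports it missing and it is marked failed.
        return "failed"
    return "verified"


timeline = ProfileTimeline(acknowledged_at=10.0, installed_at=12.0)
in_window = status_after_detail_query(timeline, query_at=11.0)
after_window = status_after_detail_query(timeline, query_at=13.0)
```

With these made-up numbers the window is two seconds wide: a query at 11.0 lands inside it and yields a failure, while a query at 13.0 verifies the same profile.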
Thanks @gillespi314. In the event that happens, is it the case that when the distributed interval runs again, it'll check all expected profiles vs. all seen profiles and then re-mark them as failing or not? So then the second time the distributed interval runs, the profile will be properly marked?
In the event that happens, is it the case that when the distributed interval runs again, it'll check all expected profiles vs. all seen profiles and then re-mark them as failing or not? So then the second time the distributed interval runs, the profile will be properly marked?
As currently implemented, the status won't change once it reaches the failed state. It's something we could potentially implement as an in-between step short of redelivering failed profiles.
Thanks @gillespi314.
the status won't change once it reaches the failed state...we could potentially implement as an in-between step short of redelivering failed profiles.
Got it. I think we'll want to implement something to properly mark the profile.
I added this to the redeliver story so that we have something tracked:
At each distributed interval, check all expected profiles vs. all seen profiles and re-mark them as "Failed" or "Verified"
- This means that "Pending" profiles will be moved to "Failed" if they're missing. "Failed" profiles will be moved to "Verified" if they're present.
Does that make sense to you?
cc @zhumo
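For reference, the re-marking rule above could look roughly like the following. This is a sketch of one literal reading of the proposal, not how Fleet implements it. The `remark_profiles` function and the profile identifiers are hypothetical.

```python
from typing import Dict, Iterable


def remark_profiles(expected: Iterable[str], seen: Iterable[str]) -> Dict[str, str]:
    """Re-mark every expected profile based on what the host reported.

    expected: identifiers of profiles the host should have (hypothetical input).
    seen: identifiers the detail query found on the host (hypothetical input).
    """
    seen_set = set(seen)
    statuses = {}
    for identifier in expected:
        # Present on the host -> "verified"; missing -> "failed".
        # The previous status is deliberately ignored, so a profile that was
        # marked "failed" earlier recovers once it is seen.
        statuses[identifier] = "verified" if identifier in seen_set else "failed"
    return statuses


# Statuses before this interval, shown only to illustrate the transitions.
previous = {"com.example.wifi": "failed", "com.example.vpn": "pending"}
current = remark_profiles(
    expected=["com.example.wifi", "com.example.vpn"],
    seen=["com.example.wifi"],
)
```

Here the previously "failed" Wi-Fi profile moves to "verified" because it is present, and the "pending" VPN profile moves to "failed" because it is missing.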
@noahtalerman Yes, that lines up with what I was thinking too.
Able to get a device into Failed status in case one (profile installed successfully but unable to verify):
Will proceed with attempting to force case two (verified but since found missing).
@gillespi314 in my above comment, I was working with a device that I enrolled & accidentally transferred teams before the verification completed, so I was assuming that was the cause of the failure (possibly related to 12452?). However, I have had two machines in a row reach Failed status, despite successful profile delivery & disk encryption flow, without any intervention during enrollment. Is that something already being accounted for, so that I should hold off on further testing, or does that sound like a new issue?
Testing the second Failed state (previously Verified but found missing) proved to be difficult. Apple has changed the behavior of config profiles to be unremovable, even on manually enrolled devices, with the exception of the MDM enrollment profile. Attempts to remove a custom profile via the UI and via the command line both failed, and removing the MDM profile only triggered the re-enroll prompt.
However, @roperzh was able to point me to a command that could be run via fleetctl
that forced a profile removal, and I was able to verify the error message:
This secondary Failed state is unlikely to affect many users, given the difficulty with removal.
C&C: @noahtalerman to check that the profile status docs are updated with this information. https://fleetdm.com/docs/using-fleet/mdm-custom-macos-settings#step-3-confirm-the-setting-is-enforced
check that the profile status docs are updated with this information. https://fleetdm.com/docs/using-fleet/mdm-custom-macos-settings#step-3-confirm-the-setting-is-enforced
This PR to the docs adds this information: #12806
Profiles checked each beat, MacOS fleet now complete, No error, just neat.
Goal
Changes
This issue's estimation includes completing:
Context
QA
Risk assessment
Risk level: Low / High TODO
Risk description: TODO
Automated:
Manual testing steps
Testing notes
Confirmation