Research osquery options to validate profile status and disk encryption status.

georgekarrv commented 1 year ago

Review the parent issue to discover relevant osquery queries that can report whether the expected profiles are present and whether additional unwanted profiles are present.

Note: sounds like CIS and additional state complexity.

mna commented 1 year ago

Notes and findings:

- The macos_profiles table is provided by an extension (it is not part of osquery core). It is generated by running exec.Command("/usr/bin/profiles", "-C", "-o", "stdout-xml") on the host (source: https://github.com/macadmins/osquery-extension/blob/629b146ba671a85fccbcb25f2d9d16535ebf7539/tables/macos_profiles/macos_profiles.go).
- Some macOS devices in dogfood do not have this osquery table installed, so we should prepare for that scenario in the query (meaning we need a discovery query, I think).
- I think we only care about "Configuration" payload types (we validate that in Fleet when adding custom profiles: https://github.com/fleetdm/fleet/blob/main/server/mdm/apple/mobileconfig/mobileconfig.go#L86-L88 , and this is the type of the disk encryption profile that we control), so we can add a WHERE clause: type = 'Configuration'.
- Very little information is available about the verification_state column: all values seem to be empty on dogfood, and the help only says "The verification state of the profile." (with an example that it could be set to "verified", whatever that means). The source code doesn't help, since it simply passes through the output of the profiles command. Maybe the man pages would tell more (but it doesn't look like it here: https://ss64.com/osx/profiles.html).
- For the custom settings, I don't see how we can verify that only the required profiles are there and no superfluous ones. We probably don't prevent custom profiles from being installed by other means, so we don't control all profiles that may be installed? I don't see in the macos_profiles osquery table how we can identify those that are "fleet-controlled". The display name, description, identifier, uuid and organization can be anything AFAICT (for the custom profiles - for disk encryption it is easy as we control the identifier).
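To make the discovery-query idea concrete, here is a minimal Go sketch. It assumes osquery's osquery_registry table can be used to detect whether the extension table is registered on a host. The detailQuery struct and both function names are hypothetical and only illustrate the pairing of a query with its discovery query.

```go
package main

import "fmt"

// detailQuery is a hypothetical pairing of a query with a discovery query.
// The intent is that Query is only sent to hosts where Discovery returns
// at least one row.
type detailQuery struct {
	Query     string
	Discovery string
}

// discoveryForTable builds a query that returns a row only when the named
// table is registered and active in osquery on the host.
func discoveryForTable(table string) string {
	return fmt.Sprintf(
		"SELECT 1 FROM osquery_registry WHERE active = true AND registry = 'table' AND name = '%s';",
		table,
	)
}

// profilesQuery returns the profile query guarded by a discovery query,
// so hosts without the macos_profiles extension table are skipped.
func profilesQuery() detailQuery {
	return detailQuery{
		Query:     "SELECT display_name, identifier, install_date FROM macos_profiles WHERE type = 'Configuration';",
		Discovery: discoveryForTable("macos_profiles"),
	}
}

func main() {
	q := profilesQuery()
	fmt.Println(q.Discovery)
	fmt.Println(q.Query)
}
```

Hosts where the discovery query returns no rows would simply never report profile rows, so the backend has to treat "no result" differently from "no profiles installed".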

Given that, I think the osquery we want is this:

SELECT display_name, identifier, install_date FROM macos_profiles WHERE type = 'Configuration';

This assumes the install_date gets updated when a profile is updated (e.g. the content changes for an existing profile identifier). This should be easy enough to verify on a mac (also, the timezone of the timestamp - it's returned with +0000 which would indicate UTC, hopefully this is correct and not just an erroneous result of osquery formatting the profiles output).

With this information, we can match the existing profiles on the host with the expected set (from mdm_apple_configuration_profiles for the host's team) and update host_mdm_apple_profiles accordingly (do we just update the status to Verified, or do we want to store those osquery results somewhere?).

How I think we could reconcile/check if it is "Verified" is:

1. Match by identifier. If the identifier does not exist in mdm_apple_configuration_profiles then ... we don't know: it may be an extra profile that should be removed, or it may be a profile that was not managed by Fleet that we shouldn't touch (if that's a thing).
2. If it does match by identifier, then check if the names match. If not, then the host's profile is not up to date.
3. If the name matches, then we have to use a hack since we don't have the profile's hash from osquery: we have to check the install_date vs the mdm_apple_configuration_profiles.updated_at (maybe allow for a few seconds of leeway), and if install_date is after the profile's updated_at, assume it is up to date and switch the status to Verified (if it was in Verifying).

Step 3 is hacky, as an older version of the profile could be installed after the profile got updated, and it would show up as "Verified" even though it is not the latest version (profile installation is async and the install command is queued with the payload at the time it got queued; the command could be unable to reach the host for some time, and when it finally can, the host would process the queue of profiles to install in sequence). I think that switching to Verified only if the profile was already in Verifying helps alleviate that issue, as we will have noted from the backend that it has been identified as "installed", and the osquery result only validates that information.

Otherwise maybe we could leverage the file_lines osquery table to get the actual content of the profiles, but I'm not sure they necessarily live as files on the filesystem once installed? Even so, it would potentially have to return lots of data for that query.

@gillespi314 that's what I have so far! Haven't mentioned much about disk encryption because I think it's basically the same logic, with the additional step of checking if we have the decryption key in our DB, so it's data we already control anyway.

gillespi314 commented 1 year ago

@mna thank you so much for the thorough write up!

> This assumes the install_date gets updated when a profile is updated (e.g. the content changes for an existing profile identifier). This should be easy enough to verify on a mac (also, the timezone of the timestamp - it's returned with +0000 which would indicate UTC, hopefully this is correct and not just an erroneous result of osquery formatting the profiles output).

I did some follow up research and was able to confirm:

- Install date is indeed UTC and not a formatting error.
- Install date is updated to the current timestamp when profile content changes via the UI (a profile with the same name and identifier is added, deleted, then added again with different content).
- Install date is updated to the current timestamp when profile content changes via the CLI (a profile with the same name and identifier is added by fleetctl apply via reference to a mobileconfig file path in macos_settings of the YAML, then the content of the mobileconfig file is changed, and finally fleetctl apply is run again).

> For the custom settings, I don't see how we can verify that only the required profiles are there, not any superfluous ones. We probably don't prevent custom profiles from being installed by other means, so we don't control all profiles that may be installed? I don't see in the macos_profiles osquery table how we can identify those that are "fleet-controlled". The display name, description, identifier, uuid and organization can be anything AFAICT (for the custom profiles - for disk encryption it is easy as we control the identifier).

@roperzh and I discussed this and we think the product direction is to treat any unrecognized profile as one that should be removed (i.e. we want to prevent custom profiles from being installed by other means).

Here's what I'm thinking for next steps based on Martin's comments and follow up discussion with Roberto:

1. We'll use the verification query that Martin suggested:

SELECT display_name, identifier, install_date FROM macos_profiles WHERE type = 'Configuration';

2. "Verified" status means that, for a given profile as of the most recent osquery detail update for a host, the profile data in mdm_apple_configuration_profiles for the host's assigned team (or no team) matches a profile reported by the host.
- A match means that both the name and identifier match and that the install_date timestamp in the osquery results is on or after the updated_at timestamp in mdm_apple_configuration_profiles. There are still some possible races that could occur with the "on or after" test. For example, if an old profile is installed on the device at the exact same second that a profile is updated on Fleet, it would be a match under the "on or after" test.
3. We focus just on adding the "verified" status for now. Business logic for handling other status possibilities still needs to be specified and can be implemented in follow-up tickets.
- If the name and identifier match but the install_date timestamp is too early, we will leave the profile state as "verifying". We'll address how to handle profiles that are stuck in "verifying" (retries, etc.) separately.
- If a previously "verified" profile is missing from the most recent osquery detail update for a host, we'll flip the status to "verifying". We'll address how to handle profiles that are stuck in "verifying" (retries, etc.) separately.
- If the host reports a profile that doesn't have a match in mdm_apple_configuration_profiles, we'll ignore it for now. We'll address how to handle unrecognized profiles (removal, etc.) separately.

Let me know how that sounds to y'all. I'll plan to update the backend implementation ticket with these details.

mna commented 1 year ago

@gillespi314

> treat any unrecognized profile as one that should be removed (i.e. we want to prevent custom profiles from being installed by other means)

Gotcha, that's great news as it should simplify things.

> There are still some possible races that could occur with the "on or after" test. For example, if an old profile is installed on the device at the same exact second that a profile is updated on Fleet, it would be a match under the "on or after" test.

Yes, and also clock skew since we compare timestamps on different computers (the mysql server and a host in the wild, possibly on different networks). This should be understood as a best-effort check with possible false-positives AND false-negatives (i.e. the host's timestamp could be before the DB's, but the profile could be up-to-date on the host, it's just that its clock is in the past compared to the DB).

For completeness' sake, in your step 3: if a profile is in the host profiles table but is not in "Verifying" and it comes back in the osquery, it is simply ignored/left as-is?

The plan looks great to me! Agree with making those changes in steps and follow-up tickets to keep this manageable, there are lots of details.

gillespi314 commented 1 year ago

> For completeness' sake, in your step 3: if a profile is in the host profiles table but is not in "Verifying" and it comes back in the osquery, it is simply ignored/left as-is?

@mna, good question. The simplest would be to simply ignore/leave as-is. Although if the prior status is "pending" and the op type is "install", I think it also could be safe to update to "verified" if osquery comes back with a full match (i.e. name, identifier, and installed "on or after"). This in effect skips the "verifying" stage that otherwise might apply while we wait for the next detail query report from the host. But it should resolve to "verified" soon enough, so there isn't much to be gained by adding complexity. Similarly, we could try to devise something for pending removals and failed operations in general. But I think those need more specification and are probably best addressed in follow-up tickets.

mna commented 1 year ago

@gillespi314 agree with you, ignoring/leaving as-is for now seems reasonable and future work can improve on that.

fleet-release commented 1 year ago

Osquery reveals truth, Profiles secure, encryption, Cloud city stands strong.

fleetdm / fleet

Research osquery options to validate profile status and disk encryption status. #11240