fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Allow Orbit enrollment using serial number #9124

Closed roperzh closed 1 year ago

roperzh commented 1 year ago

Context, after https://github.com/fleetdm/fleet/pull/9065:

  1. When a host is assigned in ABM to Fleet's MDM, we are notified through the sync endpoint, and we create a row in the hosts table that only has the hardware_serial column populated (because that's the only info we have available).
  2. Currently, orbit enrollment is done by matching the hardware uuid of the host.

If a DEP device is being migrated from another MDM solution into Fleet's MDM we will recommend the user to:

  1. Install orbit on the host
  2. Assign the host to Fleet in ABM

However, if the IT admin assigns the host in ABM first, we won't be able to match the host. To account for that case, we need orbit to send the device's serial number along with the hardware UUID in POST /api/fleet/orbit/enroll so that we can match the host in the database.
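As a minimal sketch, assuming the enroll endpoint takes a JSON body, the orbit request could simply carry the serial next to the UUID. The struct name orbitEnrollRequest, the helper buildEnrollBody, and the JSON field names enroll_secret, hardware_uuid and hardware_serial are assumptions for illustration, not a confirmed description of Fleet's wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// orbitEnrollRequest is a hypothetical body for POST /api/fleet/orbit/enroll.
// Field names are illustrative assumptions, not Fleet's confirmed API.
type orbitEnrollRequest struct {
	EnrollSecret   string `json:"enroll_secret"`
	HardwareUUID   string `json:"hardware_uuid"`
	HardwareSerial string `json:"hardware_serial"` // new: lets the server match DEP-created rows
}

// buildEnrollBody serializes the enrollment request that orbit would send.
func buildEnrollBody(secret, uuid, serial string) ([]byte, error) {
	return json.Marshal(orbitEnrollRequest{
		EnrollSecret:   secret,
		HardwareUUID:   uuid,
		HardwareSerial: serial,
	})
}

func main() {
	body, err := buildEnrollBody("s3cret", "1234-ABCD", "C02XYZ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Sending both identifiers keeps the UUID as the primary key for matching while giving the server a fallback for rows that only know the serial.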

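Tasks

  1. Update Orbit to send the serial number of the host along with the UUID on enrollment.
  2. Update the server API to make use of the serial number in addition to the UUID to decide if a host should be updated or added in the hosts table.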

noahtalerman commented 1 year ago

could happen when a client is migrating from their MDM solution and somehow installs orbit in the device before the device is properly enrolled into MDM.

@roperzh are there problems with this flow? Or am I misinterpreting your message?

For manually enrolled hosts, adding hosts (installing fleetd) before turning on (enrolling) MDM in Fleet is what we're planning on documenting. The migration docs are here: https://docs.google.com/document/d/1Lzx35dYFWFPTUYWGWYXfvGCLHcy2QX1Uuy64izBPa78/edit

roperzh commented 1 year ago

@noahtalerman this problem affects DEP-enrolled hosts, but, looking at the docs, yes, we'll need to fix this before the MVP release; otherwise, we won't be able to match these hosts in the Fleet server.

I updated the description to make this clear. Let me know if it makes sense to you.

lukeheath commented 1 year ago

Please add your planning poker estimate with Zenhub @roperzh

noahtalerman commented 1 year ago

From this issue's description:

If a DEP device is being migrated from another MDM solution into Fleet's MDM we will recommend the user to:

  1. Assign the host to Fleet in ABM
  2. Install orbit in the host

Currently, the migration docs (see the "Automatically enrolled (DEP) hosts" section) say the opposite: add the host first and assign the host in ABM second.

@roperzh will the documented^ steps not work?

roperzh commented 1 year ago

@noahtalerman good catch. The documented steps will work whether we do this or not; the work described here accounts for the edge case where the IT admin assigns the host in ABM first.

Do you think we can punt on this?

Edit: I have updated the issue description accordingly

noahtalerman commented 1 year ago

Do you think we can punt on this?

@roperzh hmm, I'm not sure yet. If we can't match the host, how does this impact the IT admin? Will this affect what profiles (and apps) are sent to the host?

roperzh commented 1 year ago

@noahtalerman sorry for the delayed response here. If we're not able to match the host, the IT admin will see an extra "ghost" host in the UI that only has a serial number.

noahtalerman commented 1 year ago

@roperzh no worries!

if the IT admin assigns the host to ABM first, we won't be able to match the host

In this case, will the "real" host still receive profiles and apps? How much more work will it be to take this on later?

If the real host will still receive profiles and the amount of work later is the same, I think we can punt on this issue.

Just for my understanding, when will the real host receive profiles and apps in this case? Is it right after the IT admin assigns the host to ABM? Or is it after orbit is installed?

roperzh commented 1 year ago

In this case, will the "real" host still receive profiles and apps?

Yes, it would!

How much more work will it be to take this on later?

As far as I can tell, it won't make any difference, but take that with a grain of salt, as things change rapidly.

Just for my understanding, when will the real host receive profiles and apps in this case? Is it right after the IT admin assigns the host to ABM? Or is it after orbit is installed?

When the IT admin assigns a host to ABM, the fleet server assigns an enrollment profile to the device. At this point, no matter what, the IT admin won't be able to send profiles and apps to the device until the device enrolls in MDM.

This issue is more about how we "merge" information about a host. Without doing this, we won't be able to "match" our host records properly, and we will end up with a "ghost" record displayed in pages like /hosts/manage.

mna commented 1 year ago

@noahtalerman @roperzh My understanding following the last exchange (last 2 comments) is that we will punt on this issue for now.

noahtalerman commented 1 year ago

punt on this issue for now

@mna right. I moved this issue to the product backlog (from the MDM board).

@lukeheath this is a good issue to bring in to a sprint if we have extra capacity.

lukeheath commented 1 year ago

@noahtalerman Because we have the capacity this sprint, I'd like to keep it in. In general, we should avoid bringing things in/out of the sprint unless there is a constraint that requires it.

mna commented 1 year ago

@roperzh The uuid thing is a bit confusing, so I just want to make sure I understand correctly the current behaviour first:

So the issue is that if the host was already orbit-enrolled, it has an osquery_host_id (the hardware_uuid), but that doesn't match the ABM-provided UDID, so we should provide the serial_number of the device instead, which would match with the ABM-provided serial.

To do that, we presumably have to run an osquery query on the host to get the serial number when enrolling from orbit, is that correct? A bit like orbit already runs an osquery query to get the hardware uuid?

roperzh commented 1 year ago

@mna the UDID we get from Authenticate is effectively the same value as hardware_uuid, not an Apple-specific ID, which is good!

For DEP, the process consists of:

  1. The IT admin assigns the host to the MDM server in ABM
  2. We poll Apple's DEP API for new devices assigned in ABM; here we only get the device's serial_number, not its UUID (or UDID).
  3. We assign a profile to the device using the DEP API
  4. The next time the device tries to enroll, Apple delivers the enrollment profile we assigned in step 3.
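Step 2 can be pictured with a simplified in-memory model: since the sync only hands us serial numbers, the best we can do is create placeholder rows keyed on hardware_serial. The host type and ingestDEPSerials function below are hypothetical stand-ins for the MySQL datastore logic, a sketch rather than Fleet's implementation.

```go
package main

import "fmt"

// host is a pared-down, hypothetical model of a row in the hosts table.
type host struct {
	ID             int
	OsqueryHostID  string // empty until orbit/osquery enroll
	UUID           string
	HardwareSerial string
}

// ingestDEPSerials mimics the DEP sync: for each serial reported by ABM,
// it creates a serial-only row unless a host with that serial already exists.
func ingestDEPSerials(hosts []host, serials []string) []host {
	known := make(map[string]bool, len(hosts))
	for _, h := range hosts {
		if h.HardwareSerial != "" {
			known[h.HardwareSerial] = true
		}
	}
	for _, s := range serials {
		if s == "" || known[s] {
			continue
		}
		hosts = append(hosts, host{ID: len(hosts) + 1, HardwareSerial: s})
		known[s] = true
	}
	return hosts
}

func main() {
	hosts := []host{{ID: 1, OsqueryHostID: "U1", UUID: "U1", HardwareSerial: "SER1"}}
	hosts = ingestDEPSerials(hosts, []string{"SER1", "SER2", "SER2"})
	fmt.Println(len(hosts)) // 2: SER1 already existed, SER2 added once
}
```

The rows created this way have no UUID and no osquery_host_id, which is exactly why a later UUID-only lookup cannot find them.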

The problematic scenario happens if things happen in this order:

  1. the host is assigned in ABM
  2. the host installs orbit
  3. the host hasn't completed MDM enrollment (this could only happen during a migration from another MDM provider)

In step 2, we won't have a hardware_uuid value to match against, so we will create two records in the hosts table. The proposed solution is to make orbit also send the serial_number on enrollment.
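A toy model of the problematic ordering makes the difference concrete. This is a hypothetical in-memory sketch (the host type and enrollOrbit function are invented for illustration, not Fleet's datastore code): a DEP-created, serial-only row is followed by an orbit enrollment, first without a serial (current behaviour) and then with one (the proposed fix).

```go
package main

import "fmt"

// host is a simplified, hypothetical model of a hosts table row.
type host struct {
	OsqueryHostID  string
	HardwareSerial string
}

// enrollOrbit finds or creates the host for an orbit enrollment. It matches on
// osquery_host_id == uuid, or on hardware_serial when a serial is provided.
func enrollOrbit(hosts []*host, uuid, serial string) []*host {
	for _, h := range hosts {
		uuidMatch := h.OsqueryHostID != "" && h.OsqueryHostID == uuid
		serialMatch := serial != "" && h.HardwareSerial == serial
		if uuidMatch || serialMatch {
			h.OsqueryHostID = uuid
			if h.HardwareSerial == "" {
				h.HardwareSerial = serial
			}
			return hosts
		}
	}
	return append(hosts, &host{OsqueryHostID: uuid, HardwareSerial: serial})
}

func main() {
	// Step 1: DEP sync created a serial-only row.
	before := []*host{{HardwareSerial: "SER1"}}
	// Orbit without the serial (current behaviour): no match, a second row appears.
	before = enrollOrbit(before, "U1", "")
	fmt.Println("without serial:", len(before)) // 2

	after := []*host{{HardwareSerial: "SER1"}}
	// Orbit sending the serial (proposed fix): the DEP row is reused.
	after = enrollOrbit(after, "U1", "SER1")
	fmt.Println("with serial:", len(after)) // 1
}
```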

It's a bit messy and I'm not doing a great job explaining it, let me know if it makes more sense or if you have more questions.

mna commented 1 year ago

@roperzh Thanks for the clarification, this part is what I didn't get at first:

We poll this API for new devices assigned to ABM, here we only get the serial_number of the device, we don't get the UUID (or UDID).

I think that should clarify the ticket for me, thanks!

mna commented 1 year ago

@roperzh Hmm there's still something I don't quite grasp. You say that we poll the https://mdmenrollment.apple.com/devices/sync Apple DEP endpoint, and that's where we have an issue because it only receives the serial number, but I don't see where we do this polling in the codebase (I checked for the /devices/sync path and also for the equivalent godep.SyncDevices call). Is that something that should already be there or is it to prevent an issue in the future when we add the polling to sync devices?

That's partly what got me confused initially, because the only place I can see where we insert only partial information in the hosts table is when we get the checkin/Authenticate MDM message (https://github.com/fleetdm/fleet/blob/main/server/service/apple_mdm.go#L1126-L1144), but we look for a host based on both UUID (with the provided UDID) and hardware serial.

In other words, I think it's fairly easy to address the first task you listed in the ticket (assuming I can figure out the osquery query to get the serial number on the device; it seems like it's simply select hardware_serial from system_info):

Update Orbit to send the serial number of the host along with the UUID on enrollment.

But I can't see where the second task needs to happen (i.e. from what you mention it should be where we call the SyncDevices Apple endpoint, but AFAICT we don't call it yet):

Update the server API to make use of the serial number in addition to the UUID to decide if a host should be updated or added in the hosts table.

roperzh commented 1 year ago

@mna ah! sorry for not being clear on that, it happens under a layer of abstraction, here:

https://github.com/fleetdm/fleet/blob/72c91744feca1303b2eb6f5eb9f912d9c51b92d5/cmd/fleet/cron.go#L833-L906

But I can't see where the second task needs to happen (i.e. from what you mention it should be where we call the SyncDevices Apple endpoint, but AFAICT we don't call it yet)

I think it should happen here:

https://github.com/fleetdm/fleet/blob/72c91744feca1303b2eb6f5eb9f912d9c51b92d5/server/datastore/mysql/hosts.go#L941

because that's where we do the "matching", but that's just a thought!

In other words, I think it's fairly easy to address the first task you listed in the ticket (assuming I can figure out the osquery query to get the serial number on the device)

For the hardware UUID, we shell out to osquery; I think we could leverage that:

https://github.com/fleetdm/fleet/blob/72c91744feca1303b2eb6f5eb9f912d9c51b92d5/orbit/cmd/orbit/orbit.go#L383-L388
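Extending that shell-out to fetch the serial could look roughly like the sketch below. It assumes the osquery binary is run in shell mode with -S and --json and the query passed as an argument; the exact flags orbit uses, and the names systemInfo, parseSystemInfo and querySystemInfo, are assumptions for illustration. Parsing is split out so it can be exercised without a real osquery binary.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// systemInfo holds the two identifiers orbit would send on enrollment.
type systemInfo struct {
	UUID           string `json:"uuid"`
	HardwareSerial string `json:"hardware_serial"`
}

// parseSystemInfo decodes the JSON array printed by osquery's shell mode.
func parseSystemInfo(out []byte) (systemInfo, error) {
	var rows []systemInfo
	if err := json.Unmarshal(out, &rows); err != nil {
		return systemInfo{}, fmt.Errorf("decode osquery output: %w", err)
	}
	if len(rows) != 1 {
		return systemInfo{}, fmt.Errorf("expected 1 row, got %d", len(rows))
	}
	if rows[0].UUID == "" {
		return systemInfo{}, errors.New("empty uuid in system_info")
	}
	// An empty serial is tolerated (e.g. some VMs); the UUID stays primary.
	return rows[0], nil
}

// querySystemInfo shells out to an osquery binary in shell mode (-S) with
// JSON output. The flags are an assumption about how orbit invokes osquery.
func querySystemInfo(ctx context.Context, osquerydPath string) (systemInfo, error) {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()
	cmd := exec.CommandContext(ctx, osquerydPath, "-S", "--json",
		"SELECT uuid, hardware_serial FROM system_info")
	out, err := cmd.Output()
	if err != nil {
		return systemInfo{}, fmt.Errorf("run osquery: %w", err)
	}
	return parseSystemInfo(out)
}

func main() {
	// Offline example using captured output instead of a real osquery binary.
	info, err := parseSystemInfo([]byte(`[{"uuid":"1234-ABCD","hardware_serial":"C02XYZ"}]`))
	if err != nil {
		panic(err)
	}
	fmt.Println(info.UUID, info.HardwareSerial)
}
```

Selecting both columns in one query avoids a second process spawn on every enrollment.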

mna commented 1 year ago

@roperzh Awesome, I missed that, thanks!

noahtalerman commented 1 year ago

@lukeheath during today's standup, we discussed that this issue doesn't need to be completed to call the "Migrate between MDM solutions" done.

mna commented 1 year ago

I took a deep dive into our various "device enrollment" handlers, this is what I've found (suggestion in a subsequent comment), comments/clarifications welcome if I'm missing or misunderstanding anything.

@roperzh @lukeheath @noahtalerman

| Method | Code | Looks for existing host with | Creates host with |
|---|---|---|---|
| Orbit enrollment | service.EnrollOrbit in server/service/orbit.go, Datastore.EnrollOrbit in server/datastore/mysql/hosts.go | osquery_host_id = system_info.uuid or hardware_serial = system_info.hardware_serial (the serial lookup was added as part of this ticket) | osquery_host_id = uuid, hardware_serial = serial, uuid = '' |
| Osquery (agent) enrollment | service.EnrollAgent in server/service/osquery.go, Datastore.EnrollHost in server/datastore/mysql/hosts.go | osquery_host_id = <provided osquery identifier>, which may or may not be the UUID; it is controlled by the --osquery_host_identifier Fleet config flag (see also https://github.com/fleetdm/fleet/issues/9033) | osquery_host_id = the osquery identifier, hardware_serial and uuid left empty [^1] |
| Manual MDM enrollment | Datastore.IngestMDMAppleDeviceFromCheckin in server/datastore/mysql/apple_mdm.go, called by Apple via our registered MDM endpoint (/mdm/apple/mdm) | uuid = UDID or hardware_serial = SerialNumber | hardware_serial = SerialNumber, uuid = UDID, osquery_host_id = UDID |
| Automatic (DEP) MDM enrollment | Datastore.IngestMDMAppleDevicesFromDEPSync in server/datastore/mysql/apple_mdm.go, called via a cron job | hardware_serial = SerialNumber (the serial number is the only identifying information we receive in this scenario) | Creates missing hosts by matching on hardware_serial and creating those that don't match: hardware_serial = SerialNumber, osquery_host_id = NULL, uuid = '' |

We are confident that Apple's MDM UDID field is the same value as the osquery system_info.uuid field. As much as possible, we shouldn't make any assumptions about the ordering of these enrollments. Typically, an MDM-enrolled host goes through 3 of them: the MDM one (either automatic or manual), the Orbit one (as we require Orbit to be used for MDM), and the Osquery one.

[^1]: As part of the osquery enrollment, osquery (may?) also provide the content of the system_info table, which contains the uuid and the hardware_serial, and these are saved in the hosts table after the host is created. So, assuming this is reliably provided as part of the osquery enrollment, the host will have the uuid and hardware_serial fields set as well as the osquery_host_id, except that the osquery_host_id might not be the uuid.
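To make the "no assumptions about ordering" point concrete, here is a hypothetical in-memory sketch of a standardized lookup. The names upsertHost, enrollment, permutations and hostsAfter are invented for illustration; the model collapses osquery_host_id and uuid into a single UUID field, which assumes the osquery host identifier is the UUID. It replays the DEP, Orbit and osquery enrollments in every order and checks that each ordering ends with a single host, provided Orbit sends the serial.

```go
package main

import "fmt"

// host is a simplified, hypothetical hosts-table row.
type host struct {
	UUID           string
	HardwareSerial string
}

// enrollment carries whatever identifiers a given enrollment path provides.
type enrollment struct {
	Name   string
	UUID   string // empty for DEP sync, which only knows the serial
	Serial string
}

// upsertHost is a hypothetical standardized lookup: match on UUID or serial,
// fill in any identifier the existing row is missing, otherwise insert.
func upsertHost(hosts []*host, e enrollment) []*host {
	for _, h := range hosts {
		if (e.UUID != "" && h.UUID == e.UUID) || (e.Serial != "" && h.HardwareSerial == e.Serial) {
			if h.UUID == "" {
				h.UUID = e.UUID
			}
			if h.HardwareSerial == "" {
				h.HardwareSerial = e.Serial
			}
			return hosts
		}
	}
	return append(hosts, &host{UUID: e.UUID, HardwareSerial: e.Serial})
}

// permutations returns every ordering of the given enrollments.
func permutations(es []enrollment) [][]enrollment {
	if len(es) <= 1 {
		return [][]enrollment{append([]enrollment(nil), es...)}
	}
	var out [][]enrollment
	for i := range es {
		rest := make([]enrollment, 0, len(es)-1)
		rest = append(rest, es[:i]...)
		rest = append(rest, es[i+1:]...)
		for _, p := range permutations(rest) {
			out = append(out, append([]enrollment{es[i]}, p...))
		}
	}
	return out
}

// hostsAfter applies the enrollments in order and returns the resulting rows.
func hostsAfter(order []enrollment) []*host {
	var hosts []*host
	for _, e := range order {
		hosts = upsertHost(hosts, e)
	}
	return hosts
}

func main() {
	device := []enrollment{
		{Name: "dep", Serial: "SER1"},
		{Name: "orbit", UUID: "U1", Serial: "SER1"},
		{Name: "osquery", UUID: "U1", Serial: "SER1"},
	}
	for _, order := range permutations(device) {
		fmt.Println(order[0].Name, order[1].Name, order[2].Name, "->", len(hostsAfter(order)), "host(s)")
	}
}
```

Filling only empty fields, rather than overwriting, keeps the result independent of which enrollment arrives first.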

mna commented 1 year ago

@roperzh @lukeheath @noahtalerman

What I think we can do based on this:

Standardize how we identify existing hosts in all 4 enrollments:

Standardize how we create the host when no match was found:

Standardize how we update the host if match is found:

Update how osquery enrollment updates the host if found:

mna commented 1 year ago

I believe the only way this could fail to work (and generate ghost hosts) is one of those conditions:

I don't think we need to worry about the second one, and the first one is probably unlikely? Based on https://github.com/fleetdm/fleet/issues/9033#issuecomment-1411150758, it looks like we would recommend not changing the osquery identifier to something other than the UUID. This would leave the case where DEP enrollment creates the host with just the serial number (as it's all that's available) and osquery enrollment follows without the hardware_serial for some reason. I don't really see what we can do about it, but we can certainly add a log when that serial is missing from osquery enrollment, which could help identify why there are ghost hosts.

roperzh commented 1 year ago

@mna that is such a great summary that makes me wonder if we could copy/paste it somewhere as internal documentation, thanks <3!!

Your plan makes sense to me; it's a much-needed cleanup. My only question is where it stands in the priority list with regard to all the other stuff we need to do. cc: @lukeheath

mna commented 1 year ago

@noahtalerman @roperzh @lukeheath regarding what happens if we don't prioritize this bugfix and "ghost hosts" get created. Let's assume the following scenario where a user inadvertently mixes the order of actions that we recommend and starts by assigning the host to Fleet in ABM, and then proceeds with installing orbit on the host.

  1. ABM enrollment creates a host record with basically just the serial number (that's the "Automatic (DEP) MDM enrollment" in the table above)
  2. "Orbit enrollment" happens later on and creates another host record, because it cannot match the device's UUID to an existing host (it doesn't provide the serial number during enrollment). The "orbit_node_key" gets associated with that host entry, so whenever Orbit fetches its configuration and other notifications (such as the "renew enrollment profile" one), that's the host that gets returned.
  3. "Osquery (agent) enrollment" then happens. IF (and only IF) the osquery host identifier is the uuid, it finds the host created in step 2 and reuses that record, as Orbit enrollment did (otherwise it creates yet another distinct host). The osquery node_key gets associated with that host entry, and whenever the device's osquery pings Fleet, it uses that host entry. As part of this enrollment, the host's serial number should also get saved.

At this point, we have at least 2 host entries (the ABM one and the Orbit one), possibly 3 if Fleet is configured with an osquery host identifier other than uuid. If the organization settings enable host expiry (https://fleetdm.com/docs/using-fleet/configuration-files#host-expiry-settings), then after the expiry window only the Orbit/Osquery-created host will remain.

I think the profiles and MDM commands would still run properly on the devices (hard to say for sure as those features are not yet implemented, and it depends on how they are), but whether everything works smoothly will depend in big part on how we store those command results and associate them with hosts (we'll have to make sure we select on more than just the serial number, otherwise we could pick the "ghost" one).

mna commented 1 year ago

Changes have been pushed to the PR (https://github.com/fleetdm/fleet/pull/9612), but they need more testing in the load testing environment to compare the before/after effects on enrollment time and on database load causing context cancellations, etc. See https://fleetdm.slack.com/archives/C03C41L5YEL/p1675797614564469. Pausing for now.

mna commented 1 year ago

@xpkoala @lukeheath Some context for QA.

This is not a visual change, nor something that can be easily tested. Ideally, the plan would be to run some tests with both orbit-enrolled and non-orbit-enrolled setups (that is, a deployment without orbit), without Fleet MDM enabled, and keep an eye out for any ghost hosts, non-enrolling hosts, and the like.

Then do the same with MDM enabled (which is only possible with orbit), looking for the same kind of issues (missing or ghost hosts) with a combination of manually enrolled and DEP-enrolled hosts. This case will likely be covered in part by the dogfood deployment running during the sprint.

Don't hesitate to reach out to me if anything isn't clear or if you see something weird going on during those tests!

fleet-release commented 1 year ago

Secure data, ease of use,
Fleet's Orbit now sends serial,
Streamlined enrollment.