Closed roperzh closed 1 year ago
could happen when a client is migrating from their MDM solution and somehow installs orbit in the device before the device is properly enrolled into MDM.
@roperzh are there problems with this flow? Or am I misinterpreting your message?
For manually enrolled hosts, adding hosts (installing fleetd) before turning on (enrolling) MDM in Fleet is what we're planning on documenting. The migration docs are here: https://docs.google.com/document/d/1Lzx35dYFWFPTUYWGWYXfvGCLHcy2QX1Uuy64izBPa78/edit
@noahtalerman this problem is for DEP enrolled hosts, but by looking at the docs: yes we'll need to fix this before the MVP release, otherwise we won't be able to match these hosts in the fleet server.
I updated the description to make this clear. Let me know if it makes sense to you.
Please add your planning poker estimate with Zenhub @roperzh
From this issue's description:
If a DEP device is being migrated from another MDM solution into Fleet's MDM we will recommend the user to:
- Assign the host to Fleet in ABM
- Install orbit in the host
Currently, the migration docs (see the "Automatically enrolled (DEP) hosts" section) say the opposite: add the host first and assign the host in ABM second.
@roperzh will the documented^ steps not work?
@noahtalerman good catch, the documented steps will work whether we do this or not, the work described here is to account for the edge case of the IT admin assigning the host to ABM first.
Do you think we can punt on this?
Edit: I have updated the issue description accordingly
> Do you think we can punt on this?
@roperzh hmm, I'm not sure yet. If we can't match the host, how does this impact the IT admin? Will this affect what profiles (and apps) are sent to the host?
@noahtalerman sorry for the delayed response here. If we're not able to match the host, the IT admin will see an extra "ghost" host in the UI that only has a serial number.
@roperzh no worries!
> if the IT admins assigns the host to ABM first, we won't be able to match the host
In this case, will the "real" host still receive profiles and apps? How much more work will it be to take this on later?
If yes (the real host will still receive profiles) and the amount of work later is the same, I think we can punt on this issue.
Just for my understanding, when will the real host receive profiles and apps in this case? Is it right after the IT admin assigns the host to ABM? Or is it after orbit is installed?
> In this case, will the "real" host still receive profiles and apps?
yes, it would!
> How much more work will it be to take this on later?
as far as I can tell, it won't make any difference, but take that with a grain of salt as things change rapidly
> Just for my understanding, when will the real host receive profiles and apps in this case. Is it right after the IT admin assigns the host to ABM? Or is it after orbit is installed?
When the IT admin assigns a host to ABM, the fleet server assigns an enrollment profile to the device. At this point, no matter what, the IT admin won't be able to send profiles and apps to the device until the device enrolls in MDM.
This issue is more about how we "merge" information about a host. Without doing this, we won't be able to "match" our host records properly, and we will end up with a "ghost" record that will be displayed in pages like /hosts/manage
@noahtalerman @roperzh My understanding following the last exchange (last 2 comments) is that we will punt on this issue for now.
> punt on this issue for now
@mna right. I moved this issue to the product backlog (from the MDM board).
@lukeheath this is a good issue to bring in to a sprint if we have extra capacity.
@noahtalerman Because we have the capacity this sprint, I'd like to keep it in. In general, we should avoid bringing things in/out of the sprint unless there is a constraint that requires it.
@roperzh The `uuid` thing is a bit confusing, so I just want to make sure I understand correctly the current behaviour first:

- `hardware_uuid`, which is stored in `hosts.osquery_host_id` when a new host is orbit-enrolled (it is not stored in `hosts.uuid`).
- `Authenticate` MDM message type) is received following enrollment of the host via ABM, we receive what is called a `UDID` (https://en.wikipedia.org/wiki/UDID), which is apparently an Apple-specific unique device ID. We store that value in `hosts.uuid` when the host row is created from ABM check-in.
- `uuid` is null? I'm not sure what we used that column for previously, it's only set on `UpdateHost` apparently. That's probably not relevant in our scenario here.

So the issue is that if the host was already orbit-enrolled, it has an `osquery_host_id` (the `hardware_uuid`), but that doesn't match the ABM-provided `UDID`, so we should provide the `serial_number` of the device instead, which would match with the ABM-provided serial.
To do that, we presumably have to run an osquery query on the host to get the serial number when enrolling from orbit, is that correct? A bit like orbit already runs an osquery query to get the hardware uuid?
@mna the `UDID` we get from `Authenticate` is effectively the same value as `hardware_uuid`! not an apple-specific ID, which is good!

For DEP, the process consists of:

- We poll this API for new devices assigned to ABM, here we only get the `serial_number` of the device, we don't get the UUID (or UDID).
- `3`.

The problematic scenario happens if things happen in this order:

Because in step `2`, we won't have a `hardware_uuid` value to match so we will create two records in the `hosts` table. The proposed solution is to make orbit also send the `serial_number` on enrollment.
It's a bit messy and I'm not doing a great job explaining it, let me know if it makes more sense or if you have more questions.
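The scenario above can be illustrated with a small in-memory sketch. It is not Fleet's code: `Host`, `Store`, `IngestFromDEP`, `EnrollOrbit` and the `MatchSerial` switch are hypothetical names, and the sample serial and UUID are made up. It only shows why matching on the UUID alone leaves a second row when the DEP-created row has nothing but a serial number.

```go
package main

import "fmt"

// Host is a simplified, hypothetical stand-in for a row in the hosts table.
type Host struct {
	OsqueryHostID  string // hardware UUID sent by orbit; empty for DEP-created rows
	HardwareSerial string
}

// Store is an in-memory stand-in for the hosts table (assumption, for illustration only).
type Store struct {
	Hosts       []*Host
	MatchSerial bool // when true, orbit enrollment also matches on the serial number
}

// IngestFromDEP mimics the DEP sync: only the serial number is known.
func (s *Store) IngestFromDEP(serial string) {
	for _, h := range s.Hosts {
		if h.HardwareSerial == serial {
			return // host already known
		}
	}
	s.Hosts = append(s.Hosts, &Host{HardwareSerial: serial})
}

// EnrollOrbit mimics orbit enrollment with a hardware UUID and a serial number.
func (s *Store) EnrollOrbit(uuid, serial string) {
	for _, h := range s.Hosts {
		byUUID := h.OsqueryHostID != "" && h.OsqueryHostID == uuid
		bySerial := s.MatchSerial && serial != "" && h.HardwareSerial == serial
		if byUUID || bySerial {
			// Matched an existing row: complete it instead of creating a new one.
			h.OsqueryHostID = uuid
			if h.HardwareSerial == "" {
				h.HardwareSerial = serial
			}
			return
		}
	}
	s.Hosts = append(s.Hosts, &Host{OsqueryHostID: uuid, HardwareSerial: serial})
}

func main() {
	for _, matchSerial := range []bool{false, true} {
		s := &Store{MatchSerial: matchSerial}
		s.IngestFromDEP("C02ABC123")         // host assigned in ABM first
		s.EnrollOrbit("UUID-1", "C02ABC123") // orbit installed second
		fmt.Printf("match on serial: %v, host rows: %d\n", matchSerial, len(s.Hosts))
	}
}
```

With serial matching off, the run ends with two rows (the "ghost" plus the orbit-enrolled host); with it on, the single DEP-created row is updated in place.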
@roperzh Thanks for the clarification, this part is what I didn't get at first:
> We poll this API for new devices assigned to ABM, here we only get the `serial_number` of the device, we don't get the UUID (or UDID).
I think that should clarify the ticket for me, thanks!
@roperzh Hmm there's still something I don't quite grasp. You say that we poll the `https://mdmenrollment.apple.com/devices/sync` Apple DEP endpoint, and that's where we have an issue because it only receives the serial number, but I don't see where we do this polling in the codebase (I checked for the `/devices/sync` path and also for the equivalent `godep.SyncDevices` call). Is that something that should already be there or is it to prevent an issue in the future when we add the polling to sync devices?

That's partly what got me confused initially, because the only place I can see where we insert only partial information in the `hosts` table is when we get the checkin/Authenticate MDM message (https://github.com/fleetdm/fleet/blob/main/server/service/apple_mdm.go#L1126-L1144), but we look for a host based on both UUID (with the provided UDID) and hardware serial.

In other words, I think it's fairly easy to address the first task you listed in the ticket (assuming I can figure out the osquery to get the serial no on the device, seems like it's simply `select hardware_serial from system_info`):

> Update Orbit to send the serial number of the host along with the UUID on enrollment.

But I can't see where the second task needs to happen (i.e. from what you mention it should be where we call the `SyncDevices` Apple endpoint, but AFAICT we don't call it yet):

> Update the server API to make use of the serial number in addition to the UUID to decide if a host should be updated or added in the hosts table.
@mna ah! sorry for not being clear on that, it happens under a layer of abstraction, here:
> But I can't see where the second task needs to happen (i.e. from what you mention it should be where we call the SyncDevices Apple endpoint, but AFAICT we don't call it yet)
I think it should happen here:
because that's where we do the "matching", but that's just a thought!
> In other words, I think it's fairly easy to address the first task you listed in the ticket (assuming I can figure out the osquery to get the serial no on the device)
For the hardware uuid, we shell out to osquery, I think we could leverage that:
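As a rough sketch of that idea (not the actual orbit code), the same osquery invocation could select both columns and the JSON result could be decoded as below. The `systemInfo` type, `parseSystemInfo` and the sample values are hypothetical; the way orbit actually invokes osquery is not shown in this thread, so the command mentioned in the comment is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// systemInfoQuery fetches both identifiers in one query against osquery's system_info table.
const systemInfoQuery = "SELECT uuid, hardware_serial FROM system_info"

// systemInfo mirrors the two columns selected above (hypothetical type).
type systemInfo struct {
	UUID           string `json:"uuid"`
	HardwareSerial string `json:"hardware_serial"`
}

// parseSystemInfo decodes osquery's JSON output, which is an array of row
// objects, and expects exactly one row.
func parseSystemInfo(out []byte) (systemInfo, error) {
	var rows []systemInfo
	if err := json.Unmarshal(out, &rows); err != nil {
		return systemInfo{}, fmt.Errorf("decode osquery output: %w", err)
	}
	if len(rows) != 1 {
		return systemInfo{}, errors.New("expected exactly one system_info row")
	}
	return rows[0], nil
}

func main() {
	// Assumption: in practice the bytes would come from shelling out to osquery
	// with JSON output enabled and systemInfoQuery as the query; a canned sample
	// is used here so the sketch runs without osquery installed.
	sample := []byte(`[{"uuid":"UUID-1","hardware_serial":"C02ABC123"}]`)
	info, err := parseSystemInfo(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(systemInfoQuery)
	fmt.Println(info.UUID, info.HardwareSerial)
}
```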
@roperzh Awesome, I missed that, thanks!
@lukeheath during today's standup, we discussed that this issue doesn't need to be completed to call "Migrate between MDM solutions" done.
I took a deep dive into our various "device enrollment" handlers, this is what I've found (suggestion in a subsequent comment), comments/clarifications welcome if I'm missing or misunderstanding anything.
@roperzh @lukeheath @noahtalerman
| Method | Code | Looks for existing host with | Creates host with |
|---|---|---|---|
| Orbit enrollment | `service.EnrollOrbit` in `server/service/orbit.go`, `Datastore.EnrollOrbit` in `server/datastore/mysql/hosts.go` | `osquery_host_id = system_info.uuid` or `hardware_serial = system_info.hardware_serial`, serial lookup was added as part of this ticket | `osquery_host_id = uuid`, `hardware_serial = serial`, `uuid = ''` |
| Osquery (agent) enrollment | `service.EnrollAgent` in `server/service/osquery.go`, `Datastore.EnrollHost` in `server/datastore/mysql/hosts.go` | `osquery_host_id = <provided osquery identifier>` which may or may not be the UUID, it is controlled by the `--osquery_host_identifier` Fleet config flag, see also https://github.com/fleetdm/fleet/issues/9033 | `osquery_host_id = the osquery identifier`, `hardware_serial` and `uuid` left empty [^1] |
| Manual MDM enrollment | `Datastore.IngestMDMAppleDeviceFromCheckin` in `server/datastore/mysql/apple_mdm.go`, called by Apple via our registered MDM endpoint (`/mdm/apple/mdm`) | `uuid = UDID` or `hardware_serial = SerialNumber` | `hardware_serial = SerialNumber`, `uuid = UDID`, `osquery_host_id = UDID` |
| Automatic (DEP) MDM enrollment | `Datastore.IngestMDMAppleDevicesFromDEPSync` in `server/datastore/mysql/apple_mdm.go`, called via a cron job | `hardware_serial = SerialNumber` (the serial number is the only identifying information we receive in this scenario) | Creates missing hosts by matching on `hardware_serial` and creating those that don't match, `hardware_serial = SerialNumber`, `osquery_host_id = NULL`, `uuid = ''` |

We are confident that Apple's MDM `UDID` field is the same value as the osquery `system_info.uuid` field. As much as possible, we shouldn't make any assumptions about the ordering of those various enrollments, and typically an MDM-enrolled host will execute 3 of those enrollments (the MDM one, either automatic or manual; the Orbit one, as we require Orbit to be used for MDM; and the osquery one).

[^1]: As part of the osquery enrollment, osquery (may?) also provides the content of the `system_info` table, which contains the `uuid` and the `hardware_serial`, and which are saved in the host table after its creation. So assuming this is reliably provided as part of the osquery enrollment, the host will have the `uuid` and `hardware_serial` fields filled as well as the `osquery_node_id`, except that the `osquery_node_id` might not be the `uuid`.
@roperzh @lukeheath @noahtalerman

What I think we can do based on this:

- `osquery_host_id` may differ from the host's `uuid`
- Standardize how we identify existing hosts in all 4 enrollments:
  - `osquery_host_id` match on the provided `UUID/UDID` (or the osquery identifier for osquery enrollment) OR
  - `uuid` match on the provided `UUID/UDID` OR
  - `hardware_serial` match on the provided `SerialNumber`

  (each match only attempted if the value is actually provided/not empty). Modify the osquery enrollment to use the `uuid` and `hardware_serial` from the `system_info` table provided with the enrollment, in addition to its current lookup by `osquery_node_id` (which may or may not be the uuid).
- Standardize how we create the host when no match was found:
  - `uuid` to the provided uuid/udid
  - `osquery_host_id` to the provided uuid/udid or to the provided osquery identifier (for osquery enrollment)
  - `hardware_serial` to the provided serial number
- Standardize how we update the host if match is found:
  - `uuid` to the provided uuid/udid if it was NULL/empty
  - `hardware_serial` to the provided serial if it was empty
  - `osquery_host_id` to the provided uuid/udid if it was NULL/empty
- Update how osquery enrollment updates the host if found:
  - `osquery_host_id` to set it to the osquery identifier (which may or may not be the uuid).

I believe the only way this could fail to work (and generate ghost hosts) is one of those conditions:

- `uuid` or `hardware_serial` as part of the `system_info` table;

I don't think we need to worry about the second one, and the first one is probably unlikely? Based on https://github.com/fleetdm/fleet/issues/9033#issuecomment-1411150758, it looks like we would recommend not changing the osquery identifier to something other than the UUID. This would leave the case where DEP enrollment creates the host with just the serial number (as it's all that's available) and osquery enrollment follows without the `hardware_serial` for some reason. I don't really see what we can do about it, but we can certainly add a log when that serial is missing from osquery enrollment, which could help identify why there are ghost hosts.
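To make the "each match only attempted if the value is actually provided" rule concrete, here is a minimal sketch of how a shared lookup condition could be built. It is an illustration under assumptions, not the code in the PR: `Identifiers`, `buildHostMatch` and `fillMissing` are hypothetical helpers, and only the three column names come from the table above.

```go
package main

import (
	"fmt"
	"strings"
)

// Identifiers holds whatever an enrollment path was able to provide
// (hypothetical type; empty string means "not provided").
type Identifiers struct {
	OsqueryHostID  string
	UUID           string
	HardwareSerial string
}

// buildHostMatch returns an OR-ed WHERE condition with placeholders, skipping
// any identifier that is empty. ok is false when nothing can be matched on.
func buildHostMatch(ids Identifiers) (where string, args []interface{}, ok bool) {
	var conds []string
	add := func(column, value string) {
		if value == "" {
			return // never match on an empty identifier
		}
		conds = append(conds, column+" = ?")
		args = append(args, value)
	}
	add("osquery_host_id", ids.OsqueryHostID)
	add("uuid", ids.UUID)
	add("hardware_serial", ids.HardwareSerial)
	if len(conds) == 0 {
		return "", nil, false
	}
	return strings.Join(conds, " OR "), args, true
}

// fillMissing implements the "update only if it was NULL/empty" rule for one column.
func fillMissing(current, provided string) string {
	if current == "" {
		return provided
	}
	return current
}

func main() {
	// DEP sync: only the serial number is available.
	where, args, _ := buildHostMatch(Identifiers{HardwareSerial: "C02ABC123"})
	fmt.Println(where, args)

	// Orbit enrollment after this ticket: UUID and serial are both available.
	where, args, _ = buildHostMatch(Identifiers{
		OsqueryHostID:  "UUID-1",
		UUID:           "UUID-1",
		HardwareSerial: "C02ABC123",
	})
	fmt.Println(where, args)
}
```

Skipping empty values matters because several enrollment paths leave `uuid` as `''`; matching on an empty string would tie unrelated hosts together.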
@mna that is such a great summary that makes me wonder if we could copy/paste it somewhere as internal documentation, thanks <3!!
your plan makes sense to me, it's a much needed cleanup. I think my only question is where it stands in the priority list in regards to all the other stuff we need to do. cc: @lukeheath
@noahtalerman @roperzh @lukeheath regarding what happens if we don't prioritize this bugfix and "ghost hosts" get created. Let's assume the following scenario where a user inadvertently mixes the order of actions that we recommend and starts by assigning the host to Fleet in ABM, and then proceeds with installing orbit on the host.

- `node_key` gets associated with that host entry, and whenever the device's osquery pings fleet, it uses that host entry. As part of this enrollment, the host's serial number should also get saved.

At this point, we have at least 2 host entries (the ABM one + the Orbit one), possibly 3 if Fleet is configured with a different osquery host identifier than `uuid`. If the organization settings enabled the host expiry configuration (https://fleetdm.com/docs/using-fleet/configuration-files#host-expiry-settings), then after the expiry window, only the Orbit/Osquery-created host will remain.

I think the profiles and MDM commands would still run properly on the devices (hard to say for sure as those features are not yet implemented, and it depends on how they are), but whether everything works smoothly will depend in big part on how we store those command results and associate them with hosts (we'll have to make sure we select on more than just the serial number, otherwise we could pick the "ghost" one).
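One way to read "select on more than just the serial number" is sketched below: when several rows share a serial, prefer the one whose `uuid` matches the device's UDID and only fall back to a serial-only match. This is purely illustrative; `hostRow` and `pickHost` are hypothetical, and how command results will actually be associated with hosts is not decided in this thread.

```go
package main

import "fmt"

// hostRow is a hypothetical, trimmed-down view of a hosts table row.
type hostRow struct {
	ID             uint
	UUID           string
	HardwareSerial string
}

// pickHost prefers a row whose uuid matches the device's UDID and falls back
// to the first serial match only when no uuid match exists.
func pickHost(rows []hostRow, udid, serial string) (hostRow, bool) {
	var fallback *hostRow
	for i := range rows {
		r := &rows[i]
		if udid != "" && r.UUID == udid {
			return *r, true // strongest match: the fully identified host
		}
		if fallback == nil && serial != "" && r.HardwareSerial == serial {
			fallback = r // possibly the serial-only "ghost" row
		}
	}
	if fallback != nil {
		return *fallback, true
	}
	return hostRow{}, false
}

func main() {
	rows := []hostRow{
		{ID: 1, HardwareSerial: "C02ABC123"},                 // DEP-created, serial only
		{ID: 2, UUID: "UUID-1", HardwareSerial: "C02ABC123"}, // enrolled host
	}
	h, ok := pickHost(rows, "UUID-1", "C02ABC123")
	fmt.Println(h.ID, ok)
}
```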
Changes have been pushed to the PR (https://github.com/fleetdm/fleet/pull/9612), but they need more testing in the load testing environment to compare before/after for effects on enrollment time, the DB causing context cancellations, etc. See https://fleetdm.slack.com/archives/C03C41L5YEL/p1675797614564469 . Pausing for now.
@xpkoala @lukeheath Some context for QA.
This is not a visual change, nor something that can be easily tested. The plan would be to run some testing with both orbit-enrolled and non-orbit-enrolled setups (that is, a deployment without orbit), without Fleet MDM enabled, keeping an eye out for any ghost hosts, non-enrolling hosts, and things like that.
And then with MDM enabled (which is only possible with orbit), looking for the same kind of issues (missing/ghost hosts) with a combination of manually enrolled and DEP-enrolled hosts. But this case will likely be covered in part by the dogfood deployment running during the sprint.
Don't hesitate to reach out to me if anything isn't clear or if you see something weird going on during those tests!
Secure data, ease of use Fleet's Orbit now sends serial, Streamlined enrollment.
Context, after https://github.com/fleetdm/fleet/pull/9065:

- `hosts` table that only has the `hardware_serial` column populated (because that's the only info we have available.)

If a DEP device is being migrated from another MDM solution into Fleet's MDM we will recommend the user to:

But, if the IT admin assigns the host to ABM first, we won't be able to match the host. To account for that case, we need orbit to send the device serial number along with the hardware uuid in `POST /api/fleet/orbit/enroll` so we can match the host in the database.

Tasks

- Update Orbit to send the serial number of the host along with the UUID on enrollment.
- Update the server API to make use of the serial number in addition to the UUID to decide if a host should be updated or added in the `hosts` table.
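
For illustration, the enroll request body with the extra field might look like the sketch below. The struct name, the helper and the JSON field names (`enroll_secret`, `hardware_uuid`, `hardware_serial`) are assumptions made for this example; only the endpoint path and the idea of sending the serial alongside the UUID come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// orbitEnrollRequest is a hypothetical body for POST /api/fleet/orbit/enroll.
// Field names are assumptions, not confirmed by this issue.
type orbitEnrollRequest struct {
	EnrollSecret   string `json:"enroll_secret"`
	HardwareUUID   string `json:"hardware_uuid"`
	HardwareSerial string `json:"hardware_serial,omitempty"` // omitted when unknown
}

// encodeEnrollRequest builds the JSON payload orbit would send on enrollment.
func encodeEnrollRequest(secret, uuid, serial string) ([]byte, error) {
	return json.Marshal(orbitEnrollRequest{
		EnrollSecret:   secret,
		HardwareUUID:   uuid,
		HardwareSerial: serial,
	})
}

func main() {
	body, err := encodeEnrollRequest("secret", "UUID-1", "C02ABC123")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Marking the serial as `omitempty` keeps the payload unchanged for hosts where the serial cannot be read, so the server can still fall back to UUID-only matching.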