ksatter closed this issue 10 months ago.
@zayhanlon and @ksatter heads up, this story will be air guitar'd during the next design sprint.
Notes from internal call: https://docs.google.com/document/d/1o8446eiAk-z2Bm_GSz0mZdb7CPTF0eymv-CtIm_VVxU/edit#heading=h.8oggoi13xg7y
More info in customer thread in Slack (internal): https://fleetdm.slack.com/archives/C03AE5T2EQ0/p1698348948683799
@noahtalerman It could be possible to manage this through Agent options, but it would require changes to the enrollment process. I thought of two potential scenarios for that:
Option 1:
- Add a --host-identifier flag to the fleetctl package command. It sets the host identifier used by all components of fleetd (osquery and Orbit).
- Options are provided, uuid, hostname, or instance. Default is uuid. (Options match osquery's host_identifier flag.)
- Document the --host-identifier flag, its options, and the default in the fleetd configuration options section.
Option 2:
- fleetd picks up whatever osquery has set as host-identifier. Or, when you specify host-identifier, it sets it for both fleetd and osquery.
Michael: This way, we could continue using the same workflow: give teams osquery flag file w/ host-identifier specified. If this won't work, fall back to option 1.
How? Fleetd starts osquery and runs this query to get host_identifier:
select value, instance_id from osquery_flags JOIN osquery_info where name = 'host_identifier';
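To make the second scenario (fleetd picks up whatever osquery has set) concrete, here is a minimal Go sketch of the decision fleetd would make once it has the result of that query. The hostInfo type and resolveHostIdentifier function are hypothetical, not Fleet's code. The hardware UUID is assumed to be read separately, since the query above doesn't select it, and only the instance mode is special-cased.

```go
package main

import "fmt"

// hostInfo holds the values fleetd would read from osquery (hypothetical):
// FlagValue is osquery_flags.value where name = 'host_identifier',
// InstanceID is osquery_info.instance_id, and HardwareUUID is the host's
// hardware UUID (assumed to be read separately).
type hostInfo struct {
	FlagValue    string
	InstanceID   string
	HardwareUUID string
}

// resolveHostIdentifier picks the identifier Orbit would use so that it
// matches osquery. Only "instance" is special-cased; any other value
// (including "uuid") falls back to the hardware UUID.
func resolveHostIdentifier(h hostInfo) string {
	if h.FlagValue == "instance" {
		return h.InstanceID
	}
	return h.HardwareUUID
}

func main() {
	h := hostInfo{FlagValue: "instance", InstanceID: "example-instance-id", HardwareUUID: "example-hardware-uuid"}
	fmt.Println(resolveHostIdentifier(h))
}
```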
Noah: If we can, I think the flagfile, if present, is the source of truth for osquery flags for the customer. The osquery flagfile overrides all osquery flags set remotely (agent_options.command_line_flags in Fleet YAML). We don't document this. Using Fleet YAML is best practice.
Noah: Can we make this work? Zach: Yes, but using an osquery flagfile may break extension loading because Fleet needs to write to a flagfile. That might be a reason to do option 1.
TODO Noah: Ask support if setting an osquery flagfile breaks managing extensions remotely.
Why? We're not certain that Orbit and osquery having the same host identifier will resolve the problem with receiving extensions.
Future problem:
As an Endpoint Engineer, I want to be able to specify --tls_client_cert, --tls_client_key, and --watchdog_** flags via the fleetctl package command or Fleet YAML so that I don't have to use an osquery flagfile.
UPDATE: The Fleet YAML already supports --fleet-tls-client-certificate, --fleet-certificate, and --watchdog_** flags.
The customer can use these instead of osquery flagfile.
The customer can set different values for these flags in each team in Fleet.
Are teams granular enough?
(2023-12-01)
Today, the customer is setting these flags in the osquery flagfile and updating them remotely via Chef.
My guess is that they can't use Fleet YAML to update these remotely because different hosts need different values for these flags. Targeting based on teams isn't sufficient (not granular enough) because the customer uses teams for a rollout use case (staging and production).
The likely solution to this problem is to allow different agent options based on label.
@lucasmrod TODOs are in this Google doc: https://docs.google.com/document/d/187PA5ctmIFjD8-HLkAEYXjf0OOL61bvGhNnu-2VZa24/edit
- [ ] fleetd changes:
  - [ ] fleetd is forced to use the --host_identifier value set during fleetctl package, no matter what value is set in an osquery flagfile.
  - [ ] fleetd is forced to use the --extensions-autoload value set by agent_options.extensions in Fleet YAML, no matter what value is set in an osquery flagfile.
@lucasmrod here's how I summarized the fleetd changes based on discussion in this Google doc.
What do you think?
fleetd is forced to use the --extensions-autoload value set by agent_options.extensions in Fleet YAML no matter what value is set in an osquery flagfile.
Should be something like:
fleetd is forced to use the --extensions-autoload value set by itself (orbit always sets this to /opt/orbit/extensions.autoload), no matter what value is set in an osquery flagfile or Fleet YAML.
No migration is needed for existing hosts. Users will have to reinstall the package to use this feature.
Once this is released, users that are hitting this 2-hosts-as-1 bug will have to:
1. Generate a package with the new fleetctl version (with fleetctl package --host_identifier=instance [...]) and reinstall that package on the hosts.
2. Remove the hosts with the 2-hosts-as-1 bug from Fleet.
@lucasmrod the customer has already deleted the Orbit enrolled host record in Fleet.
If I'm understanding correctly, they will have to delete the osquery enrolled host too?
If they want to, can they delete the osquery enrolled host after they install the new fleetd w/ --host-identifier flag? (step 2)
@noahtalerman I'm changing the env var from HOST_IDENTIFIER to ORBIT_HOST_IDENTIFIER (all orbit variables have the ORBIT_ prefix).
fleetd is forced to use the --host_identifier value set during fleetctl package no matter what value is set in an osquery flagfile.
I'm also changing --host_identifier to --host-identifier.
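A minimal Go sketch of reading the renamed ORBIT_HOST_IDENTIFIER variable, assuming the uuid and instance options and the uuid default from the issue description. parseHostIdentifier is a hypothetical helper, and which value wins when both the env var and the --host-identifier flag are set is not decided here.

```go
package main

import (
	"fmt"
	"os"
)

// parseHostIdentifier validates a host identifier mode. An empty value
// means "not set" and falls back to the default, uuid. Only uuid and
// instance are accepted.
func parseHostIdentifier(v string) (string, error) {
	switch v {
	case "":
		return "uuid", nil
	case "uuid", "instance":
		return v, nil
	default:
		return "", fmt.Errorf("unsupported host identifier %q: must be uuid or instance", v)
	}
}

func main() {
	mode, err := parseHostIdentifier(os.Getenv("ORBIT_HOST_IDENTIFIER"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("host identifier:", mode)
}
```

Rejecting unknown values up front means a typo pushed via Chef fails loudly instead of silently re-enrolling the host under a different identifier.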
@xpkoala QA steps were added to the description.
Unified identifiers, No more duplicates in sight. Fleet's path becomes clear.
Goal
Changes
Product
- Add a --host-identifier flag to the fleetctl package command. It sets the host identifier used by all components of fleetd (osquery and Orbit). Options are uuid and instance. Default is uuid.
- Add an ORBIT_HOST_IDENTIFIER environment variable for Orbit. This way, the user can update this env variable via an automation tool (ex. Chef) and force the host to re-enroll without having to deploy a new package.
- fleetd is forced to use the --host-identifier value set during fleetctl package, no matter what value is set in an osquery flagfile.
- Validate command_line_flags in config and team Fleet YAML so that Fleet returns an error if --host-identifier or --extensions-autoload are set:
  - The [insert unsupported flag here] flag isn't supported. Please remove this flag.
- Document the --host-identifier flag, its options, and the default in the fleetd configuration options section.
Context
- Today, Orbit uses uuid as its host identifier. This isn't configurable.
- The customer sets osquery's host identifier to instance. They're doing this using an osquery flagfile.
QA
Risk assessment
Manual testing steps
Use this branch (or main if it was already merged). First, you will need to create a new local TUF repository.
Testing scenarios for QA:
A. Test the new feature with two VMs that have the same hardware UUID and serial number. You can simulate this with the following steps:
- Generate a package with --host-identifier=instance, e.g. for macOS: [...]
- Enroll both VMs (each host should use instance as its identifier instead of the hardware UUID/serial).
B. Test generating packages without --host-identifier=instance (basically the default behavior). Test Fleet with MDM both enabled and disabled.
C. Test packages generated without --host-identifier=instance against the latest released Fleet (4.41.1). Hosts should enroll without issues.
D. Test customers upgrading the flag on already installed fleetd instances. (Meaning they won't reinstall the package and instead set the orbit flag manually or via config management like Chef.)
This should be tested on Linux and Windows, not on macOS. Make sure to have Fleet Desktop disabled.
1. Generate and install a package without --host-identifier with an old version of Orbit (e.g. latest released Orbit). You can do this by generating the TUF repository with main.sh on fleet-v4.41.1.
2. Run select id, osquery_host_id, hostname, uuid, hardware_serial from fleet.hosts; against Fleet's database. osquery_host_id should match uuid.
3. Push a new Orbit build to the local TUF repository (e.g. GOOS=linux GOARCH=amd64 go build -o orbit-linux ./orbit/cmd/orbit && ./tools/tuf/test/push_target.sh linux orbit orbit-linux 43). Orbit should auto-update.
4. Stop Orbit (sudo systemctl stop orbit).
5. Set ORBIT_HOST_IDENTIFIER to instance in Orbit's environment file (on Linux it's /etc/defaults/orbit).
6. Start Orbit (sudo systemctl start orbit).
7. Run select id, osquery_host_id, hostname, uuid, hardware_serial from fleet.hosts; again. osquery_host_id should match what osquery reports in instance_id when running select instance_id from osquery_info;
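The checks on the fleet.hosts query results above (osquery_host_id should match uuid before the change and instance_id after it) can be expressed as a small Go function, assuming each row has been scanned into a struct and the instance_id from osquery_info is already known. hostRow and checkIdentifier are hypothetical helpers for illustration, not part of Fleet's test tooling.

```go
package main

import "fmt"

// hostRow mirrors the columns selected from fleet.hosts.
type hostRow struct {
	ID             uint
	OsqueryHostID  string
	Hostname       string
	UUID           string
	HardwareSerial string
}

// checkIdentifier compares osquery_host_id with the expected identifier:
// uuid in the default mode, or osquery_info.instance_id in instance mode.
func checkIdentifier(r hostRow, mode, instanceID string) error {
	want := r.UUID
	if mode == "instance" {
		want = instanceID
	}
	if r.OsqueryHostID != want {
		return fmt.Errorf("host %d (%s): osquery_host_id=%q, want %q", r.ID, r.Hostname, r.OsqueryHostID, want)
	}
	return nil
}

func main() {
	row := hostRow{ID: 1, OsqueryHostID: "example-uuid", Hostname: "vm-1", UUID: "example-uuid", HardwareSerial: "example-serial"}
	// Before the switch this passes in uuid mode; in instance mode it
	// reports a mismatch until the host re-enrolls.
	if err := checkIdentifier(row, "instance", "example-instance-id"); err != nil {
		fmt.Println(err)
	}
}
```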
For Windows:
- Use the Services app to stop/start the fleetd service.
- Use the Registry app to add the --host-identifier=instance option to orbit's invocation. (On my VM it's registry key: Computer\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\Fleet osquery)
E. Test enrolling vanilla osquery against Fleet.
--
Other things to take a look at:
Testing notes
Confirmation