fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Policy automations: install software #19551

Closed nonpunctual closed 2 months ago

nonpunctual commented 6 months ago

Goal

User story
As an IT admin,
I want to install software automatically when a host fails a policy
so that I can deploy software to many hosts without having to use a third-party automation tool (e.g. Tines).

Context

Changes

Product

Engineering

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

@noahtalerman:

Load test

The osquery-perf agents are able to simulate software installation. They have a 5% fail rate by default. See cmd/osquery-perf/README.md for how to adjust the pre/install/post fail probabilities. Once installed, the software will show up on the host with the next refetch.

Given a ~100 MB install package, try to automatically install software on 100,000 hosts.
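For a rough sense of scale, the load test numbers above can be worked through as follows. This is a back-of-the-envelope sketch, not part of the test plan: it assumes the 5% default is the overall chance that one simulated install fails, that every host downloads the package exactly once, and decimal units (1 TB = 1,000,000 MB). The function name is hypothetical.

```python
def load_test_estimate(hosts: int, package_mb: float, fail_rate: float) -> dict:
    """Rough totals for a fleet-wide install (illustrative only)."""
    # Assumption: each host downloads the package once; MB -> TB in decimal units.
    total_tb = hosts * package_mb / 1_000_000
    # Assumption: fail_rate is the overall per-install failure probability.
    expected_failures = round(hosts * fail_rate)
    return {
        "total_transfer_tb": total_tb,
        "expected_failures": expected_failures,
        "expected_successes": hosts - expected_failures,
    }


estimate = load_test_estimate(hosts=100_000, package_mb=100, fail_rate=0.05)
print(estimate)
```

Under these assumptions the run moves about 10 TB of installer data and roughly 5,000 simulated installs are expected to fail.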

Demo

https://www.loom.com/share/38e17e4ab76b40e6a8cd6515b5f1e015

valentinpezon-primo commented 5 months ago

Hi @noahtalerman @nonpunctual ,

Not much to add to this one. For context:

Since we are the source of truth for labels, a workaround could be to use your API to do software installs based on our internal labels.

noahtalerman commented 5 months ago

Contributes to parity with Jamf

noahtalerman commented 5 months ago

Thanks @valentinpezon-primo for the info!

marko-lisica commented 4 months ago

Converting this issue to story format and moving original description here:

Organizations may have the need to install applications based on:

i.e., a grouping of Hosts or end users that does not align to a Team in Fleet.

Scenario:

If we do this, the only options for application install in the case where a customer does not use Teams would be:

Problem

Potential solutions

  1. Allow applications to be assigned to Hosts that match a Label.

noahtalerman commented 4 months ago

For checking a host's version using osquery queries:

noahtalerman commented 4 months ago

If you find some time you can record feedback on the UI that we didn't look at (labels badge on software details and advanced options modal). I would like to hear more about why you think we want to split the pending status into pending install and pending "verification".

I believe it would be a better experience if we could manage to refetch host info if online and know right away if the host has software + if we can update software inventory together with host refetch, so counts are matching.

Hey @marko-lisica, I recorded a Loom video w/ feedback and thoughts on the above here (internal).

marko-lisica commented 4 months ago

TODO @noahtalerman : Merge in software API changes when we ship this story so that API is up to date, then add install, labels_exclude_any and labels_include_any to POST /api/v1/fleet/software/batch and POST /api/v1/fleet/spec/teams

noahtalerman commented 4 months ago

TODO @noahtalerman:

cc @marko-lisica

RachelElysia commented 4 months ago

Consensus from/with @jacobshandling during our estimation meeting:

FE: ~21 total = 1-2 (Free) + 8 (Add software) + 5 (Software details) + 2-3 (Options modal) + 2 (Host software)

sharon-fdm commented 4 months ago

@RachelElysia, @jacobshandling, I updated the BE sub-tasks according to what we discussed. TODO: agree between you on a proper division of #20897 into (at least) two sub-tasks to be developed by both of you.

lukeheath commented 4 months ago

Maturity review notes:

Brock: Other products allow me to apply multiple layers of filters. So I can start with a group that is Mac Sonoma devices only. Then within that group, I can create additional subsets, like arm64, for example.

Noah: To do that with this feature, you'd have to create an individual label for every combination you want.

Brock: The way this feature is typically used is to have many layers of subsets, and while creating individual labels is maybe possible, it would be painful and complicated to implement.

Noah: It sounds like we'll need to revisit in a future iteration to think about layers of filters.

Noah: Two things we’re missing:

noahtalerman commented 3 months ago

FYI @marko-lisica and @getvictor, I met w/ @lucasmrod and @gillespi314 during design review and we made several decisions re the software verification loop:

  1. Wait for the next refetch when an app goes to "pending" in this iteration (instead of triggering a refetch).
    • We don't have this functionality to refetch a set of hosts elsewhere in the app (yet)
    • The plan is to learn if this makes the install flow too slow. If it does then we can ship an improvement later.
    • In this iteration, the IT admin can hit "Refetch" on the Host details page (and in the API) to speed up install for a specific host.
    • Sarah: When we decide to make an improvement, instead of triggering a refetch, we could update the pre-install condition to include a check for the presence of software. This would also speed things up.

Screenshot 2024-08-07 at 9 48 53 AM

  2. Add logic to retry installing the software once if it's in "verifying" state and then we find that software is missing or an older version is installed. If we've retried already, we'll move the software to failed with a different error message.

Screenshot 2024-08-07 at 9 52 48 AM

Screenshot 2024-08-07 at 10 21 05 AM
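The retry-once rule from the design review decisions can be sketched as a small state transition. This is only an illustration of the logic as described in this comment, not Fleet's implementation: the `Status` names, the `verify` function, the dotted-integer version comparison, and the assumption that a retry moves the software back to "pending" are all hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    PENDING = "pending"
    VERIFYING = "verifying"
    INSTALLED = "installed"
    FAILED = "failed"


def _version_key(version: str) -> Tuple[int, ...]:
    # Assumption: versions are dotted integers such as "1.2.0".
    return tuple(int(part) for part in version.split("."))


def verify(status: Status, found_version: Optional[str],
           expected_version: str, retried: bool) -> Tuple[Status, bool]:
    """Return (next_status, retried) after a refetch reports found_version.

    found_version is None when the software is missing from the host.
    """
    if status is not Status.VERIFYING:
        return status, retried
    if found_version is not None and _version_key(found_version) >= _version_key(expected_version):
        return Status.INSTALLED, retried
    if not retried:
        # Missing or older version: retry the install once (assumed to re-enter "pending").
        return Status.PENDING, True
    # Already retried: give up; a different error message would be attached here.
    return Status.FAILED, True


print(verify(Status.VERIFYING, None, "1.2.0", retried=False))
print(verify(Status.VERIFYING, "1.1.0", "1.2.0", retried=True))
```

Tracking `retried` alongside the status keeps the "retry only once" rule explicit, so a second failed verification always ends in the failed state.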

noahtalerman commented 3 months ago

Hey @xpkoala heads up that I added this note to the QA section to make sure we're testing it as part of this story.

Also, more generally, do we fill out the QA section in stories? I've noticed the template is usually left alone. I just removed the template in this story.

If an App Store app is installed and then later uninstalled on an iOS/iPadOS host, check to make sure it doesn't show up on that Host's host details page anymore and the software counts are updated accordingly.

getvictor commented 3 months ago

@noahtalerman What about VPP licenses? If a user deletes a VPP app or wipes the host, should we release the license?

noahtalerman commented 3 months ago

What about VPP licenses? If a user deletes a VPP app or wipes the host, should we release the license?

@getvictor thanks for being loud about this. To stay focused and get #19551 shipped, I think we can follow up and add this next sprint. Here's the request for it:

Thoughts?

I'm assuming it's not harder to add this later v. now. And that adding it now will add a significant amount of work. I think I would rather get to some bugs.

noahtalerman commented 3 months ago

How/where Fleet extracts name and version from packages. This way, if the IT admin hits this error they can understand why Fleet can't get the version and know how to fix their package.

Hey @sharon-fdm I think we want to include this as part of the guide for this feature.

Please feel free to schedule some time w/ me if you want a 5 minute run down!

lucasmrod commented 3 months ago

@noahtalerman

Screenshot 2024-08-14 at 10 15 09 AM

Small thing. In the /software/titles/:id we should show the install type somewhere. Otherwise you don't know which software is automatic/manual after you create it. (Unless I'm missing something.)

noahtalerman commented 3 months ago

Hey @lucasmrod, good call.

The plan is to show the install type in the Options modal. User gets here by clicking Actions > Show options on the Software title page:

Screenshot 2024-08-14 at 11 27 13 AM

Screenshot is from Figma here.

lucasmrod commented 3 months ago

Ah I missed it. Thanks! LGTM!

marko-lisica commented 3 months ago

How/where Fleet extracts name and version from packages. This way, if the IT admin hits this error they can understand why Fleet can't get the version and know how to fix their package.

Hey @sharon-fdm I think we want to include this as part of the guide for this feature.

Please feel free to schedule some time w/ me if you want a 5 minute run down!

@noahtalerman @sharon-fdm We have different extraction methods for each type of package. I think we should cover all of them (.pkg, .msi, .exe, .deb).

noahtalerman commented 3 months ago

Hey @marko-lisica and @sharon-fdm heads up that I moved this issue from the release board to the drafting board because it's in expedited drafting.

I assigned Marko.

noahtalerman commented 3 months ago

For visibility, I'm pulling this out of Slack:

We decided to make a change mid-sprint to simplify this app management feature.

The old plan was for the IT admin to choose "Automatic" install for an app. Fleet would detect, under the hood, whether the software is already installed.

Old wireframes are here: Screenshot 2024-08-17 at 12 42 28 PM

The new plan is to change the trigger for app install: policy failure.

New wireframes (still in progress):

Screenshot 2024-08-17 at 12 45 03 PM

This means that customers/users will now be “in the loop.” They have full control over when an app is installed w/ policies (no black box). And, this sets us up to keep this control when we add UX improvements ("Automatic" abstraction on top of policies) later.

cc @dherder @alexmitchelliii

lukeheath commented 3 months ago

@noahtalerman On the design review today @spokanemac brought up concerns about the viability of this feature for managing a large library of software (more than 100). It's important that we distinguish that this approach is viable for a handful of "hero" software items, but not viable for updating all software across all hosts as part of vulnerability management. If a Fleet admin wanted to update all software across all hosts, they would still need to use something like Munki.

I suggested to @marko-lisica that we show these designs to IT customers to get their feedback on the viability for their use cases, as well as looping in @nonpunctual to get his feedback as he was the original submitter of this story, though it's changed quite a bit since it was first created.

lukeheath commented 3 months ago

@marko-lisica I'll continue to dig in this week, but as I've compared this feature to existing features in other MDMs I think it's a good first step. Tying to policies for initial installation makes sense, and is similar to other MDMs' use of "manual" software updates.

@spokanemac It's true that this feature doesn't provide automatic updates, which would make this arduous to use at scale, but the Fleet app library introduces the concept of automatic updates and that will be following on later this quarter.

ddribeiro commented 3 months ago

@lukeheath I agree with @spokanemac's assessment that basing software installs on policy failure becomes increasingly difficult to manage as a customer's software library grows.

I think customers who are asking for software to be installed automatically based on team membership expect it to work like custom settings/profiles do today.

Add host to team -> Fleet automatically installs software associated with that host -> host is moved to a different team -> Fleet removes software associated with the previous team and installs software associated with the new team
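The team-based flow described above amounts to a set difference between the software assigned to the old and new teams. A minimal sketch of that expectation, not of anything Fleet ships: the function name and software titles are hypothetical, and leaving software that belongs to both teams untouched is an assumption.

```python
def plan_team_move(old_team_software: set, new_team_software: set) -> dict:
    """Work out what to remove and install when a host changes teams."""
    return {
        # Software only the previous team had gets removed.
        "remove": sorted(old_team_software - new_team_software),
        # Software only the new team has gets installed.
        "install": sorted(new_team_software - old_team_software),
        # Assumption: software assigned to both teams is left alone.
        "keep": sorted(old_team_software & new_team_software),
    }


plan = plan_team_move({"Slack", "Zoom"}, {"Zoom", "Figma"})
print(plan)
```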

Can the desire to have more control over when software gets installed be solved with the existing pre-install condition feature? I'd like to learn more about what use cases using a policy as the trigger for installation solves.

customer-easterwood says:

• Would there be a possibility of auto-installing all apps assigned to a specific team?
• Similar to how we have it configured in JumpCloud, we basically assign a set of apps to a "device group" and any devices associated with that group would have the apps installed automatically. It would be nice to have.

I think the CS team should have more customer conversations to understand how they'd want this feature to work.

cc: @nonpunctual

nonpunctual commented 3 months ago

So @lukeheath & I discussed this & this feature is supposed to be followed closely by https://github.com/fleetdm/fleet/issues/18865 - App Library

@spokanemac @ddribeiro I think this model gets close to Jamf + Munki / Jamf App Installers, i.e.,

lukeheath commented 3 months ago

Yep! Fleet app library with automatic install and update is coming into the next sprint and is a Q3 deliverable.

lukeheath commented 3 months ago

On design review last Friday we made the following decisions:

  1. We are creating a new “no team” yaml file for GitOps.

    • This is where we will configure “no team” policies and software.
    • You can also include controls here optionally, or in default.yml, but not both.
  2. We are going to remove the software property from default.yml, which is a breaking change.

    • We need to contact CS and make sure any customers that have adopted this feature are aware and can adjust (or we do it for them).
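The two GitOps decisions above can be expressed as a small validation pass over the parsed YAML. This is a sketch under stated assumptions, not Fleet's actual GitOps validation: it takes already-parsed dictionaries, assumes the new file is named no-team.yml, and the function name and error wording are hypothetical.

```python
def validate_gitops(default_cfg: dict, no_team_cfg: dict) -> list:
    """Check parsed default.yml and no-team.yml contents against the two decisions."""
    errors = []
    # Decision 2: the software property is removed from default.yml (breaking change).
    if "software" in default_cfg:
        errors.append("'software' is not supported in default.yml; define it in no-team.yml")
    # Decision 1: controls may live in either file, but not in both.
    if "controls" in default_cfg and "controls" in no_team_cfg:
        errors.append("'controls' is defined in both default.yml and no-team.yml; pick one")
    return errors


problems = validate_gitops(
    default_cfg={"controls": {}, "software": {}},
    no_team_cfg={"controls": {}, "policies": []},
)
for problem in problems:
    print(problem)
```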
marko-lisica commented 3 months ago

cc @noahtalerman ^

lukeheath commented 3 months ago

I confirmed the "Application deployment" item in pricing-features-table.yml is listed as isExperimental. Do we need to update any reference docs to make sure it's clear this is an experimental feature?

noahtalerman commented 3 months ago

@lukeheath experimental call out in the GitOps reference here.

Opened a PR to API reference here. It looks like we were missing several endpoints used in app deployment features.

noahtalerman commented 3 months ago

We are going to remove the software property from default.yml, which is a breaking change.

You can also include controls here [no-team.yml] optionally, or in default.yml, but not both.

@lukeheath I'm assuming we decided to break software and not break controls because software is experimental and controls isn't. Is that right?

If yes, then I think it makes sense to update the reference docs to only the new way:

What do you think?

cc @marko-lisica

lukeheath commented 3 months ago

@noahtalerman Yes, we decided to leave support for controls in default.yml because it is no longer experimental. We discussed making it optional for backwards compatibility but, moving forward, making it the best practice to define the "No Teams" controls in the no-team.yml file. That way, default.yml is reserved for only "All Teams" for both free and premium users. Premium users with teams support would have a separate YML file for each team, and for no team.

If you feel this is the wrong approach please let me know ASAP so we can discuss. Thanks!

noahtalerman commented 3 months ago

We discussed making it optional for backwards compatibility but, moving forward, making it the best practice to define the "No Teams" controls in the no-team.yml file. That way, default.yml is reserved for only "All Teams" for both free and premium users.

@lukeheath thanks! This approach makes sense.

Knowing this, I think I would change my proposal here: update the docs to say that controls under default.yml are for Fleet Free only. Once you upgrade to Fleet Premium, the best practice is to move "No team" controls from default.yml to no-team.yml.

Related question: What happens if a Fleet Premium user specifies controls in no-team.yml and in default.yml?

Let's say they're upgrading from Fleet Free to Fleet Premium.

Apologies if you're repeating yourself. Please feel free to tell me to go look at design review in Gong!

marko-lisica commented 3 months ago

Related question: What happens if a Fleet Premium user specifies controls in no-team.yml and in default.yml?

@noahtalerman It's already specified here in Figma.

Screenshot 2024-08-28 at 14 13 15

noahtalerman commented 3 months ago

UPDATE: Documented only path (separate file) for software. PR is here.

We decided that the separate file is the best practice b/c it's the format that can be used w/ policy automations. Inline can't be used w/ policy automations. And, we want to avoid documenting two ways of using the feature.

(noahtalerman 2024-10-29)


TODO @noahtalerman: In reference docs document both inline and path for software but call out in inline docs that this format can’t be used with policy automations.

noahtalerman commented 3 months ago

I think my comments here in GitHub introduced confusion over whether the "No team" policies part of the feature is ready for dev or not.

To over communicate, I'm aligned on the planned solution. Unless I'm missing something, I think there's nothing blocking engineering.

cc @lucasmrod @jacobshandling @lukeheath @marko-lisica

pintomi1989 commented 3 months ago

Hey @noahtalerman,

Does this still include the original ask to scope by labels? See: https://github.com/fleetdm/fleet/issues/19551#issuecomment-2230207330

noahtalerman commented 3 months ago

@pintomi1989 this iteration does not.

That said, this feature makes it so policy failures trigger software installs.

A policy can be any osquery query. So, the IT admin can craft a policy so that it only returns results on a subset of hosts (similar to scoping w/ labels which also use an osquery query).
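One way to read this: a Fleet policy passes on a host when its query returns at least one row, so the install is triggered only on hosts where the query returns nothing. The sketch below models that scoping idea in plain Python rather than osquery SQL. The host fields, the app name, and the function names are hypothetical stand-ins, and the arm64 scope is just an example.

```python
def policy_passes(rows: list) -> bool:
    """A policy passes when its query returns one or more rows."""
    return len(rows) > 0


def scoped_query(host: dict) -> list:
    """Stand-in for a policy query that bakes the scope into the query itself."""
    # Out-of-scope hosts return a row, so they pass and never trigger an install.
    if host["cpu"] != "arm64":
        return [{"reason": "out of scope"}]
    # In-scope hosts pass only when the software is already present.
    if "Example App" in host["software"]:
        return [{"reason": "installed"}]
    return []


def should_install(host: dict) -> bool:
    """The install automation fires only when the policy fails."""
    return not policy_passes(scoped_query(host))


hosts = [
    {"name": "intel-mac", "cpu": "x86_64", "software": []},
    {"name": "arm-mac-ok", "cpu": "arm64", "software": ["Example App"]},
    {"name": "arm-mac-missing", "cpu": "arm64", "software": []},
]
print([h["name"] for h in hosts if should_install(h)])
```

Only the in-scope host that is missing the software fails the policy, which is the behavior a label-based scope would otherwise provide.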

In future iterations we'll make it easier for users to scope so that they don't have to include this in their policy's query.

cc @nonpunctual

nonpunctual commented 3 months ago

@noahtalerman @pintomi1989 dynamic labels are also based on queries so I guess I am unclear on why this can be tied to policies but not labels. Thanks.

noahtalerman commented 3 months ago

dynamic labels are also based on queries so I guess I am unclear on why this can be tied to policies but not labels. Thanks.

Hey @nonpunctual please take a look at the UI changes in the Figma here! https://www.figma.com/design/4pfUOYy7IyMIrjMH2fuCdU/%2319551-Automatically-install-software-and-scope-software-with-labels?node-id=0-1&t=kj6YhMn6ewXjY9vr-1

nonpunctual commented 3 months ago

Re-created original FR: https://github.com/fleetdm/fleet/issues/21825 @pintomi1989 @zayhanlon

valentinpezon-primo commented 3 months ago

original

Thanks @nonpunctual !

lucasmrod commented 2 months ago

@marko-lisica @noahtalerman

Regarding "No team" policies support.

We'll need some API changes (on PATCH /api/latest/fleet/config) to apply "Calendar events" and "Other workflows" settings for "No team" policies. Screenshot 2024-09-09 at 1 55 19 PM

Or do we want to leave this for a later iteration? If so, maybe we can do the same "gray out" when selecting "All teams" policies: Screenshot 2024-09-09 at 1 56 18 PM I'm all ears.

noahtalerman commented 2 months ago

Hey @lucasmrod good catch!

I think we can leave it for a later iteration. @lukeheath and @marko-lisica please let me know if I'm wrong.

I think greying out "Calendar events" and "Other workflows" is a good idea.

Also, I think let's add a tooltip on hover over those options w/ the following text:

"Please select a team first. This isn't currently supported for No team."

I think we can borrow the styles from this tooltip for Fleet Free: Screenshot 2024-09-09 at 4 43 20 PM

Also, when using GitOps, what error message does the IT admin see when they try to add a policy to "No team" w/ calendar events enabled?

What about when they try to turn on other policy automations (webhook and tickets) for "No team" ? I think we want easy to understand error messages here.

valentinpezon-primo commented 2 months ago

Hi @noahtalerman @lucasmrod -

Just checking to be sure: since we do not use teams and use "No team" a lot, we would like to be able to use the install software policy for our devices that do not belong to any team. Would this be possible?

noahtalerman commented 2 months ago

Just checking to be sure: since we do not use teams and use "No team" a lot, we would like to be able to use the install software policy for our devices that do not belong to any team. Would this be possible?

Hey @marko-lisica and @valentinpezon-primo, just to clarify, the "Install software" policy automation (via the UI, API, and GitOps) will be supported for "No team" in 4.57.

The "Calendar events" and "Other workflows" won't be supported for "No team" in this iteration.

Screenshot 2024-09-10 at 2 27 08 PM

(T = teams and NT = no team)
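The support described in this comment can be summarized as a lookup that also produces an easy-to-understand error for unsupported combinations. This is a sketch only: the automation keys, scope names, function name, and error wording are hypothetical and do not reflect Fleet's API or GitOps output.

```python
from typing import Optional

# Which scopes each policy automation supports in this iteration (per this comment).
SUPPORTED_SCOPES = {
    "install_software": {"team", "no_team"},
    "calendar_events": {"team"},
    "other_workflows": {"team"},
}


def check_automation(automation: str, scope: str) -> Optional[str]:
    """Return None when the combination is supported, else an error message."""
    if scope in SUPPORTED_SCOPES[automation]:
        return None
    return f"'{automation}' isn't currently supported for {scope}. Please select a team first."


print(check_automation("install_software", "no_team"))
print(check_automation("calendar_events", "no_team"))
```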

lukeheath commented 2 months ago

@valentinpezon-primo @noahtalerman Yes, we have this story to provide policies to "No teams": https://github.com/fleetdm/fleet/issues/21790

Because this will apply, my understanding is it would support all policy automations, but @lucasmrod can confirm.

lucasmrod commented 2 months ago

Correct. On 4.57.0 we will be shipping support for policies for "No team" AND the capabilities to associate software packages to them too (IOW the "Policy automations: install software" feature will work for teams and for "No team").

sharon-fdm commented 2 months ago

QA DRI - @jacobshandling