fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com
Other
3.01k stars 418 forks source link

Create new API endpoint to provide aggregate status count of MDM profiles applying to hosts #9591

Closed lukeheath closed 1 year ago

lukeheath commented 1 year ago

Tasks

Example response

{
  "latest": 123,
  "pending": 123,
  "failing": 123
}

Figma

https://www.figma.com/file/hdALBDsrti77QuDNSzLdkx/%F0%9F%9A%A7-Fleet-EE-(dev-ready%2C-scratchpad)?node-id=10517%3A316027&t=gNbRhoIo5iC2OBaX-0

lukeheath commented 1 year ago

@noahtalerman If I request an aggregate count of host status applying profiles, and I do not provide a team_id param, should it return the aggregate results for all hosts or just hosts with no team?

noahtalerman commented 1 year ago

If I request an aggregate count of host status applying profiles, and I do not provide a team_id param, should it return the aggregate results for all hosts or just hosts with no team?

@lukeheath I think I would expect it to return just hosts with no team. On, the Hosts page, I would also expect to see just a list of hosts with no team.

lukeheath commented 1 year ago

Hey team! Please add your planning poker estimate with Zenhub @gillespi314 @mna @roperzh

roperzh commented 1 year ago

@lukeheath is this API endpoint for this page? image.png

mna commented 1 year ago

My understanding from earlier conversations we had about that is that this is aggregated on a cron schedule (e.g. every hour) a bit like what we already have for GenerateAggregatedMunkiAndMDM. The endpoint simply returns the existing aggregated stats, it does not compute them live on each request.

If so we already have the aggregated_stats table where it would be a new aggregated stats type, a new datastore method to generate those stats and a cron job to trigger it, and the endpoint GET /mdm/apple/profiles/summary to retrieve those stats.

lukeheath commented 1 year ago

@roperzh No, it will power this incoming UI component on the new "Controls" page:

image

@mna Correct - I expect we will leverage the existing hourly cron and related architecture that calculates aggregate stats.

gillespi314 commented 1 year ago

I think we could do these counts fairly cheaply on the fly using the indexes we have built into the host_mdm_apple_profiles table. Here's my initial stab at the query. (Input on simplifications or alternative approaches always welcomed!)

SELECT
    count(us.host_uuid) as count,
    us.status
FROM ( 
# aggregate mdm status for a host is "failing" if there is a failure installing/removing one or more profiles on the host
        SELECT
        host_uuid,
        status
    FROM
        host_mdm_apple_profiles
    WHERE
        status = 'failed'
# aggregate mdm status for a host is "pending" if there is one or more profiles pending (but none failed) on the host 
    UNION
    SELECT
        host_uuid,
        status
    FROM
        host_mdm_apple_profiles
    WHERE
        status = 'pending'
        AND host_uuid NOT in(
            SELECT
                host_uuid FROM host_mdm_apple_profiles
            WHERE
                status = 'failed')
# aggregate mdm status for a host is "latest" if one or more profiles is applied and there are no profiles pending or failed on the host. 
    UNION
    SELECT
        host_uuid, status
    FROM
        host_mdm_apple_profiles
    WHERE
        status = 'applied'
        AND host_uuid NOT in(
            SELECT
                host_uuid FROM host_mdm_apple_profiles
            WHERE
                status = 'failed'
                OR status = 'pending')) us
GROUP BY
    status
mna commented 1 year ago

@gillespi314 I'm slightly worried about computing this live in the sense that we did feel that we needed hourly aggregated stats for a number of other host-related statistics in the past. I'm not sure if that was just a precaution or if having them on-the-fly caused issues at scale. That being said, given that those stats would be computed just for an mdm-specific new endpoint, that helps limit the impact.

Regarding the query, I think it'd be faster if the sub-queries were joined to the specific host_uuid it is being compared to, instead of loading all failed or (failed or pending) regardless of host? E.g.:

SELECT
        host_uuid,
        status
    FROM
        host_mdm_apple_profiles hmap
    WHERE
        status = 'pending'
        AND host_uuid NOT IN (
            SELECT
                hmap2.host_uuid FROM host_mdm_apple_profiles hmap2
            WHERE
                                hmap.host_uuid = hmap2.host_uuid AND
                hmap2.status = 'failed')

And similarly for the last union query. Another option would be a WHERE NOT EXISTS (SELECT 1 ...) with the same sub-query, I'm not sure which is faster, might be worth trying out with ~10K hosts and see how it goes.

jacobshandling commented 1 year ago

@gillespi314 it seams that the API is returning {"latest": ..., "failed": ..., "pending": ...}, in that order. If we want to keep the current spec, Can you please change:

  1. "failed" to "failing"
  2. the order to be latest, pending, failing

No. 1 is trivial (I can just change the UI to expect a different string if we think "failed" is better than "failing"), but 2. will be helpful for displaying the UI correctly.

jacobshandling commented 1 year ago

@gillespi314 it seams that the API is returning {"latest": ..., "failed": ..., "pending": ...}, in that order. If we want to keep the current spec, Can you please change:

1. "failed" to "failing"

2. the order to be latest, pending, failing

3. is trivial (I can just change the UI to expect a different string if we think "failed" is better than "failing"), but 2. will be helpful for displaying the UI correctly.

I'll try to do this myself in the meantime

gillespi314 commented 1 year ago

@jacobshandling, those are pretty easy changes.

I'm curious why the order of the object properties matters to the UI implementation. Are you trying to iterate over the object as if it were an ordered array?

gillespi314 commented 1 year ago

@lukeheath we have some drift between the terms we're using to describe profile statuses.

Are we intentionally using "failed" in the individual context and "failing" for the summary?

Are we intentionally using "applied" in the individual context and "latest" for the summary?

jacobshandling commented 1 year ago

I'm curious why the order of the object properties matters to the UI implementation. Are you trying to iterate over the object as if it were an ordered array?

Well, using Object.entries, but yes. The order of insertion of non-number-like keys is maintained (numbers are sorted ascending though) - https://stackoverflow.com/a/5525820

lukeheath commented 1 year ago

@gillespi314 No, that is a mistake. Let's standardize around the language in this ticket. Would you please update the relevant specs? Thanks for catching this.

{
  "latest": 123,
  "pending": 123,
  "failing": 123
}
gillespi314 commented 1 year ago

The order of insertion of non-number-like keys is maintained (numbers are sorted ascending though) - https://stackoverflow.com/a/5525820

@jacobshandling Gotcha. My preferred approach would be to configure the display order as a constant in the UI and then iterate over that to pull out the keys by name.

const DISPLAY_ORDER = ["latest", "pending", "failed"];

const orderedDisplayValues = DISPLAY_ORDER.map((key) => apiResponse[key]);

That way the UI is more loosely coupled and can be configured to a different order, etc.

Plus I think it is a bit safer overall to treat objects as unordered collections since the JS rules can be weird traps for the unwary.

P.S. I'll make the ordering change on the backend too.

fleet-release commented 1 year ago

Yvh

A better world awaits us Data to keep us secure A count of MDM profiles