fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Improve loading states for deployments with 30k hosts #21345

Open roperzh opened 2 months ago

roperzh commented 2 months ago

Fleet version: 4.55.0


💥  Actual behavior

With a large number of hosts, some queries take around 6 seconds to complete. The following are quick UI wins to deal with this:

  1. Add a loading state for the "summaries" in the "controls" page. This whole section is hidden until the request completes, and then the page jumps.
  2. In /hosts/manage, most APIs respond within milliseconds, except for /api/latest/fleet/hosts/count, which takes 6 seconds. This is used only to display the counts, but the whole page is blocked until the request completes.

🧑‍💻  Steps to reproduce

  1. Either load a bunch of hosts in the UI or artificially make the relevant requests slow
  2. Navigate to the "Controls" page
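
One possible way to make a request artificially slow for step 1 is to wrap the fetching function in a delay. This is only a sketch: `withArtificialDelay` and `fetchSummary` are hypothetical names, not part of Fleet's codebase, and browser devtools throttling would work just as well.

```typescript
// Hypothetical helper (not from Fleet's codebase): wraps any promise-returning
// function so that the underlying call starts only after `delayMs` milliseconds.
function withArtificialDelay<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  delayMs: number
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    // Wait first, then run the real request.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    return fn(...args);
  };
}

// Stand-in for a real API call; the response shape is an assumption.
const fetchSummary = async (): Promise<{ verified: number }> => ({
  verified: 3,
});

// Same call, but it now takes at least 50 ms to resolve.
const slowFetchSummary = withArtificialDelay(fetchSummary, 50);
```

Raising the delay to around 6000 ms should approximate the response times described above.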

🕯️ More info (optional)

This was part of a load test. More details: https://docs.google.com/document/d/1KYRxJEIB2Inav0daaXQnIsFI_Lga52uTOJotBEbHCu8/edit

🛠️ To fix

Fix specified in Figma here

roperzh commented 2 months ago

@marko-lisica these are the quick wins we identified. Just sanity checking if you're ok with the proposed solutions above.

marko-lisica commented 2 months ago

Add a loading state for the "summaries" in the "controls" page. This whole section is hidden until the request completes, and then the page jumps

@roperzh I agree with this. We should add a loading state for the summary section.

In /hosts/manage, most APIs respond within milliseconds, except for /api/latest/fleet/hosts/count which takes 6 seconds. This is used only to display the counts, but the whole page is blocked until the request completes

I think we shouldn't block the whole page; we should display hosts in the table before the count. I guess it would be frustrating if the user needs to wait 6 seconds each time they open the Hosts page. I'll move this one to the drafting board to think about how we should do this (probably some sort of loading state for the count).
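
A minimal sketch of the idea of showing hosts before the count, assuming the host list and the count come from two independent requests. `loadHostsPage`, `HostsPageState`, and the callback shape are hypothetical and do not reflect how the Fleet frontend is actually structured; error handling is left out for brevity.

```typescript
// Hypothetical page state: `null` means "still loading".
interface HostsPageState {
  hosts: string[] | null;
  count: number | null;
}

// Starts both requests at once and reports each result as soon as it arrives,
// so a slow count request never delays rendering of the host table.
async function loadHostsPage(
  fetchHosts: () => Promise<string[]>,
  fetchCount: () => Promise<number>,
  onUpdate: (state: HostsPageState) => void
): Promise<HostsPageState> {
  const state: HostsPageState = { hosts: null, count: null };

  const hostsDone = fetchHosts().then((hosts) => {
    state.hosts = hosts;
    onUpdate({ ...state }); // table can render now, count may still be null
  });

  const countDone = fetchCount().then((count) => {
    state.count = count;
    onUpdate({ ...state }); // count placeholder is replaced once it resolves
  });

  await Promise.all([hostsDone, countDone]);
  return state;
}
```

With a fast host list and a slow count, `onUpdate` fires first with the hosts and a `null` count, which the UI could show as a small spinner next to the table.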

marko-lisica commented 1 month ago

During design review today, we decided that we'll only solve point 1 (loading and error state for the OS settings summary card).
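
As a rough illustration of what a loading and error state for the summary card could look like, the request can be modeled as a small state type that the card always renders, so the section is never hidden and the page does not jump. Everything here is an assumption for illustration: `RequestState`, `ProfileSummary`, `renderSummaryCard`, and the field names are hypothetical, and the real design is the one specified in Figma.

```typescript
// Hypothetical request state: exactly one of loading, error, or success.
type RequestState<T> =
  | { status: "loading" }
  | { status: "error"; message: string }
  | { status: "success"; data: T };

// Hypothetical summary shape; the field names are assumptions.
interface ProfileSummary {
  verified: number;
  failed: number;
}

// The card always renders something, whatever state the request is in.
function renderSummaryCard(state: RequestState<ProfileSummary>): string {
  switch (state.status) {
    case "loading":
      return "Loading summary...";
    case "error":
      return `Could not load summary: ${state.message}`;
    case "success":
      return `Verified: ${state.data.verified}, Failed: ${state.data.failed}`;
  }
}

// Converts a finished request into a success or error state.
async function toRequestState<T>(request: Promise<T>): Promise<RequestState<T>> {
  try {
    return { status: "success", data: await request };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: "error", message };
  }
}
```

The card would start in the `loading` state and switch to whatever `toRequestState` returns once the summary request settles.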

@lukeheath will file an engineering-initiated story to improve loading time for the /api/latest/fleet/hosts/count endpoint (or any other that takes too long).

I will file separate feature requests to improve the UI loading state for counts.

cc @noahtalerman

lukeheath commented 1 month ago

@roperzh I tested the /hosts/count endpoint on a load test with 21K hosts and it responded in < 1 second:

image

But MDM is not configured on this instance. Would that impact the count response time?

roperzh commented 1 month ago

@lukeheath sorry for not being clear in the issue description! Yes, MDM will affect it, and it's also important to note that I was using the "OS settings" filter, which aggregates configuration profile data. This might happen for other MDM filters too.

roperzh commented 1 month ago

@lukeheath over-communicating just so it's on your radar. From the load testing, this is almost a non-issue. The two issues we identified as must-fix (noted in the load testing doc) are:

lukeheath commented 1 month ago

@roperzh Got it, thanks! Glad to hear we captured the performance issues in separate tickets. This bug will just fix the loading and error states, and we'll address the API performance issues in those tickets.

ghernandez345 commented 1 week ago

@PezHub Just a note: this ticket is meant to fix only the first item, the loading/error state for the OS profile summary, despite the issue mentioning host counts. In the comments we decided to do just this fix and improve the performance of host counts in another ticket.

PezHub commented 1 week ago

QA Notes:

Confirmed Issue 1 has been fixed: I now see the spinner in the summaries section of the Controls page (top section), separate from the rest of the page.