fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Queries page takes a long time to load with a few hundred queries #21855

Open Zaulao opened 4 weeks ago

Zaulao commented 4 weeks ago

Fleet version: Fleet 4.55.1 • Go go1.22.4

Web browser and operating system: Chrome 128.0.6613.119 and Firefox 130.0 running on Ubuntu Linux


💥  Actual behavior

A deployment (in Kubernetes) with just over 400 queries causes the query page to take more than 20 seconds to load, remaining frozen with very high CPU usage during that time.

(In the video linked below, the loading animation appears to freeze, but that is an artifact of the recording. The animation actually keeps rotating until the tab is changed.) https://github.com/user-attachments/assets/daf92bf1-936c-458c-88da-9ad31fb84d7e

Screenshot of computer resource consumption statistics during page loading. Page load started at this network spike and lasted until CPU stats normalized: system-monitor-data

Screenshot of the performance profile generated by Google Chrome during page loading: image

Screenshot of the performance profile generated by Mozilla Firefox during page loading: image

🧑‍💻  Steps to reproduce

  1. Load a large number of different queries into Fleet, for example through community-distributed packs such as osquery's native ones, osquery-attck, and osquery-defense-kit.
  2. Load or refresh the /queries/manage page.

To fix:

Paginate queries on both the frontend and the backend.
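As a rough illustration of what pagination means here, the sketch below slices a list into fixed-size pages and returns the metadata a UI would need for pager controls. It is not Fleet's code: the `Page` type, the `Paginate` helper, the default page size of 20, and the placeholder query names are all assumptions made for the example.

```go
package main

import "fmt"

// Page holds one slice of results plus the metadata needed to render pager controls.
// Hypothetical type for illustration only.
type Page[T any] struct {
	Items      []T
	Page       int // zero-based page index
	PerPage    int
	TotalCount int
	HasNext    bool
}

// Paginate returns the requested zero-based page of items.
// An out-of-range page yields an empty Items slice rather than an error.
func Paginate[T any](items []T, page, perPage int) Page[T] {
	if perPage <= 0 {
		perPage = 20 // assumed default page size
	}
	if page < 0 {
		page = 0
	}
	total := len(items)
	// Compare against total/perPage first so page*perPage cannot overflow.
	start := total
	if page <= total/perPage {
		start = page * perPage
	}
	end := start + perPage
	if end > total {
		end = total
	}
	return Page[T]{
		Items:      items[start:end],
		Page:       page,
		PerPage:    perPage,
		TotalCount: total,
		HasNext:    end < total,
	}
}

func main() {
	// 410 placeholder names stand in for "just over 400" saved queries.
	queries := make([]string, 410)
	for i := range queries {
		queries[i] = fmt.Sprintf("query-%03d", i)
	}
	last := Paginate(queries, 20, 20)
	fmt.Println(len(last.Items), last.HasNext) // prints "10 false"
}
```

With this approach the browser only has to render one page of rows at a time instead of every query at once.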

xpkoala commented 3 weeks ago

@Zaulao thank you for submitting this issue. We are currently working on reproducing it.

So that we have a better understanding of your environment, could you let us know whether your server configuration is in line with our recommendations, and how many hosts are enrolled?

Zaulao commented 3 weeks ago

Hello @xpkoala! Yes, I'm currently using a Kubernetes deployment with appropriate resource provisioning. We have around 160 enrolled hosts, so the server's resource consumption is quite light.

xpkoala commented 3 weeks ago

Thank you, @Zaulao. I've got a test instance running on my local machine with 438 queries and ~105 hosts. I'm seeing a load time of ~5 seconds across Safari, Chrome, and Firefox.

Are the queries large in terms of character count, for example thousands of characters each?

I'll continue experimenting with this and see if I can reproduce the longer load time.


Zaulao commented 3 weeks ago

Thanks for testing, @xpkoala. Yes, there are a few pretty long queries, especially the ones generated by the osquery-defense-kit (using the make detect build option).

RachelElysia commented 3 days ago

This is one of the few pages that don't have server-side pagination.

My suggestion is that we switch this page to server-side pagination. IIRC, we would need to build the API to allow pagination and switch the FE code from client-side to server-side pagination.

@rachaelshaw thoughts on this solution?

rachaelshaw commented 3 days ago

@RachelElysia that makes total sense to me. For some reason I thought we had added pagination when we added query reports; I definitely agree we should have it!

RachelElysia commented 3 days ago

OK, sweet. I think that should be the solution, and we should test it!

jacobshandling commented 2 days ago

I'd estimate FE 5 here

RachelElysia commented 2 days ago

Hey team! Please add your planning poker estimate with Zenhub @jacobshandling @iansltx @lucasmrod @mostlikelee @getvictor

jacobshandling commented 2 days ago

Consensus on FE 5

sharon-fdm commented 2 days ago

Estimation for paginating as a solution:

rachaelshaw commented 2 days ago

API doc changes: https://github.com/fleetdm/fleet/pull/22590