fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com
Other
3.01k stars 416 forks source link

re-think DB indexes on the `hosts` table #17963

Closed roperzh closed 1 month ago

roperzh commented 6 months ago

Goal

User story
As a customer relying on the stability of the product,
I want the application to perform consistently under various loads
so that I can trust the product to function correctly during critical business operations.

Context

Unused indexes slow down data insertion because the database must update these indexes even though they are not used, consuming additional time and processing power. This is specially critical for tables that are updated often, which is the case of the hosts table.

But, the advantage of paying the cost of indexes is that you can access data faster.

Keeping that in mind, we have two heavy indexes that we don't use on hosts:

 FULLTEXT KEY `host_ip_mac_search` (`primary_ip`,`primary_mac`),
 FULLTEXT KEY `hosts_search` (`hostname`,`uuid`,`computer_name`)

And, at the same time, we desperately need to add an index on hosts.uuid, which is currently not indexed and used for all MDM operations.

Changes

Engineering

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

Manual testing steps

  1. Run migrations
  2. Turn on MDM features for hosts
  3. Smoke tests for all MDM features and osquery integrations.

Confirmation

  1. [ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. [ ] QA (@____): Added comment to user story confirming successful completion of QA.
getvictor commented 6 months ago

Is this a duplicate of https://github.com/fleetdm/fleet/issues/9372?

roperzh commented 6 months ago

I think I would close https://github.com/fleetdm/fleet/issues/9372 in favor of this, this is about indexes in hosts in general 👍

mna commented 5 months ago

@roperzh One thing I'd add as part of this ticket, is to verify our Host...ByIdentifier queries. I'm not sure, but I think that due to the way the WHERE clause is done in those queries, it may still not be able to use the hosts.uuid (or other) index. Maybe there's a more fancy index we could maintain to help satisfy those (widely used, I think) queries.

noahtalerman commented 3 months ago

Hey @mna is this improvement merged? I see that we merged #18453

@georgekarrv and @lukeheath for now, I moved this story back to "Settled" because it doesn't seem to have an estimate yet.

mna commented 3 months ago

@noahtalerman AFAIK this has not been addressed yet, no.

georgekarrv commented 2 months ago

Hey team! Please add your planning poker estimate with Zenhub @dantecatalfamo @ghernandez345 @gillespi314 @jahzielv @mna @roperzh

getvictor commented 2 months ago

hosts.uuid lookup may be sped up by using BINARY(16) data type for that column. For reference, see recent migration

lukeheath commented 2 months ago

I'm going to deprioritize this for now. It's a great update, but it's something we can revisit after our Q3 push.

PezHub commented 1 month ago

QA Notes: We will move to Ready for Release once Load testing and Smoke tests are completed

fleet-release commented 1 month ago

Refined indexes bloom, Fleet's performance unburdened, Trust in system soars.