Open benwtrent opened 3 months ago
Pinging @elastic/es-search (Team:Search)
Hey @benwtrent, I did some investigation on this during spacetime (for x64 though): https://github.com/elastic/elasticsearch/pull/109238
TL;DR: with AVX, performance for int4 is either the same as (native) int7, or better in some cases, due to the fact that on some processors we are read-limited (so getting two values out of each byte read from one vector helps).
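As a rough illustration of the read-limited point (a hypothetical scalar sketch, not the actual Panama or native kernels, and assuming an interleaved low-nibble-first packing that may differ from the real on-disk layout): each byte read from the packed document vector yields two multiplies, so the comparator fetches half the bytes per element compared to an unpacked layout.

```java
public class Int4DotSketch {
    // Hypothetical scalar comparator: dot product between an uncompressed
    // int4 query (one value per byte, 0..15) and a packed int4 document
    // vector (two values per byte: low nibble = element 2*i, high nibble
    // = element 2*i + 1).
    static int dotProductPacked(byte[] query, byte[] packedDoc) {
        int sum = 0;
        for (int i = 0; i < packedDoc.length; i++) {
            int lo = packedDoc[i] & 0x0F;        // element 2*i
            int hi = (packedDoc[i] >> 4) & 0x0F; // element 2*i + 1
            sum += query[2 * i] * lo + query[2 * i + 1] * hi;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] query = {1, 1, 1, 1};
        byte[] packedDoc = {0x21, 0x43}; // packs the values [1, 2, 3, 4]
        System.out.println(dotProductPacked(query, packedDoc)); // prints 10
    }
}
```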
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Description
Int4 is a new half-byte quantization mechanism. Right now, the code is pretty fast when reading onto heap and using Panama Vector, but I suspect we can go much faster for ARM (NEON), etc. if we gave the comparators the native treatment, as we have for `int7` quantization.

Note, for `int4`, we always compress the two half-bytes into a single byte representation, and the query vector is always uncompressed. So, we should consider what it looks like for this to run at query time vs. merge time. My gut is that during merge, we uncompress the vector to index it; this way the query-optimized comparators and the merge-optimized ones can be the same.

//cc @tveasey @ChrisHegarty @ldematte
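A minimal sketch of the uncompress step the description alludes to (a hypothetical helper, assuming an interleaved low-nibble-first packing for illustration; the actual on-disk layout may differ): expanding the packed vector back to one value per byte would let merge-time indexing hand the comparators the same uncompressed shape the query vector already has.

```java
import java.util.Arrays;

public class Int4UnpackSketch {
    // Hypothetical helper: expand a packed int4 vector (two values per
    // byte) back to one value per byte, so merge-time code could reuse
    // comparators that expect an uncompressed vector on one side.
    static byte[] unpack(byte[] packed) {
        byte[] out = new byte[packed.length * 2];
        for (int i = 0; i < packed.length; i++) {
            out[2 * i] = (byte) (packed[i] & 0x0F);            // low nibble first
            out[2 * i + 1] = (byte) ((packed[i] >> 4) & 0x0F); // then high nibble
        }
        return out;
    }

    public static void main(String[] args) {
        // 0x21 packs [1, 2]; 0x43 packs [3, 4]
        System.out.println(Arrays.toString(unpack(new byte[]{0x21, 0x43})));
        // prints [1, 2, 3, 4]
    }
}
```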