Open benwtrent opened 3 months ago
Pinging @elastic/es-search (Team:Search)
Hey @benwtrent, I did some investigation on this during spacetime (for x64 though): https://github.com/elastic/elasticsearch/pull/109238
TL;DR: with AVX, performance for int4 is either the same as (native) int7, or better in some cases, due to the fact that on some processors we are read-limited (so getting two values out of each byte read from one vector helps).
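As a rough illustration of the read-limited point (a hypothetical scalar sketch, not the actual Panama or native kernels, and assuming an interleaved low-nibble-first packing that may differ from the real on-disk layout): each byte read from the packed document vector yields two multiplies, so the comparator fetches half the bytes per element compared to an unpacked layout.

```java
public class Int4DotSketch {
    // Hypothetical scalar comparator: dot product between an uncompressed
    // int4 query (one value per byte, 0..15) and a packed int4 document
    // vector (two values per byte: low nibble = element 2*i, high nibble
    // = element 2*i + 1).
    static int dotProductPacked(byte[] query, byte[] packedDoc) {
        int sum = 0;
        for (int i = 0; i < packedDoc.length; i++) {
            int lo = packedDoc[i] & 0x0F;        // element 2*i
            int hi = (packedDoc[i] >> 4) & 0x0F; // element 2*i + 1
            sum += query[2 * i] * lo + query[2 * i + 1] * hi;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] query = {1, 1, 1, 1};
        byte[] packedDoc = {0x21, 0x43}; // packs the values [1, 2, 3, 4]
        System.out.println(dotProductPacked(query, packedDoc)); // prints 10
    }
}
```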
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Description
Int4 is a new half-byte quantization mechanism. Right now, the code is pretty fast when reading onto heap and using Panama Vector, but I suspect we can go much faster for ARM (NEON), etc. if we gave the comparators the native treatment, as we have for `int7` quantization.

Note, for `int4`, we always compress the two half-bytes into a single byte representation, and the query vector is always uncompressed. So, we should consider what it looks like for this to run at query time vs. merge time. My gut is that during merge, we uncompress the vector to index it; this way the query-optimized comparators and the merge-optimized ones can be the same.

//cc @tveasey @ChrisHegarty @ldematte
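A minimal sketch of the uncompress step the description alludes to (a hypothetical helper, assuming an interleaved low-nibble-first packing for illustration; the actual on-disk layout may differ): expanding the packed vector back to one value per byte would let merge-time indexing hand the comparators the same uncompressed shape the query vector already has.

```java
import java.util.Arrays;

public class Int4UnpackSketch {
    // Hypothetical helper: expand a packed int4 vector (two values per
    // byte) back to one value per byte, so merge-time code could reuse
    // comparators that expect an uncompressed vector on one side.
    static byte[] unpack(byte[] packed) {
        byte[] out = new byte[packed.length * 2];
        for (int i = 0; i < packed.length; i++) {
            out[2 * i] = (byte) (packed[i] & 0x0F);            // low nibble first
            out[2 * i + 1] = (byte) ((packed[i] >> 4) & 0x0F); // then high nibble
        }
        return out;
    }

    public static void main(String[] args) {
        // 0x21 packs [1, 2]; 0x43 packs [3, 4]
        System.out.println(Arrays.toString(unpack(new byte[]{0x21, 0x43})));
        // prints [1, 2, 3, 4]
    }
}
```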