Closed tteofili closed 1 month ago
I'm a bit confused: what is the benefit of having it on segment infos in addition to field infos?
You're right @jpountz, we can probably get away with `fieldInfo.getAttribute(PerFieldKnnVectorsFormat.PER_FIELD_FORMAT_KEY)`. I didn't notice that, thanks!
When indexing vectors, it is possible to use a different vector format for each field. It is also possible (although not currently implemented) to have `Codec`s that provide different vector formats "dynamically", even for the same field. To make such situations easier to debug, it would be helpful to have per-field vector format information within `SegmentCommitInfo` (e.g. within the `attributes`).

This trivial PR adds `KnnVectorFormat#name` for each field to `SegmentInfo#attributes` in `PerFieldKnnVectorsFormat`. If a doc with `field1` is indexed with `Lucene99HnswVectorsFormat` and a doc with `field2` is indexed with `Lucene99HnswScalarQuantizedVectorsFormat` within the same segment, the corresponding `SegmentInfo#attributes` will have the following entries:

- `"KnnVectorFormat.field1"` -> `"Lucene99HnswVectorsFormat"`
- `"KnnVectorFormat.field2"` -> `"Lucene99HnswScalarQuantizedVectorsFormat"`
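
For illustration only, here is a minimal plain-Java sketch of how entries of this shape could be derived from a field-to-format-name mapping. It does not use Lucene and is not the code in this PR: the class `VectorFormatAttributes`, the helper `toAttributes` and the constant `KEY_PREFIX` are hypothetical names, and the prefix is simply read off the entries listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VectorFormatAttributes {
    // Hypothetical key prefix, taken from the example entries in the description.
    public static final String KEY_PREFIX = "KnnVectorFormat.";

    /**
     * Builds attribute entries of the form "KnnVectorFormat.<field>" -> "<format name>".
     * The input maps each vector field name to the name of the format that wrote it.
     */
    public static Map<String, String> toAttributes(Map<String, String> formatNameByField) {
        // LinkedHashMap keeps the entries in the order the fields were supplied.
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : formatNameByField.entrySet()) {
            attributes.put(KEY_PREFIX + entry.getKey(), entry.getValue());
        }
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> formats = new LinkedHashMap<>();
        formats.put("field1", "Lucene99HnswVectorsFormat");
        formats.put("field2", "Lucene99HnswScalarQuantizedVectorsFormat");

        // Print one "key -> value" line per field.
        toAttributes(formats).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

Prefixing the key with the field name keeps one entry per field in a flat string-to-string map, which is the shape the attributes described above take.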