Open jimczi opened 1 month ago
Pinging @elastic/es-search-relevance (Team:Search Relevance)
@jimczi something else to help with this would be a
slice: {from:, to:}
parameter that takes that section of the vector, so that only that part is sent to query or index. The idea behind "slice" is that Matryoshka embeddings could be used in the following manner:
PUT my-index
{
  "mappings": {
    "properties": {
      "emb_head": {
        "type": "dense_vector",
        "slice": {"from": 0, "to": 384},
        "fields": {
          "emb_tail": {
            "type": "dense_vector",
            "slice": {"from": 384, "to": 1024},
            "index": false
          }
        }
      }
    }
  }
}
Since dot-product values can just be summed, you end up with:
POST my-index/_search
{
  "query": {
    "knn": {
      "field": "emb_head",
      "query_vector": [...]
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "script_score": {
          "query": {"match_all": {}},
          "script": {
            "source": """
              double value = dotProduct(params.query_vector, 'emb_head.emb_tail');
              return sigmoid(1, Math.E, -value);
            """,
            "params": {
              "query_vector": [...]
            }
          }
        }
      },
      "query_weight": 1.0,
      "rescore_query_weight": 1.0,
      "score_mode": "sum"
    }
  }
}
So, you have the whole vector in _source just once, run a knn query over the first slice, and simply sum in the tail's contribution, since a dot product is just the sum of the per-dimension products.
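The decomposition that makes this rescoring trick work can be sanity-checked in a few lines. This is a standalone sketch, not Elasticsearch code; the 384/1024 split mirrors the slice boundaries in the mapping above:

```python
# The dot product over the full vector equals the sum of the dot products
# over any partition of its dimensions, e.g. a head/tail split.
import random

DIMS, SPLIT = 1024, 384  # matches the slice boundaries in the mapping sketch

query = [random.gauss(0, 1) for _ in range(DIMS)]
doc = [random.gauss(0, 1) for _ in range(DIMS)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

full = dot(query, doc)
head = dot(query[:SPLIT], doc[:SPLIT])   # what the knn query scores
tail = dot(query[SPLIT:], doc[SPLIT:])   # what the rescore script adds back

assert abs(full - (head + tail)) < 1e-9
```

This is why "score_mode": "sum" with equal weights recovers the full-vector dot product.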
Multi-field dense vectors seem like a much better fit for this use case, @benwtrent! We should definitely add that support if we allow dimension shrinking/slicing. I like the slice idea too; it's more flexible, at the expense of a new parameter in the mapping.
Description
Currently, if the number of dimensions of the input vector doesn't match the dense vector field's mapping, an error is thrown. It would be helpful to support automatic dimension shrinking for models trained with Matryoshka Representation Learning (which allows dimension flexibility), without requiring this to be done offline. This would be particularly useful with the inference API, where models such as those provided by OpenAI offer flexible dimensions for indexing.
For example, if a model outputs 1024 dimensions, users could define a mapping like this:
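A mapping along these lines (an illustrative sketch; the emb_short/emb_full field names match the fields referenced below, while the 256-dim size for emb_short is an assumption) could be:

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "emb_full": {
        "type": "dense_vector",
        "dims": 1024,
        "index": false
      },
      "emb_short": {
        "type": "dense_vector",
        "dims": 256
      }
    }
  }
}
```

With the proposed feature, indexing the full 1024-dim output into emb_short would automatically shrink it to the configured 256 dimensions instead of throwing an error.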
Then at query time, the emb_short field can be used for approximate nearest neighbor search through the HNSW index, and the results can be rescored using the emb_full field to improve recall.
The downside is that it would no longer be possible to detect whether the number of dimensions was misconfigured for the emb_short field in this example. However, we could still throw an error if the input vector has fewer dimensions than the value configured in the mapping.
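The shrinking itself is cheap: for Matryoshka-trained models, the usual recipe is to truncate to the leading dimensions and L2-normalize. A sketch of what the automatic shrinking could do on ingest (the 1024-to-256 sizes are illustrative, not part of the proposal):

```python
import math

def shrink(vector, target_dims):
    """Truncate a Matryoshka embedding to its leading dimensions and
    L2-normalize so cosine/dot-product scoring stays meaningful."""
    head = vector[:target_dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

full = [0.1] * 1024          # e.g. a 1024-dim model output
short = shrink(full, 256)    # what the emb_short field would store

assert len(short) == 256
# renormalized: unit length again
assert abs(sum(x * x for x in short) - 1.0) < 1e-9
```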