elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Support for bit precision in the Inference API text_embedding task #111747

Open jimczi opened 1 month ago

jimczi commented 1 month ago

Description

Some inference API providers now support embedding models that encode each dimension as a single bit; for example, Cohere's v3 embedding models offer this capability. Since the dense_vector field already handles the bit element type, it would be beneficial to extend this support so that the text_embedding task of the inference API can output vectors with bit precision.
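For context, the index side is already covered: `dense_vector` accepts bit-packed vectors today, so only the inference output is missing. A minimal mapping sketch (index and field names are illustrative):

```
PUT bit-embeddings
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        // each dimension is a single bit; dims is the number of bits
        "element_type": "bit",
        "dims": 1024
      }
    }
  }
}
```

With `element_type: bit`, `dims` is the number of bits and documents supply the vector as `dims / 8` byte values (or a hex string).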

Typically, bit vectors are paired with float or byte vectors to improve recall: the hits retrieved with the bit vectors are rescored using the higher-precision vectors. To support this, we suggest allowing the text_embedding task to generate multiple vectors for the same input at different precisions (e.g., bits + floats or bits + int8). The Cohere API already exposes this functionality, so implementing it in the inference API would avoid making a separate API call per precision, improving performance and reducing costs for users. A rough sketch of what this could look like follows.
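This is only a sketch of the proposed endpoint configuration, not the current API: the `embedding_types` array is hypothetical (modeled on Cohere's own `embedding_types` request parameter), whereas the Cohere service settings today accept a single embedding type.

```
PUT _inference/text_embedding/cohere-embeddings
{
  "service": "cohere",
  "service_settings": {
    "api_key": "<api_key>",
    "model_id": "embed-english-v3.0",
    // hypothetical: request both precisions in a single call
    "embedding_types": ["binary", "int8"]
  }
}
```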

This would require the mapping to be defined with two fields, each corresponding to a different precision.
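For example, the dual-field mapping could look roughly like this (field names are illustrative; `byte` is how int8 vectors are stored in `dense_vector` today):

```
PUT my-index
{
  "mappings": {
    "properties": {
      "embedding_bits": {
        // bit-packed vector used for the cheap first-pass retrieval
        "type": "dense_vector",
        "element_type": "bit",
        "dims": 1024
      },
      "embedding_int8": {
        // higher-precision vector used to rescore the bit-vector hits
        "type": "dense_vector",
        "element_type": "byte",
        "dims": 1024
      }
    }
  }
}
```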

Additionally, we should evaluate whether the semantic_text field could natively support this scenario.

elasticsearchmachine commented 1 month ago

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine commented 1 month ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)