Open ShivangiReja opened 4 months ago
Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.
I'm interested in whether we'd consider VectorEmbedding as an alternate name for the type (instead of EmbeddingVector). My understanding is that there are multiple meanings for "embeddings" in the deep-learning space, so "embedding" here is a noun and "vector" is the adjective that differentiates which type of embedding you're referring to.
As a higher-level point, as we add types to the BCL that map to concepts in AI/ML, it feels like there's value in having their names align well with the terminology used for the same concepts. One benefit of this might be that folks searching online for more information about what the type represents would more easily land on documentation from the broader ML community. For this one, I found many articles referencing "vector embeddings" but few on "embedding vectors."
Is there a reason to have this in the BCL rather than a standalone library?
Why not make it part of SmartComponents.LocalEmbeddings?
Background and motivation
Currently, AI libraries in the .NET ecosystem (e.g., OpenAI, Azure AI Search) use ReadOnlyMemory<float> to represent embedding vectors. However, embeddings can be of narrower types such as int8, int16, float16, etc., which consume less memory, providing both cost and performance benefits. This proposal aims to introduce a versatile container for embeddings that can handle various data types, enabling more efficient memory usage and broader interoperability among different services (e.g., retrieving vectors from services like OpenAI and storing them in vector databases like Azure Search).
API Proposal
API Usage
Here's how we can use it with OpenAI, which returns a base64 encoded string:
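As a rough sketch of the decoding step involved, using only existing BCL calls rather than the proposed type: the helper below (Base64EmbeddingSketch is a hypothetical name, not part of the proposal) assumes the payload is a sequence of packed little-endian float32 values and that the code runs on a little-endian machine.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only; not the proposed API.
public static class Base64EmbeddingSketch
{
    // Decodes a base64 string of packed float32 values into a float array.
    // Assumption: the payload is little-endian and so is the host.
    public static float[] DecodeFloat32(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);

        if (bytes.Length % sizeof(float) != 0)
        {
            throw new FormatException("Payload length is not a multiple of 4 bytes.");
        }

        // Reinterpret the byte buffer as floats, then copy into a new array.
        return MemoryMarshal.Cast<byte, float>(bytes.AsSpan()).ToArray();
    }
}
```

A caller still has to know the element type up front, which is the kind of detail a typed embedding container could carry for them.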
And here's how we can use it with Azure Search, which returns a JSON array:
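For comparison, a minimal sketch of reading a JSON array with System.Text.Json alone, without the proposed type. JsonEmbeddingSketch and its method names are hypothetical, and the assumption is that the service returns a plain JSON array of numbers.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper for illustration only; not the proposed API.
public static class JsonEmbeddingSketch
{
    // Parses a JSON array such as "[0.25, -1.5]" into float32 values.
    public static float[] ParseFloat32(string json)
    {
        return JsonSerializer.Deserialize<float[]>(json)
            ?? throw new FormatException("Expected a JSON array, got null.");
    }

    // Parses a JSON array such as "[12, -7]" into int8 values,
    // one of the narrower element types mentioned in the motivation.
    public static sbyte[] ParseInt8(string json)
    {
        return JsonSerializer.Deserialize<sbyte[]>(json)
            ?? throw new FormatException("Expected a JSON array, got null.");
    }
}
```

Each element type needs its own method here, and the result type differs per call, which makes it awkward to pass embeddings between libraries without a common container.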
For end-to-end working examples, please see: EmbeddingType/Program.cs
Alternative Designs
No response
Risks
No response
Discussion Points