Azure / azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.
https://azure.microsoft.com/products/search
MIT License

Feature Request: Collection of vector fields #22

Open kirk-marple opened 1 year ago

kirk-marple commented 1 year ago

For our use case, we are ingesting long documents and audio transcripts. The amount of text we're starting with exceeds the 8K-token limit of the Ada embedding model.

So we need to create multiple embeddings from each piece of content.

Since we can only store one vector per search document, I had to come up with a hacky solution to store 'n' search documents per piece of content. (Basically one parent search document and 'n' child search documents, where n == # of chunks.)
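For readers following along, the parent/child workaround described above might look roughly like the sketch below. It is not the poster's actual code: the field names (`docType`, `parentId`, `chunk`, `contentVector`), the character-based chunking, and the `embed` callable are all hypothetical stand-ins, and a real pipeline would chunk by tokens and call an embedding model.

```python
from typing import Callable, Dict, List


def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into overlapping character windows.

    Character counts are a stand-in for token counts; a real pipeline would
    measure chunks with the embedding model's tokenizer.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    if not text:
        return []
    step = max_chars - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + max_chars] for i in range(0, max(len(text) - overlap, 1), step)]


def build_search_documents(
    content_id: str,
    title: str,
    text: str,
    embed: Callable[[str], List[float]],  # hypothetical embedding function
    max_chars: int = 2000,
    overlap: int = 200,
) -> List[Dict]:
    """Return one parent document plus one child document per chunk."""
    # The parent carries content-level metadata and no vector.
    parent = {"id": content_id, "docType": "parent", "title": title}
    # Each child holds exactly one top-level vector and points back to the parent.
    children = [
        {
            "id": f"{content_id}-{i}",
            "docType": "chunk",
            "parentId": content_id,
            "chunk": chunk,
            "contentVector": embed(chunk),
        }
        for i, chunk in enumerate(chunk_text(text, max_chars, overlap))
    ]
    return [parent] + children
```

Each child keeps its vector in a top-level field, which is what the index currently requires; the cost is that a single piece of content is spread across n + 1 search documents.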

If the Cog Search index could support a collection of complex types, each of which included a vector, it would make these use cases much cleaner.

Currently, it errors with "Only a top-level field of the index can be a vector field."

farzad528 commented 1 year ago

Thanks for the feedback. We are actively working on this feature but have no ETA at this time.

smharvey commented 1 year ago

Excellent. We are also interested in the same.

rsloggett-prmaconsulting commented 1 year ago

This seems like a really key problem to solve - how many real-world documents are small enough that you would be able to include them directly in a prompt? Carving up a document (on ingestion or indexing) so that it is possible to find and retrieve just the relevant portions for a prompt seems like a blocking requirement. I wonder how others are solving this problem with Cognitive Search?
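One way the retrieval side could work with chunk-level documents is to group the chunk hits by their parent and keep only the best few per document. The sketch below assumes each hit is a plain dict with hypothetical `parentId`, `score`, and `chunk` keys; it says nothing about how the hits were obtained from the search service.

```python
from typing import Dict, List


def top_chunks_per_parent(hits: List[Dict], per_parent: int = 2) -> Dict[str, List[Dict]]:
    """Group chunk hits by parent, keeping the highest-scoring chunks for each."""
    grouped: Dict[str, List[Dict]] = {}
    # Visit hits from best to worst so each bucket fills with its top chunks first.
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        bucket = grouped.setdefault(hit["parentId"], [])
        if len(bucket) < per_parent:
            bucket.append(hit)
    return grouped
```

Parents come out ordered by their best chunk, so the caller can take the first few parents and paste only those chunks into the prompt.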

davidjrh commented 1 year ago

Related to this, I found these two fields in an index created by the Custom Answering service. My plan was to use embeddings and build a similar QnA service, as suggested in a lot of Microsoft slides, but seeing those, I'm wondering whether the pattern is already being implemented in the Custom Answering service.

image

nick4fake commented 2 weeks ago

Any news? This seems like quite a simple use case.