Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License
6.05k stars, 4.14k forks

Issue when trying to enable integrated vectorization #1498

Open shivam10u opened 6 months ago

shivam10u commented 6 months ago

Hi @pamelafox, I have been trying to use integrated vectorization, but after deployment only the search index gets created, even after running "azd env set USE_FEATURE_INT_VECTORIZATION true". I can see that the code supports it, but I am still hitting this issue. Please help.

Please find the attached screenshots:

[Two screenshots]

My aim is very simple:

  1. Upload files directly to blob storage, without needing to run prepdocs.py.
  2. Support multiple document formats.
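Since the thread indicates that prepdocs.py is what sets up integrated vectorization, one thing worth checking first is that the flag is actually set in the active azd environment. A minimal sketch, assuming `azd env get-values` prints dotenv-style `KEY="value"` lines and that only the string "true" (case-insensitive) counts as enabled; the helper names `parse_env_values` and `int_vectorization_enabled` are hypothetical and not part of this repo:

```python
def parse_env_values(output: str) -> dict[str, str]:
    """Parse dotenv-style KEY="value" lines (assumed `azd env get-values` format)."""
    values: dict[str, str] = {}
    for raw_line in output.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of surrounding double or single quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def int_vectorization_enabled(values: dict[str, str]) -> bool:
    """Assumption: only the string "true" (any case) means the feature is on."""
    return values.get("USE_FEATURE_INT_VECTORIZATION", "").lower() == "true"
```

In practice you would feed it the output of `subprocess.run(["azd", "env", "get-values"], capture_output=True, text=True).stdout`. If it returns False, the flag was probably set in a different azd environment than the one being deployed.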
pamelafox commented 6 months ago

It says that the index has 2,161 documents in it, so it did index something. Or was that from running prepdocs.py earlier? You should see logs from prepdocs.py that describe the process of setting up integrated vectorization; please share those as well.

chetan2309 commented 6 months ago

@pamelafox, I have a question on this topic. I wasn't sure whether to open a new issue just to ask a question, so I'm using this open thread. Please let me know if this is not the right place.

I ran the app for a few days with all of these options enabled, until I discovered the integrated vectorization option.

I followed the documentation and enabled it. In terms of result quality, what difference should I expect between having integrated vectorization enabled and having it disabled while using the options below?

[Screenshot]

pamelafox commented 6 months ago

There are some differences between local prepdocs ingestion and integrated vectorization, specifically:

If you do see lower quality due to the cracking or splitting algorithm, please write up your findings so that the search team may make improvements as necessary. Thanks!
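To make "splitting" concrete: both ingestion paths break the extracted text into chunks before embedding it, and different chunk boundaries can change which passages a query retrieves. The sketch below is a deliberately simplified fixed-size character splitter with overlap, an assumption for illustration only; it is neither the splitter in prepdocs.py nor Azure AI Search's Text Split skill, and `split_text`, `max_chars`, and `overlap` are hypothetical names:

```python
def split_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of at most max_chars characters.

    Each window after the first starts `overlap` characters before the
    previous window ended, so neighboring chunks share some context.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    chunks: list[str] = []
    step = max_chars - overlap  # how far each window advances
    start = 0
    while start < len(text):
        chunks.append(text[start : start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

For example, `split_text("abcdefghij", max_chars=4, overlap=1)` yields `["abcd", "defg", "ghij"]`. Changing either parameter shifts the boundaries, which is one reason two pipelines fed the same document can return different passages.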

dchandu320 commented 5 months ago

Hi @pamelafox, I currently have integrated vectorization enabled and it is running fine, but as you mentioned, I don't see page numbers in the index. Are you planning to implement that in the future? Also, how is the integrated vectorization pipeline better than the previous approach?

pamelafox commented 5 months ago

That feature would need to be implemented in the Azure AI Search internal code, not in this repo. The Azure AI Search team does not have a public ETA for the feature, but is aware of the need for it.
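To illustrate the kind of bookkeeping page numbers require: if a pipeline records the character offset where each page starts while cracking a document, it can assign any chunk a page number with a binary search over those offsets. A minimal sketch under that assumption; `page_for_offset` and `page_starts` are hypothetical names, not code from Azure AI Search or prepdocs.py:

```python
from bisect import bisect_right


def page_for_offset(page_starts: list[int], offset: int) -> int:
    """Return the 1-based page number containing character `offset`.

    page_starts must be sorted ascending and begin with 0; for example,
    [0, 1800, 3650] means page 2 starts at offset 1800 and page 3 at 3650.
    """
    if not page_starts or page_starts[0] != 0:
        raise ValueError("page_starts must be non-empty and start at 0")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # bisect_right counts the page starts <= offset, which equals the page number.
    return bisect_right(page_starts, offset)
```

A chunk that begins at offset 1799 maps to page 1, while one beginning at 1800 maps to page 2; the hard part is having those page offsets available in the first place.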