Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License

How to upload data with DI in portal? #1604

Open dicktangdev opened 6 months ago

dicktangdev commented 6 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I deployed the demo using the quick start and am looking for a way to upload documents such as DOCX, PPTX, and XLSX through Document Intelligence (DI), since the documentation says these document types are only supported by DI. I cannot open DI at its endpoint; it returns a 404. I also cannot find an indexer in AI Search. Please let me know how I can upload my data through the Azure portal. Thanks!

Expected/desired behavior

Upload data in portal

zedhaque commented 6 months ago

Just wondering, did you check the doc below?

https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/data_ingestion.md

Specifically, the "Indexing additional documents" section. Maybe this is what you are looking for?

dicktangdev commented 6 months ago

Hi @zedhaque

Yes, I read the "Indexing additional documents" section. I saw that an index called "gptkbindex" was created, and I also uploaded new PDF files to the "content" blob container (where the sample PDFs were all uploaded). But I am not sure how to "find the index, and run it" in the Azure portal. The index cannot be run, and no indexer was created. I also tested from the UI and from Search explorer on the "gptkbindex" index, and the content of the newly uploaded PDFs is not available. I am still trying to figure out how to upload data through the Azure portal. If you know the details, please share them with me. Thanks!

zedhaque commented 6 months ago

Just to confirm: when you click "Indexers" in the AI Search menu, nothing is there (it's an empty page)? And your .env has the item below set to true: USE_FEATURE_INT_VECTORIZATION true

correct?
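One way to double-check that flag without reading the file by eye is a small script like the sketch below. It assumes the azd convention of storing environment values as `KEY="value"` lines in `.azure/<environment-name>/.env`; the helper name `env_flag_enabled` and the sample text are illustrative only and are not part of the repo.

```python
def env_flag_enabled(env_text: str, name: str) -> bool:
    """Return True if `name` is set to a true-like value in .env-style text."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes before comparing.
            return value.strip().strip('"').strip("'").lower() == "true"
    return False


# Hypothetical sample content; in practice, read the real .env file instead.
sample = 'AZURE_ENV_NAME="my-env"\nUSE_FEATURE_INT_VECTORIZATION="true"\n'
print(env_flag_enabled(sample, "USE_FEATURE_INT_VECTORIZATION"))
```

If the flag is missing or set to anything other than true, the function returns False, which would be consistent with no indexer having been created.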

Not sure why the indexer and scheduler didn't get created. (https://learn.microsoft.com/en-us/azure/search/vector-search-integrated-vectorization).

You could set up the scheduler in the portal manually and see if it picks up the documents that you uploaded earlier.
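For reference, a scheduled indexer is defined by a small JSON document. The sketch below only builds and prints such a definition so it can be compared with what the portal shows; it does not call Azure. The field names follow the Azure AI Search indexer definition (`name`, `dataSourceName`, `targetIndexName`, `schedule.interval` as an ISO 8601 duration). The indexer and data source names are hypothetical, and only "gptkbindex" comes from this thread. Note the assumption that a data source for the blob container already exists, and that a plain indexer may still need field mappings or a skillset to match the sample's index schema, which is not shown here.

```python
import json


def build_indexer_definition(name, data_source, target_index, interval="PT1H"):
    """Build an indexer definition that runs on a recurring schedule."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        # ISO 8601 duration: "PT1H" means once every hour.
        "schedule": {"interval": interval},
    }


definition = build_indexer_definition(
    name="content-indexer",                 # hypothetical indexer name
    data_source="content-blob-datasource",  # hypothetical; must already exist
    target_index="gptkbindex",              # index name mentioned in this thread
)
print(json.dumps(definition, indent=2))
```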

heartleaf commented 6 months ago

I have a similar question to the original poster's. The demo creates an index automatically; the question is how to index documents that are uploaded directly to the storage container. I know that scheduled indexing requires setting up a new indexer, but how do I associate that indexer with the existing index and make sure that new documents, or updates to existing PDFs, are automatically reindexed and made available in the chat app?

mikedizon commented 6 months ago

Interested to know this as well. I am getting auth issues when trying to index locally by running ./scripts/prepdocs.sh. The PDF I am trying to index is 7000 pages.

dicktangdev commented 5 months ago

@zedhaque Using Integrated Vectorization does create the indexer. Thank you for letting us know that setting "USE_FEATURE_INT_VECTORIZATION true" enables Integrated Vectorization. However, we prefer to use DI because it handles chart data in PDFs much better. Is there a way to upload documents to blob storage and schedule indexing from there using DI, similar to how indexing with Integrated Vectorization is handled in AI Search?