Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License
5.91k stars 4.05k forks

Issue while running ./scripts/prepdocs.ps1 #1207

Open shivam10u opened 7 months ago

shivam10u commented 7 months ago

[notice] To update, run: C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\.venv\scripts\python.exe -m pip install --upgrade pip
Running "prepdocs.py"
ModuleNotFoundError: No module named '_cffi_backend'
thread '<unnamed>' panicked at C:\Users\runneradmin\.cargo\registry\src\index.crates.io-6f17d22bba15001f\pyo3-0.18.3\src\err\mod.rs:790:5:
Python API call failed
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Traceback (most recent call last):
  File "C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\prepdocs.py", line 7, in <module>
    from azure.identity.aio import AzureDeveloperCliCredential
  File "C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\identity\__init__.py", line 10, in <module>
    from ._credentials import (
  File "C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\identity\_credentials\__init__.py", line 5, in <module>
    from .authorization_code import AuthorizationCodeCredential
  File "C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\identity\_credentials\authorization_code.py", line 9, in <module>
    from .._internal.aad_client import AadClient
  File "C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\identity\_internal\__init__.py", line 5, in <module>
    from .aad_client import AadClient
  File "C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\identity\_internal\aad_client.py", line 11, in <module>
    from .aad_client_base import AadClientBase
  File "C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\identity\_internal\aad_client_base.py", line 20, in <module>
    from .aadclient_certificate import AadClientCertificate
  File "C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\identity\_internal\aadclient_certificate.py", line 7, in <module>
    from cryptography import x509
  File "C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\.venv\lib\site-packages\cryptography\x509\__init__.py", line 7, in <module>
    from cryptography.x509 import certificate_transparency
  File "C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo\scripts\.venv\lib\site-packages\cryptography\x509\certificate_transparency.py", line 11, in <module>
    from cryptography.hazmat.bindings._rust import x509 as rust_x509
pyo3_runtime.PanicException: Python API call failed
PS C:\sampleproject Version 2.0\Latest sampleproject 2.0 on project1098\azure-search-openai-demo>

Please note: cffi and everything in requirements.txt are installed. The script works fine when run through azd up, but not when I run it manually.

shivam10u commented 7 months ago

@pamelafox Could you please help?

pamelafox commented 7 months ago

Have you looked through the suggestions in the related threads? https://github.com/Azure-Samples/azure-search-openai-demo/issues/343 It's generally a Python environment issue, and there are some links/ideas there. If all else fails, you could use GitHub Codespaces for a clean Python environment.
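One way to narrow down a Python environment issue like this is to confirm which interpreter is actually running and whether the modules named in the traceback import cleanly in it. The following is a minimal sketch, not part of the repo; the helper name `check_imports` is hypothetical. It catches `BaseException` because, as far as I know, a PyO3 panic is surfaced as an exception that does not derive from `Exception`.

```python
import importlib
import sys


def check_imports(module_names):
    """Try to import each module; return 'ok' or the error text per module."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except (KeyboardInterrupt, SystemExit):
            raise  # never swallow an interrupt or an explicit exit
        except BaseException as exc:
            # Assumption: a PyO3 panic is not a regular Exception subclass,
            # so a plain `except Exception` might miss it.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Shows which interpreter (and therefore which virtual environment) is in use.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    modules = ["_cffi_backend", "cryptography.x509", "azure.identity"]
    for name, status in check_imports(modules).items():
        print(f"{name}: {status}")
```

Running this with the same `python.exe` that `prepdocs.ps1` uses, and again with the interpreter where `pip install` was run, should show whether the packages were installed into a different environment.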

shivam10u commented 7 months ago

Hi @pamelafox, I tried the suggestions in #343, and the issue described in that thread is working fine for me!


But I am still facing issues when manually adding additional documents. Is there any other way to split and index documents from the Azure portal itself? Earlier I was using Python 3.12; I downgraded to 3.10 as per your suggestion and still get the same error when running ./scripts/prepdocs.ps1. I also tried the troubleshooting steps from here, but no luck: https://stackoverflow.com/questions/34370962/no-module-named-cffi-backend

pamelafox commented 7 months ago

You could try out the new integrated vectorization feature! That's all cloud-based indexing, using indexers and skills for chunking and vectorizing. Here's a PR: https://github.com/Azure-Samples/azure-search-openai-demo/pull/1159

You could check out that branch, follow the README steps in it, and re-run "azd up".

shivam10u commented 7 months ago

Hello @pamelafox,

I appreciate your prompt responses. I am currently working on implementing dynamic data ingestion within Azure. Specifically, I am planning to create an Azure Timer Trigger function that uploads PDF files to Blob Storage. Subsequently, a Blob Trigger Azure Function will be employed to execute the required scripts, such as prepdocs.py or prepdocs.ps1. These scripts will handle tasks such as splitting and indexing, eliminating the need for local file uploads and manual execution of commands.

The goal is to streamline the process, ensuring that every time a PDF is uploaded to the Blob, the associated script is automatically executed, providing the latest data responses in the Blob storage.
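The event-driven flow described above could be sketched roughly as follows. This is only an illustration of the dispatch step, written without the Azure Functions SDK so it stays self-contained; `should_ingest`, `handle_blob_event`, and the `ingest` callback are hypothetical names, and the assumption that only PDFs are processed comes from the description, not from the repo.

```python
from pathlib import PurePosixPath
from typing import Callable, Iterable

# Assumption: only PDF uploads should trigger splitting and indexing.
SUPPORTED_EXTENSIONS = (".pdf",)


def should_ingest(blob_name: str, extensions: Iterable[str] = SUPPORTED_EXTENSIONS) -> bool:
    """Return True when the blob's file extension is one we want to index."""
    return PurePosixPath(blob_name).suffix.lower() in set(extensions)


def handle_blob_event(blob_name: str, ingest: Callable[[str], None]) -> bool:
    """Run the ingestion callback for supported blobs; report whether it ran.

    `ingest` is a placeholder for whatever performs the splitting and
    indexing (for example, a wrapper around the repo's prepdocs logic).
    """
    if not should_ingest(blob_name):
        return False
    ingest(blob_name)
    return True
```

In a real Blob Trigger function, the trigger handler would pass the uploaded blob's name to `handle_blob_event` along with a callback that does the actual indexing.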

I have attempted to explore the provided repository, but unfortunately, the README.md file lacks detailed steps. It primarily contains information about changes made without explicit instructions on how to set up the environment.

Moreover, when attempting to check out the code, I encountered the following error:

Microsoft Windows [Version 10.0.22000.2713]
(c) Microsoft Corporation. All rights reserved.

C:\Test Dynamic>git clone https://github.com/Azure-Samples/azure-search-openai-demo.git
Cloning into 'azure-search-openai-demo'...
remote: Enumerating objects: 4234, done.
remote: Counting objects: 100% (498/498), done.
remote: Compressing objects: 100% (284/284), done.
remote: Total 4234 (delta 288), reused 350 (delta 193), pack-reused 3736
Receiving objects: 100% (4234/4234), 5.51 MiB | 8.17 MiB/s, done.
Resolving deltas: 100% (2302/2302), done.
Updating files: 100% (284/284), done.

C:\Test Dynamic>cd Azure-Samples/azure-search-openai-demo
The system cannot find the path specified.

C:\Test Dynamic>cd azure-search-openai-demo

C:\Test Dynamic\azure-search-openai-demo>git checkout gh pr checkout 1159
error: pathspec 'gh' did not match any file(s) known to git
error: pathspec 'pr' did not match any file(s) known to git
error: pathspec 'checkout' did not match any file(s) known to git
error: pathspec '1159' did not match any file(s) known to git

C:\Test Dynamic\azure-search-openai-demo>gh pr checkout 1159
'gh' is not recognized as an internal or external command,
operable program or batch file.

C:\Test Dynamic\azure-search-openai-demo>git checkout 1159
error: pathspec '1159' did not match any file(s) known to git

C:\Test Dynamic\azure-search-openai-demo>git checkout srbalakr/int-vectorizer
Switched to a new branch 'srbalakr/int-vectorizer'
branch 'srbalakr/int-vectorizer' set up to track 'origin/srbalakr/int-vectorizer'.

C:\Test Dynamic\azure-search-openai-demo>npm install
npm ERR! code ENOENT
npm ERR! syscall open
npm ERR! path C:\Test Dynamic\azure-search-openai-demo/package.json
npm ERR! errno -4058
npm ERR! enoent ENOENT: no such file or directory, open 'C:\Test Dynamic\azure-search-openai-demo\package.json'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent


Could you kindly guide me on how to obtain the code locally, and if there are specific steps I should follow? Additionally, is there an alternative platform or method where we could connect for quicker responses or assistance?

Your guidance in achieving this dynamic data ingestion workflow would be highly valuable.

Thank you for your assistance.
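For getting a pull request's code locally without the `gh` CLI, plain git can fetch the PR head that GitHub exposes under `pull/<number>/head`. The sketch below only builds and runs those two git commands; the helper names and the local branch name `pr-1159` are hypothetical choices, not anything prescribed by the repo.

```python
import subprocess


def pr_checkout_commands(pr_number: int, remote: str = "origin"):
    """Build the git commands that fetch a GitHub PR head into a local branch."""
    local_branch = f"pr-{pr_number}"  # hypothetical local branch name
    return [
        ["git", "fetch", remote, f"pull/{pr_number}/head:{local_branch}"],
        ["git", "checkout", local_branch],
    ]


def run_all(commands, cwd="."):
    """Run each command in order inside the cloned repo; stop on the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    for argv in pr_checkout_commands(1159):
        print(" ".join(argv))
```

Calling `run_all(pr_checkout_commands(1159), cwd="azure-search-openai-demo")` from the parent folder would execute them; checking out the PR's source branch by name, as in the last `git checkout` of the transcript, achieves the same result.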

pamelafox commented 7 months ago

It looks like you did manage to get the branch checked out, but then you ran npm install from the root folder. There's no package.json there, so it failed. The only folder where you'd ever run that would be app/frontend.
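To see where `npm install` can run in a checkout, one option is to list the folders that contain a `package.json`. This is a small sketch with a hypothetical helper name; it skips `node_modules` so installed dependencies are not reported.

```python
import os


def find_package_json_dirs(root):
    """Return folders under `root` (relative paths) that contain a package.json."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune node_modules in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != "node_modules"]
        if "package.json" in filenames:
            found.append(os.path.relpath(dirpath, root))
    return sorted(found)


if __name__ == "__main__":
    for folder in find_package_json_dirs("."):
        print(folder)
```

Run from the repo root, this should print the frontend folder mentioned above; `npm install` belongs in a folder it lists.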

Once you have it checked out, you'd then need to run:

azd auth login
azd env set USE_FEATURE_INT_VECTORIZATION true
azd up

shivam10u commented 7 months ago

Hi @pamelafox, I checked out the branch from #1159, followed your directions, and deployed with azd up, but ended up with the error below.

C:\Test Dynamic\azure-search-openai-demo\scripts\prepdocs.py:402: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()
Processing files...
Using local files in C:\Test Dynamic\azure-search-openai-demo/data/
Ensuring search index gptkbindex exists
Creating gptkbindex search index
Traceback (most recent call last):
  File "C:\Test Dynamic\azure-search-openai-demo\scripts\prepdocs.py", line 408, in <module>
    loop.run_until_complete(main(ingestion_strategy, azd_credential, args))
  File "C:\Users\shivam.a.upadhyay\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 641, in run_until_complete
    return future.result()
  File "C:\Test Dynamic\azure-search-openai-demo\scripts\prepdocs.py", line 231, in main
    await strategy.setup(search_info)
  File "C:\Test Dynamic\azure-search-openai-demo\scripts\prepdocslib\integratedvectorizerstrategy.py", line 130, in setup
    await search_manager.create_index(
  File "C:\Test Dynamic\azure-search-openai-demo\scripts\prepdocslib\searchmanager.py", line 180, in create_index
    await search_index_client.create_or_update_index(index)
  File "C:\Test Dynamic\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\core\tracing\decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "C:\Test Dynamic\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\search\documents\indexes\aio\_search_index_client.py", line 262, in create_or_update_index
    result = await self._client.indexes.create_or_update(
  File "C:\Test Dynamic\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\core\tracing\decorator_async.py", line 77, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "C:\Test Dynamic\azure-search-openai-demo\scripts\.venv\lib\site-packages\azure\search\documents\indexes\_generated\aio\operations\_indexes_operations.py", line 506, in create_or_update
    raise HttpResponseError(response=response, model=error)
azure.core.exceptions.HttpResponseError: (InvalidRequestParameter) The request is invalid. Details: definition : Unknown vectorizer name 'myOpenAI' in Vector Search Profile 'embedding_config'.
Code: InvalidRequestParameter
Message: The request is invalid. Details: definition : Unknown vectorizer name 'myOpenAI' in Vector Search Profile 'embedding_config'.
Exception Details:      (UnknownVectorSearchVectorizerConfiguration) Unknown vectorizer name 'myOpenAI' in Vector Search Profile 'embedding_config'. Parameters: definition
        Code: UnknownVectorSearchVectorizerConfiguration
        Message: Unknown vectorizer name 'myOpenAI' in Vector Search Profile 'embedding_config'. Parameters: definition

Deploying services (azd deploy)

(✓) Done: Deploying service backend

Alternatively, I tried the "Import and vectorize data" option in the portal itself, but no luck. I get results when searching in the portal, but the web app behaves as if no documents have been indexed.


Please help me fix this end to end. My objective is to just upload the docs in azure blob and boom I should start getting a response.
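The "Unknown vectorizer name" message above says that a vector search profile refers to a vectorizer the service does not find in the index definition. The thread does not establish the root cause, but the consistency rule itself can be sketched against an index definition held as a plain dictionary. The field names (`vectorSearch`, `profiles`, `vectorizers`, `vectorizer`) are an assumption about the JSON layout, and `find_unknown_vectorizers` is a hypothetical helper.

```python
def find_unknown_vectorizers(index_definition):
    """Return (profile name, vectorizer name) pairs whose vectorizer is undefined.

    Assumes a JSON-like dict in which `vectorSearch.profiles[*].vectorizer`
    should match the `name` of an entry in `vectorSearch.vectorizers`.
    """
    vector_search = index_definition.get("vectorSearch", {})
    defined = {v["name"] for v in vector_search.get("vectorizers", [])}
    problems = []
    for profile in vector_search.get("profiles", []):
        vectorizer = profile.get("vectorizer")
        if vectorizer is not None and vectorizer not in defined:
            problems.append((profile["name"], vectorizer))
    return problems


if __name__ == "__main__":
    # Names taken from the error message; the empty vectorizers list is illustrative.
    example = {
        "vectorSearch": {
            "profiles": [{"name": "embedding_config", "vectorizer": "myOpenAI"}],
            "vectorizers": [],
        }
    }
    print(find_unknown_vectorizers(example))
```

A non-empty result means the request would reference a vectorizer that the definition never declares, which matches the shape of the failure reported here.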

pamelafox commented 7 months ago

@shivam10u I am going to test out the branch myself soon, so I'll see if I have the same error. cc @srbalakr

shivam10u commented 7 months ago

I suspect this error is caused by an already existing indexer with the same name under two search services in the portal. Thank you, @pamelafox. Please keep me posted; I'm really looking forward to good news. @srbalakr, do you have any suggestions or a fix?

kail85 commented 7 months ago

@shivam10u, I got the same error message. The last few lines are:

azure.core.exceptions.HttpResponseError: (InvalidRequestParameter) The request is invalid. Details: definition : Unknown vectorizer name 'myOpenAI' in Vector Search Profile 'embedding_config'.
Code: InvalidRequestParameter
Message: The request is invalid. Details: definition : Unknown vectorizer name 'myOpenAI' in Vector Search Profile 'embedding_config'.
Exception Details:      (UnknownVectorSearchVectorizerConfiguration) Unknown vectorizer name 'myOpenAI' in Vector Search Profile 'embedding_config'. Parameters: definition
        Code: UnknownVectorSearchVectorizerConfiguration
        Message: Unknown vectorizer name 'myOpenAI' in Vector Search Profile 'embedding_config'. Parameters: definition

ERROR: failed running post hooks: 'postprovision' hook failed with exit code: '1', Path: '/tmp/azd-postprovision-1998154611.sh'. : exit code: 1

shivam10u commented 7 months ago

Hi @pamelafox, did you get a chance to look into it?

shivam10u commented 7 months ago

Hi @pamelafox, did you get a chance to try this? Also, when can we expect this PR to be merged? It's an important feature for us, since it would let us avoid splitting, chunking, and indexing locally.