Azure / azureml-examples

Official community-driven Azure Machine Learning examples, tested with GitHub Actions.
https://docs.microsoft.com/azure/machine-learning
MIT License

Creating Faiss index with OpenAI and Embeddingstore SDK generates runtime error #2515

Open ramaOru opened 11 months ago

ramaOru commented 11 months ago

Operating System

MacOS

Version Information

Followed the steps at https://github.com/Azure/azureml-examples/blob/main/sdk/python/generative-ai/promptflow/create_faiss_index.ipynb to generate a Faiss index (using OpenAI instead of Azure OpenAI). Using the resulting index in Azure ML Studio fails at runtime with an "internal server error".

Steps to reproduce

  1. Generate a Faiss index using the method described in https://github.com/Azure/azureml-examples/blob/main/sdk/python/generative-ai/promptflow/create_faiss_index.ipynb, but use OpenAI instead of AOAI and ignore the API version.

    MODEL_API_VERSION = "2023-05-15"
    MODEL_DEPLOYMENT_NAME = "text-embedding-ada-002"
    DIMENSION = 1536

    # Configure an embedding store to store index file.
    store_path = os.path.join(os.getcwd(), "faiss_index_store")
    config = StoreCoreConfig.create_config(
        storage_type=StorageType.LOCAL,
        store_identifier=store_path,
        model_type=EmbeddingModelType.OPENAI,
        model_api_base=os.environ["OpenAI_MODEL_ENDPOINT"],
        model_api_key=os.environ["OpenAI_MODEL_API_KEY"],
        model_name=MODEL_DEPLOYMENT_NAME,
        dimension=DIMENSION,
        create_if_not_exists=True,
    )

  2. An example Faiss index was generated successfully and tested for validity. It is available here: https://github.com/ramaOru/test_faiss_index2

  3. To test flows, I simplified the Azure ML Studio steps and also copied the example "golden" Faiss files from https://github.com/Azure/azureml-assets/tree/main/assets/promptflow/data/faiss-index-lookup/faiss_index_sample to https://github.com/ramaOru/test_faiss_index

  4. In the same Azure ML Studio flow, if you provide the path to the Faiss index as 'https://github.com/ramaOru/test_faiss_index', it runs successfully. But if you provide the one that is generated in step 2, it fails with:

    2023-08-01 23:59:25 +0000 3710 promptflow-runtime INFO Validating 'AzureML Data Scientist' user authentication...
    2023-08-01 23:59:25 +0000 3710 promptflow-runtime INFO Successfully validated 'AzureML Data Scientist' user authentication.
    2023-08-01 23:59:25 +0000 3710 promptflow-runtime INFO Initialized table client for AzureMLRunTracker.
    2023-08-01 23:59:25 +0000 3710 promptflow-runtime INFO Initialized blob service client for AzureMLRunTracker.
    2023-08-01 23:59:25 +0000 3710 promptflow-runtime INFO Setting mlflow tracking uri to 'azureml://westus.api.azureml.ms/mlflow/v1.0/subscriptions/f87875ce-febb-4c1e-ab24-10e5b0e6c955/resourceGroups/rama.oruganti-rg/providers/Microsoft.MachineLearningServices/workspaces/csm_digitaltwin'
    2023-08-01 23:59:25 +0000 3710 promptflow-runtime INFO Start execute request: 2dc4b9af-96f4-41ae-af1c-52e3eb73e263 in dir requests/2dc4b9af-96f4-41ae-af1c-52e3eb73e263...
    2023-08-01 23:59:25 +0000 3710 execution.flow INFO Create/update flow info for run 2dc4b9af-96f4-41ae-af1c-52e3eb73e263 finished in 0.006969250011024997 seconds
    2023-08-01 23:59:25 +0000 3710 execution.flow INFO Root flow run found in run storage 'AzureMLRunStorage'. Run id: '2dc4b9af-96f4-41ae-af1c-52e3eb73e263', flow id: '715f3fb4-4ffb-4756-a672-9526bb10c22e'.
    2023-08-01 23:59:26 +0000 3710 execution ERROR Failed to execute flow. Exception: Failed to load tool 'Faiss Index Lookup' for node 'lookup' due to 'internal server error'.
    Traceback (most recent call last):
      File "/azureml-envs/prompt-flow/runtime/lib/python3.9/site-packages/promptflow/executor/common.py", line 25, in _load_tools_and_update_node_inputs
        loaded_tool, init_inputs = _load_tool(tool, api_name, node.inputs)
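Since the "golden" index loads and the generated one does not, one way to narrow this down might be to compare the file layout of the two folders. The following is only a diagnostic sketch, assuming both repositories have been cloned locally; the function names and the example paths are hypothetical, and it uses only the Python standard library (it does not inspect the Faiss data itself).

```python
import os


def list_relative_files(root):
    """Return {relative_path: size_in_bytes} for every file under root,
    skipping the .git folder that a cloned repository contains."""
    files = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            files[rel] = os.path.getsize(full)
    return files


def compare_index_dirs(working_dir, failing_dir):
    """Report which files exist in only one folder, and which shared
    files differ in size."""
    working = list_relative_files(working_dir)
    failing = list_relative_files(failing_dir)
    shared = set(working) & set(failing)
    return {
        "only_in_working": sorted(set(working) - set(failing)),
        "only_in_failing": sorted(set(failing) - set(working)),
        "size_differs": sorted(p for p in shared if working[p] != failing[p]),
    }


# Example usage (hypothetical local clone paths):
# report = compare_index_dirs("test_faiss_index", "test_faiss_index2")
# for key, paths in report.items():
#     print(key, paths)
```

A file that appears only in the working folder would be a first candidate for what the lookup tool expects and does not find. Size differences between shared files are expected, since the two indexes hold different data.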

Expected behavior

Based on the language on the page, both OpenAI and Azure OpenAI should result in Faiss files that are usable in Azure ML Studio.

The notebook states: "Embedding store sdk supports multiple types of embedding models (Azure OpenAI, OpenAI) and multiple types of store path (local path, HTTP URL, Azure blob). In this example, configure an embedding store with Azure OpenAI embedding model and local store path."

Actual behavior

Using OpenAI to generate Faiss files results in runtime errors in Azure ML Studio. I verified this with multiple valid Faiss files and with an alternate suggested method of generating a Faiss file (https://python.langchain.com/docs/integrations/vectorstores/faiss).

Additional information

No response

rileynjohnson commented 6 months ago

same here.