Azure-Samples / chat-with-your-data-solution-accelerator

A Solution Accelerator for the RAG pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. This includes most common requirements and best practices.
https://azure.microsoft.com/products/search
MIT License

Unable to parse and estimate tokens from incoming request. Please ensure incoming request is of one of the following types: 'Chat Completion', 'Completion', 'Embeddings' and works with current prompt estimation mode of 'Auto' #1393

Open strudel0209 opened 1 week ago

strudel0209 commented 1 week ago

Describe the bug

I am trying to integrate the web app (frontend) client with Azure APIM, which serves as a proxy for the OpenAI endpoint. In doing so, I have changed the web app's environment variable from AZURE_OPENAI_RESOURCE to AZURE_OPENAI_ENDPOINT, pointing it to the Azure APIM gateway, based on the code logic that uses it:

```python
# Set env for Azure OpenAI
self.AZURE_OPENAI_ENDPOINT = os.environ.get(
    "AZURE_OPENAI_ENDPOINT",
    f"https://{self.AZURE_OPENAI_RESOURCE}.openai.azure.com/",
)
```

The configuration uses the RBAC authentication mechanism as opposed to keys. For testing purposes, I have switched off the authentication requirements in the APIM policies. When making a request from the frontend client to APIM (embedding endpoint), I get the following error:
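To illustrate the fallback behavior described above, here is a minimal, stdlib-only sketch (the helper function name and the example URLs are hypothetical; only the two environment variable names come from the accelerator):

```python
# Hypothetical sketch of the fallback above: when AZURE_OPENAI_ENDPOINT is
# set (e.g. to an APIM gateway), it wins; otherwise the endpoint is derived
# from AZURE_OPENAI_RESOURCE.
def resolve_openai_endpoint(env: dict) -> str:
    resource = env.get("AZURE_OPENAI_RESOURCE", "")
    return env.get(
        "AZURE_OPENAI_ENDPOINT",
        f"https://{resource}.openai.azure.com/",
    )

# With the APIM gateway configured, the override takes effect:
apim = resolve_openai_endpoint(
    {"AZURE_OPENAI_ENDPOINT": "https://my-apim.azure-api.net/"}
)
# Without it, the resource-based default is used:
default = resolve_openai_endpoint({"AZURE_OPENAI_RESOURCE": "my-aoai"})
```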

```
2024-10-08T09:03:24.8437063Z INFO:httpx:HTTP Request: POST https://apim-46lbub4uk2rua.azure-api.net//openai/deployments/text-embedding-ada-002/embeddings?api-version=2024-02-01 "HTTP/1.1 400 Bad Request"
2024-10-08T09:03:24.8632024Z ERROR:create_app:Exception in /api/conversation | Error code: 400 - {'statusCode': 400, 'message': "Unable to parse and estimate tokens from incoming request. Please ensure incoming request is of one of the following types: 'Chat Completion', 'Completion', 'Embeddings' and works with current prompt estimation mode of 'Auto'."}
Traceback (most recent call last):
  File "/usr/src/app/create_app.py", line 419, in conversation_custom
    messages = await message_orchestrator.handle_message(
  File "/usr/src/app/backend/batch/utilities/helpers/orchestrator_helper.py", line 22, in handle_message
    orchestrator = get_orchestrator(orchestrator.strategy.value)
  File "/usr/src/app/backend/batch/utilities/orchestrator/strategies.py", line 10, in get_orchestrator
    return OpenAIFunctionsOrchestrator()
  File "/usr/src/app/backend/batch/utilities/orchestrator/open_ai_functions.py", line 17, in __init__
    super().__init__()
  File "/usr/src/app/backend/batch/utilities/orchestrator/orchestrator_base.py", line 20, in __init__
    self.conversation_logger: ConversationLogger = ConversationLogger()
  File "/usr/src/app/backend/batch/utilities/loggers/conversation_logger.py", line 8, in __init__
    self.logger = AzureSearchHelper().get_conversation_logger()
  File "/usr/src/app/backend/batch/utilities/helpers/azure_search_helper.py", line 227, in get_conversation_logger
    vector_search_dimensions=self.search_dimensions,
  File "/usr/src/app/backend/batch/utilities/helpers/azure_search_helper.py", line 78, in search_dimensions
    self.llm_helper.get_embedding_model().embed_query("Text")
  File "/usr/local/lib/python3.11/site-packages/langchain_openai/embeddings/base.py", line 632, in embed_query
    return self.embed_documents([text])[0]
  File "/usr/local/lib/python3.11/site-packages/langchain_openai/embeddings/base.py", line 592, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=engine)
  File "/usr/local/lib/python3.11/site-packages/langchain_openai/embeddings/base.py", line 490, in _get_len_safe_embeddings
    response = self.client.create(
  File "/usr/local/lib/python3.11/site-packages/openai/resources/embeddings.py", line 124, in create
    return self._post(
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1270, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 947, in request
    return self._request(
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1051, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'statusCode': 400, 'message': "Unable to parse and estimate tokens from incoming request. Please ensure incoming request is of one of the following types: 'Chat Completion', 'Completion', 'Embeddings' and works with current prompt estimation mode of 'Auto'."}
```

In the APIM logs, I can see that the intercepted request body is: {"input": [[1199]], "model": "text-embedding-ada-002", "encoding_format": "base64"}, which seems to be coming from test_conversation.py:
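For context on the [[1199]] body: the `_get_len_safe_embeddings` path visible in the traceback tokenizes the input before calling the embeddings API, so APIM intercepts a list of token IDs rather than a plain string (the log shows "Text" arriving as token ID 1199). A minimal sketch contrasting the two request shapes, assuming that APIM's 'Auto' prompt-estimation mode can only parse the plain-text form (the helper name below is hypothetical):

```python
# Body APIM intercepted: token IDs produced by langchain's
# length-safe embedding path ("Text" -> token ID 1199).
tokenized = {
    "input": [[1199]],
    "model": "text-embedding-ada-002",
    "encoding_format": "base64",
}

# Plain-text body: the form a token-estimation policy can parse directly.
plain = {"input": "Text", "model": "text-embedding-ada-002"}

def input_is_pretokenized(body: dict) -> bool:
    """Return True if 'input' carries lists of token IDs instead of raw text."""
    value = body["input"]
    return isinstance(value, list) and all(
        isinstance(item, list) and all(isinstance(t, int) for t in item)
        for item in value
    )
```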

```python
def test_post_makes_correct_calls_to_openai_embeddings_to_get_vector_dimensions(
    app_url: str, app_config: AppConfig, httpserver: HTTPServer
):
    # when
    requests.post(f"{app_url}{path}", json=body)

    # then
    verify_request_made(
        mock_httpserver=httpserver,
        request_matcher=RequestMatcher(
            path=f"/openai/deployments/{app_config.get_from_json('AZURE_OPENAI_EMBEDDING_MODEL_INFO', 'model')}/embeddings",
            method="POST",
            json={
                "input": [[1199]],
                "model": "text-embedding-ada-002",
                "encoding_format": "base64",
            },
            headers={
                "Accept": "application/json",
                "Content-Type": "application/json",
                "Authorization": f"Bearer {app_config.get('AZURE_OPENAI_API_KEY')}",
                "Api-Key": app_config.get("AZURE_OPENAI_API_KEY"),
            },
            query_string="api-version=2024-02-01",
            times=1,
        ),
    )
```

Why is the request being redirected to the http mocking server?

Expected behavior

I expected the "Text" prompt to be passed through in the request sent to APIM.

How does this bug make you feel?

sad, very sad

Debugging information

Steps to reproduce

Steps to reproduce the behavior:

  1. The scenario requires Azure APIM to be in place with Azure OpenAI serving as the backend, including the embedding model(s).
  2. Replace the web app environment variable AZURE_OPENAI_RESOURCE with AZURE_OPENAI_ENDPOINT and provide the APIM gateway endpoint.
  3. Make sure that the APIM managed identity has permissions to access the OpenAI backend; see all necessary prerequisites on this page: https://github.com/Azure-Samples/ai-hub-gateway-solution-accelerator/blob/main/guides/openai-onboarding.md
  4. Make a request from the web app frontend to the APIM OpenAI backend and check the error logs.
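Incidentally, the logged request URL above contains a double slash (`...azure-api.net//openai/...`), which can happen when the gateway endpoint provided in step 2 ends with a trailing slash. A purely illustrative, stdlib-only sketch of normalizing the base URL before joining paths (the accelerator or SDK may already handle this; the function name is hypothetical):

```python
def join_endpoint(base: str, path: str) -> str:
    """Join a base endpoint and an API path without duplicating slashes."""
    return base.rstrip("/") + "/" + path.lstrip("/")

# Example using the gateway and deployment path from the logs above:
url = join_endpoint(
    "https://apim-46lbub4uk2rua.azure-api.net/",
    "/openai/deployments/text-embedding-ada-002/embeddings",
)
```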

Screenshots

See error logs above

Logs

See error logs above.


Tasks

To be filled in by the engineer picking up the issue

Roopan-Microsoft commented 1 week ago

Hi @strudel0209, Thanks for your feedback. We will investigate the issue from our end and keep you posted.

strudel0209 commented 3 days ago

Hi @Roopan-Microsoft , any news?

Roopan-Microsoft commented 3 days ago

Hey @strudel0209 - Apologies for the delay. The team is prioritizing this issue and may need some additional details from you. They will reach out shortly.

Prasanjeet-Microsoft commented 2 days ago

Hello @strudel0209, could you please let us know how you're integrating the Azure OpenAI API in APIM? Are you creating the API manually or importing it through the Azure OpenAI template? If possible, could you also share the API details you're using with this accelerator? This information will help us troubleshoot the issue more effectively.