sandeepsatishcopilot opened this issue 4 months ago (status: Open)
Could you provide a longer traceback? I assume it happened with a batch embeddings call, but I want to confirm.

It happened during /scripts/prepdocs.ps1. I think it fails here:
Extracting text from 'E:\test\01. AI\openai-demo-2024-01-27/data\Benefit_Options.pdf' using Azure Document Intelligence
Splitting 'Benefit_Options.pdf' into sections
Uploading blob for whole file -> Benefit_Options.pdf
Traceback (most recent call last):
File "E:\test\01. AI\openai-demo-2024-01-27\scripts\prepdocs.py", line 318, in
Is this using openai.com OpenAI or Azure OpenAI? Azure OpenAI.

Is this on the provided sample data or your own data? With the sample data.
Hm, it seems like this error is coming from the underlying embeddings service in Azure OpenAI. I've pinged the folks working on that service to see if they can provide insights, as I haven't seen this error before. Does it still happen for you if you disable batch vector embeddings? You can add "--disablebatchvectors" to the prepdocs.py command at the bottom of prepdocs.ps1.
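For readers unfamiliar with the flag: --disablebatchvectors switches prepdocs from one embeddings request per batch of sections to one request per section. A minimal sketch of that batching logic (illustrative only; the helper name and batch size here are assumptions, not the repo's actual code):

```python
# Conceptual sketch of what --disablebatchvectors changes: instead of sending
# many texts in one embeddings request, each text is sent alone.

def make_batches(texts, max_batch_size=16):
    """Group texts into batches of at most max_batch_size items."""
    return [texts[i:i + max_batch_size] for i in range(0, len(texts), max_batch_size)]

sections = [f"section {n}" for n in range(40)]

# Batched mode: one embeddings request per batch.
batches = make_batches(sections)
print(len(batches))  # 3 requests instead of 40

# --disablebatchvectors mode: one request per section.
single = make_batches(sections, max_batch_size=1)
print(len(single))  # 40 requests
```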
Thank you. I am still getting the same error after adding --disablebatchvectors:
"$keyVaultName" + `
"--disablebatchvectors"
It should be a slightly different traceback, but I assume you mean the final error is the same.
We're trying to figure out how we can replicate this, or what's different about your environment. Does your machine have any firewalls or VPNs set up? Are you able to use GitHub Codespaces?
This is the complete error. I also tried running this without the VPN and I get the same error. I wanted to try this locally, as we have some restrictions around using GitHub Codespaces. Could it be related to the length of the parameters? For example, if it has a long resource group name? Just a guess.
Extracting text from 'E:\test\01. AI\openai-demo-2024-01-27/data\Benefit_Options.pdf' using Azure Document Intelligence
Splitting 'Benefit_Options.pdf' into sections
Uploading blob for whole file -> Benefit_Options.pdf
Traceback (most recent call last):
File "E:\test\01. AI\openai-demo-2024-01-27\scripts\prepdocs.py", line 318, in
Can you add logging to prepdocs.py? That will show the request headers from the openai request. The bottom of the file should look like:
import logging
logging.basicConfig(level=logging.DEBUG)
loop = asyncio.get_event_loop()
file_strategy = loop.run_until_complete(setup_file_strategy(azd_credential, args))
loop.run_until_complete(main(file_strategy, azd_credential, args))
loop.close()
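If the full DEBUG output is too noisy, a narrower alternative (an editorial suggestion, not from the thread) is to raise the level only for the loggers that actually print the request and response headers:

```python
import logging

# Instead of logging.basicConfig(level=logging.DEBUG), which floods the output
# with DEBUG messages from every library, enable DEBUG only for the loggers
# that show the openai request options and the httpcore response headers.
logging.basicConfig(level=logging.WARNING)
for name in ("openai", "httpcore", "httpx"):
    logging.getLogger(name).setLevel(logging.DEBUG)

print(logging.getLogger("openai").level == logging.DEBUG)  # True
```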
Can you also see if it depends on the document? (Delete the first one and try with the next.)
I did try with just one document; it didn't help. I am sharing some details on the error that I got after enabling logging. DEBUG:httpcore.http11:receive_response_headers.complete return_value=(b'HTTP/1.1', 431, b'Request Header Fields Too Large', [(b'Content-Length', b'227'), (b'Content-Type', b'text/plain'), (b'apim-request-id', b'I REPLACED THIS ID'), (b'x-ms-client-request-id', b'Not-Set'), (b'Strict-Transport-Security', b'max-age=31536000; includeSubDomains; preload'), (b'x-content-type-options', b'nosniff'), (b'x-ms-region', b'East US 2'), (b'x-ratelimit-remaining-requests', b'29'), (b'x-ratelimit-remaining-tokens', b'29995'), (b'Date', b'Fri, 02 Feb 2024 06:55:49 GMT')])
The POST request DEBUG:openai._base_client:Request options: {'method': 'post', 'url': '/embeddings', 'headers': {'api-key':
is pretty big; the api-key itself is around 12K characters.
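For context, an HTTP 431 means the server rejected the total size of the request header fields, and a single 12K-character api-key field can exceed a typical 8 KB limit on its own. A quick back-of-the-envelope check (the key value below is synthetic, standing in for the oversized credential):

```python
# Rough check of total request-header size, the quantity an HTTP 431
# ("Request Header Fields Too Large") response complains about.

def header_bytes(headers: dict) -> int:
    """Approximate wire size of the header block: 'Name: value\r\n' per field."""
    return sum(len(k) + 2 + len(v) + 2 for k, v in headers.items())

headers = {
    "api-key": "x" * 12000,          # synthetic 12K-character credential
    "Content-Type": "application/json",
    "User-Agent": "AsyncAzureOpenAI/Python",
}

total = header_bytes(headers)
print(total)             # 12080 bytes
print(total > 8 * 1024)  # True: already past a common 8 KB server limit
```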
Hm, my API key when using AzureDefaultCredential is 2095 characters, so I wonder if that's the issue. Can you check whether you're using credential or AzureKeyCredential on this line?
azure_open_ai_credential: Union[AsyncTokenCredential, AzureKeyCredential] = (
credential if is_key_empty(args.openaikey) else AzureKeyCredential(args.openaikey)
)
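For context, a hypothetical sketch of how that selection behaves (is_key_empty and the stand-in classes here are simplified assumptions, not the repo's actual implementations):

```python
# Hypothetical model of the key-vs-token-credential choice shown above.

def is_key_empty(key) -> bool:
    """Treat None or whitespace-only strings as 'no key provided'."""
    return key is None or key.strip() == ""

class AzureKeyCredential:   # stand-in for azure.core.credentials.AzureKeyCredential
    def __init__(self, key):
        self.key = key

credential = object()       # stand-in for the AzureDeveloperCliCredential instance

def pick_credential(openaikey):
    """Mirror the ternary: token credential when no key is set, else key auth."""
    return credential if is_key_empty(openaikey) else AzureKeyCredential(openaikey)

print(type(pick_credential(None)).__name__)      # object -> token credential path
print(type(pick_credential("sk-abc")).__name__)  # AzureKeyCredential -> key path
```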
It seems to be using AzureKeyCredential.
Interesting, it should only use that if you explicitly set openaikey, which comes from the OPENAI_API_KEY environment variable. Can you check if OPENAI_API_KEY is set in your azd environment (azd env get-values) or your general environment?
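One way to run that check from Python rather than `azd env get-values` (the helper below is my own illustration, not part of the repo):

```python
import os

def report_key_vars(env) -> dict:
    """Return which of the relevant API-key variables are set in a mapping."""
    names = ("OPENAI_API_KEY", "AZURE_OPENAI_API_KEY")
    return {n: (n in env and bool(env[n])) for n in names}

# Check the real process environment for the variables that would trigger
# key-based auth instead of token-based auth.
print(report_key_vars(os.environ))
```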
Sorry, I believe it's using credential only. This is what I get from print(azure_open_ai_credential):
<azure.identity.aio._credentials.azd_cli.AzureDeveloperCliCredential object at 0x0000029931419750>
Okay, that means the token generated by that token provider is coming out at 12K characters. I'm not sure if that's the actual issue; I am chatting with the OpenAI SDK engineers.
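A Microsoft Entra access token is a JWT (header.payload.signature, each part base64url-encoded), so its payload can be decoded to see which claims make it so large; oversized groups claims are a common cause. A sketch using a synthetic token (not a real credential):

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build a fake token whose payload carries 300 group claims.
claims = {
    "aud": "https://cognitiveservices.azure.com",
    "groups": [f"group-{i}" for i in range(300)],
}
fake_token = ".".join([
    base64.urlsafe_b64encode(b'{"alg":"RS256"}').decode().rstrip("="),
    base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("="),
    "signature",
])

decoded = jwt_payload(fake_token)
print(len(decoded["groups"]))   # 300 group claims bloat the token
print(len(fake_token))          # total token length in characters
```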
Update: You actually shouldn't even have an api-key header, only an Authorization header. I wonder if it's failing to get a token and falling back to a key. Do you happen to have an AZURE_OPENAI_API_KEY environment variable in your environment? The openai SDK looks for that.
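A simplified model of the two auth paths being discussed (this is not the openai SDK's internal code; only the header names match its documented behavior of sending api-key for key auth and Authorization: Bearer for token auth):

```python
# Simplified model: an Azure OpenAI request should carry exactly one of these.

def build_auth_headers(api_key=None, token_provider=None) -> dict:
    """Pick key auth if a key is configured, else token auth."""
    if api_key:                          # a stray AZURE_OPENAI_API_KEY lands here
        return {"api-key": api_key}
    if token_provider:                   # AzureDeveloperCliCredential path
        return {"Authorization": f"Bearer {token_provider()}"}
    raise ValueError("no credential configured")

print(build_auth_headers(api_key="sk-test"))
print(build_auth_headers(token_provider=lambda: "fake-token"))
```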
Can you try my changes in https://github.com/Azure-Samples/azure-search-openai-demo/pull/1228 ?
AZURE_OPENAI_API_KEY is not there. I copied embeddings.py into my environment. The script has been stuck here for 5 minutes:
Splitting 'Benefit_Options.pdf' into sections
Uploading blob for whole file -> Benefit_Options.pdf
The script is just stuck here; I tried once more:
Extracting text from 'E:\test\01. AI\openai-demo-2024-01-27/data\Benefit_Options.pdf' using Azure Document Intelligence
Splitting 'Benefit_Options.pdf' into sections
Uploading blob for whole file -> Benefit_Options.pdf
I encounter the same "Request Header Fields Too Large" error when trying the Azure OpenAI Chat playground with "Add your data".
I get the same error when using the latest multimodal embedding process described here: https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-now-supports-ai-vision-multimodal-and-ai-studio/ba-p/4136743. I used GPT-4o (East US deployment).
This issue is for a: (mark with an x)

Minimal steps to reproduce

Any log messages given by the failure
raise self._make_status_error_from_response(err.response) from None
openai.APIStatusError: Request Header Fields Too Large
Expected/desired behavior
OS and Version?
azd version?
Mention any other details that might be useful