Mshz2 opened this issue 9 months ago
Hi @Mshz2,
Thanks for reporting this issue. Are you using the sample data or your own custom data set? Are you using OpenAI embeddings or Azure OpenAI embeddings?
@mattgotteiner Hi, thanks for the reply. I am using my own PDFs; some of them have over 100 pages. I'm using Azure OpenAI.
Thanks - it's possible a single document is causing this error. We'll file a follow-up issue to skip documents that hit this error so you can retry.
@mattgotteiner I'm also having the same issue; some documents are fine. I'm wondering what kind of document causes this error: raise self._make_status_error_from_response(err.response) from None openai.BadRequestError: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
I have the same problem. This may be related to the switch to the new client lib azure-ai-documentintelligence==1.0.0b1. With older releases or the local parser it works.
Have the same problem. Did someone find a solution? I can't use some of my PDF files, and changing the Azure version did not work. Could it be blank pages at the beginning of the PDF? Appreciate any help! Thanks a lot.
I have the same issue, and it only happens for certain files. I am not sure if the images or tables in the file are causing it, but there is no obvious pattern between the files. Was anyone able to resolve the issue? Any tips would be helpful!
Here is an update on this issue for everyone: the solution from pamelafox below works.
Update: This error is happening when we pass a text of length 0 (an empty string) to the batch embeddings API. The single embedding API is fine with that input, but the batch embedding API is not. (See https://github.com/openai/openai-python/issues/576)
Now, I don't know yet why we have sections that have 0 text in them, as I don't expect that to happen in most cases (possibly for GPT4-vision, but this occurs with vision disabled as well). I'm going to ask @tonybaloney to see if it was related to the recent splitting change.
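To check whether a document is affected before calling the API, you can scan the section texts for zero-length strings, since those are what trigger the 400 "'$.input' is invalid" error on the batch endpoint. This is a hedged sketch; the function name is illustrative and not from the repo:

```python
def find_empty_inputs(texts):
    """Return indices of zero-length strings in a batch.

    The batch embeddings API rejects a list containing an empty
    string with a 400 "'$.input' is invalid" error, while the
    single-embedding API tolerates it.
    """
    return [i for i, t in enumerate(texts) if len(t) == 0]

# A batch like this would fail server-side because of index 1:
problem_indices = find_empty_inputs(["hello", "", "world"])
```

Logging these indices before the `client.embeddings.create` call makes it easy to see which sections of which document are empty.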
As another workaround, you can put this code in create_embedding_batch:
batch.texts = [text if text else " " for text in batch.texts]
emb_response = await client.embeddings.create(model=self.open_ai_model_name, input=batch.texts)
The batch embedding endpoint seems fine with a whitespace string.
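The replacement step above can also be pulled out into a small helper so it is testable on its own. This is a sketch, not the repo's actual code; the function name and placeholder choice are assumptions:

```python
def pad_empty_texts(texts, placeholder=" "):
    """Replace empty strings with a single-space placeholder.

    The batch embeddings endpoint rejects empty strings but accepts
    whitespace-only strings, so padding lets the whole batch succeed.
    """
    return [t if t else placeholder for t in texts]

padded = pad_empty_texts(["first section", "", "last section"])
```

In create_embedding_batch you would apply this to batch.texts just before the `client.embeddings.create` call.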
https://github.com/Azure-Samples/azure-search-openai-demo/issues/1415
There was a proper fix in the codebase about 2 weeks ago; please pull/download the latest release.
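The thread does not say what the merged fix looks like, but a natural shape for it is to drop empty or whitespace-only sections before they ever reach the embeddings batch, rather than padding them. This is purely a speculative sketch of that idea, not the repo's code:

```python
def drop_empty_sections(section_texts):
    """Filter out sections whose text is empty or whitespace-only.

    One plausible shape of the upstream fix: skip such sections
    entirely so no invalid input reaches the batch embeddings API.
    """
    return [s for s in section_texts if s.strip()]

kept = drop_empty_sections(["real content", "", "   "])
```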