Azure-Samples / chat-with-your-data-solution-accelerator

A Solution Accelerator for the RAG pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. This includes most common requirements and best practices.
https://azure.microsoft.com/products/search
MIT License
789 stars 395 forks

hebrew pdf documents and web urls gives latin-1 error #1333

Open freshuk opened 1 week ago

freshuk commented 1 week ago

Describe the bug

When uploading PDF documents or web URLs on the ingest documents screen, I get an error that usually looks like this:

Traceback (most recent call last):
  File "/usr/local/src/myscripts/admin/pages/01_Ingest_Data.py", line 95, in <module>
    st.session_state["file_url"] = blob_client.upload_file(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/src/myscripts/admin/batch/utilities/helpers/azure_blob_storage_client.py", line 119, in upload_file
    blob_client.upload_blob(
  File "/usr/local/lib/python3.11/site-packages/azure/core/tracing/decorator.py", line 105, in wrapper_use_tracer
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_blob_client.py", line 775, in upload_blob
    return upload_block_blob(**options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_upload_helpers.py", line 102, in upload_block_blob
    response = client.upload(
               ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/tracing/decorator.py", line 105, in wrapper_use_tracer
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_generated/operations/_block_blob_operations.py", line 846, in upload
    pipeline_response: PipelineResponse = self._client._pipeline.run(  # pylint: disable=protected-access
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 229, in run
    return first_node.send(pipeline_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/policies/_redirect.py", line 197, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/policies.py", line 529, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 1 more time]
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/policies.py", line 302, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 86, in send
    response = self.next.send(request)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/_base.py", line 118, in send
    self._sender.send(request.http_request, **request.context.options),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/storage/blob/_shared/base_client.py", line 348, in send
    return self._transport.send(request, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 355, in send
    response = self.session.request(  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 180, in instrumented_send
    return wrapped_send(self, request, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/opentelemetry/instrumentation/urllib3/__init__.py", line 316, in instrumented_urlopen
    return wrapped(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 495, in _make_request
    conn.request(
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 397, in request
    self.putheader(header, value)
  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 311, in putheader
    super().putheader(header, *values)
  File "/usr/local/lib/python3.11/http/client.py", line 1267, in putheader
    values[i] = one_value.encode('latin-1')
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-4: ordinal not in range(256)
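
The last frame of the traceback can be reproduced without Azure. This is a minimal sketch, assuming the failing header value contains the Hebrew file name, which the traceback alone does not confirm. The sample name and the helper `is_latin1_encodable` are hypothetical. `http.client` encodes `str` header values as latin-1, which only covers code points below 256, so Hebrew letters cannot be encoded.

```python
# Hypothetical file name written with escapes: four Hebrew letters plus ".pdf".
file_name = "\u05de\u05e1\u05de\u05da.pdf"


def is_latin1_encodable(value: str) -> bool:
    """Return True if the value survives the latin-1 encoding that
    http.client applies to str header values."""
    try:
        value.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(is_latin1_encodable("report.pdf"))  # True: plain ASCII name
print(is_latin1_encodable(file_name))     # False: Hebrew letters are above code point 255
```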

Steps to reproduce

Steps to reproduce the behavior:

  1. Go to 'admin url'
  2. Click on 'ingest documents'
  3. Upload any Hebrew PDF document, or enter any Hebrew URL
  4. See error

Screenshots

https://snipboard.io/w9y8gB.jpg

Govardhana-Microsoft commented 1 week ago

@freshuk We are able to reproduce the issue. It seems to be a problem with the file name. Our team is looking into it.

@Roopan-Microsoft

freshuk commented 1 week ago

Exactly! I just found out that if I change the file name to English, it uploads fine. With the URLs, though, I obviously can't change the name. Thank you.
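
Until a fix lands, one possible workaround is to percent-encode the name before it reaches an HTTP header. This is only a sketch, assuming the file name (or the page title, for URLs) is sent as a header value such as blob metadata; the thread does not confirm which header is involved. `header_safe` and `header_restore` are hypothetical helpers, not part of the accelerator.

```python
from urllib.parse import quote, unquote


def header_safe(value: str) -> str:
    """Percent-encode the UTF-8 bytes of value so the result is pure ASCII."""
    return quote(value, safe="")


def header_restore(value: str) -> str:
    """Reverse header_safe to recover the original text for display."""
    return unquote(value)


# Hypothetical Hebrew file name, written with escapes.
title = "\u05de\u05e1\u05de\u05da.pdf"

encoded = header_safe(title)
encoded.encode("latin-1")  # does not raise: only ASCII characters remain
print(encoded)
print(header_restore(encoded) == title)
```

The encoded form is longer than the original, so any length limit on the header still applies, and every reader of the value has to decode it again.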

Prasanjeet-Microsoft commented 3 days ago

@freshuk We are currently addressing this issue and will keep you updated.

Prasanjeet-Microsoft commented 3 days ago

@freshuk Could you please share the URLs that produce errors when you upload them on the ingest documents screen?