Open sentry-io[bot] opened 1 year ago
Hi, I am facing the same error. Please let me know if you resolved it.
Hi there! Do you have a file that reproduces the issue that you're able to share?
Same problem via unstructured-python-client:
Failed to process a request due to API server error with status code 500. Attempting retry number 1 after sleep.
unstructured-client: 36 - log_retries()] Server message - {"detail":"'utf-8' codec can't decode byte 0xff in position 0: invalid start byte"}
If I try to send a file with a different encoding, e.g. UTF-16, it does not work. The encoding parameter is set correctly, as can be seen here in unstructured-client/general.py:
req = client.prepare_request(requests_http.Request('POST', url, params=query_params, data=data, files=form, headers=headers))
I'm not sure if the issue is with the unstructured-python-client not encoding the form-post correctly or setting the accept header correctly, or if it's a problem with the server API.
Hi @andrePankraz, can you clarify how you're making the API call? The server does take an encoding param (shown in the table here) that defaults to utf-8. I suspect this file will work if you send encoding='utf-16'.
Have you actually tested it with a UTF-16 file? This is the request I sent, followed by the response:
curl -X 'POST' \
'http://ai1.dev.init:8004/general/v0/general' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'files=@data/documents/CSV_UTF_16.csv' \
-F 'strategy=hi_res' \
-F 'languages=deu' \
-F 'encoding=utf-16'
{"detail":"'utf-8' codec can't decode byte 0xff in position 0: invalid start byte"}
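For anyone debugging the same message locally: byte 0xff at position 0 is consistent with a UTF-16 little-endian byte order mark (FF FE) being decoded as UTF-8. The sketch below is only an illustration under that assumption. It reproduces the same error text with plain Python and shows one possible client-side workaround, re-encoding the file to UTF-8 before upload. The helper names sniff_bom_encoding and to_utf8 are hypothetical and not part of unstructured or unstructured-client, and nothing here says how the server itself handles the encoding param.

```python
import codecs
from typing import Optional

# Byte order marks, checked with UTF-32 first because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(data: bytes) -> Optional[str]:
    """Hypothetical helper: return a codec name if data starts with a known BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: re-encode BOM-marked text as UTF-8 (no BOM)."""
    encoding = sniff_bom_encoding(data) or "utf-8"
    return data.decode(encoding).encode("utf-8")


# A small stand-in for a UTF-16 CSV file: BOM followed by little-endian text.
sample = codecs.BOM_UTF16_LE + "Name;Größe\n".encode("utf-16-le")

# Decoding the UTF-16 bytes as UTF-8 fails on the very first byte (0xff).
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
print(sniff_bom_encoding(sample))
print(to_utf8(sample))
```

If the re-encoded bytes upload without the error, that at least narrows the problem down to how the original encoding is handled, rather than to the file contents.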
Hi there! Do you have a file that reproduces the issue that you can share?
Hey @awalker4, my file got corrupted while I was formatting it. There's no issue with the library.