Unstructured-IO / unstructured-api

Apache License 2.0
528 stars 110 forks

ValueError: Receive unexpected status code 504 from the API. #255

Closed Jimchoo91 closed 11 months ago

Jimchoo91 commented 1 year ago

As in the subject line, I have received this error a few times over the last few hours.

Full traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-370-f9d5eba144d9> in <module>
      9 )
     10 
---> 11 chev_docs_1 = loader.load()

/opt/anaconda3/lib/python3.8/site-packages/langchain/document_loaders/unstructured.py in load(self)
     84     def load(self) -> List[Document]:
     85         """Load file."""
---> 86         elements = self._get_elements()
     87         if self.mode == "elements":
     88             docs: List[Document] = list()

/opt/anaconda3/lib/python3.8/site-packages/langchain/document_loaders/unstructured.py in _get_elements(self)
    267 
    268     def _get_elements(self) -> List:
--> 269         return get_elements_from_api(
    270             file_path=self.file_path,
    271             api_key=self.api_key,

/opt/anaconda3/lib/python3.8/site-packages/langchain/document_loaders/unstructured.py in get_elements_from_api(file_path, file, api_url, api_key, **unstructured_kwargs)
    202         from unstructured.partition.api import partition_via_api
    203 
--> 204         return partition_via_api(
    205             filename=file_path,
    206             file=file,

/opt/anaconda3/lib/python3.8/site-packages/unstructured/partition/api.py in partition_via_api(filename, content_type, file, file_filename, api_url, api_key, **request_kwargs)
     88         files = [
     89             ("files", (metadata_filename, file, content_type)),  # type: ignore
---> 90         ]
     91         response = requests.post(
     92             api_url,

ValueError: Receive unexpected status code 504 from the API.

awalker4 commented 1 year ago

Sorry for the delay on this. The error happens with the PDF you posted here, correct? I just reproduced a 504 error with it and will do some digging.

awalker4 commented 1 year ago

I think this is because PDFs can take a long time to process, so the server sometimes times out. As a workaround, you may want to try sending individual pages and combining the results. Here's a gist of splitting up a PDF and sending the pages to the API. In LangChain's case, I think you can write out the pages and pass the individual filenames into the loader.
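
For anyone who can't open the gist, the page-by-page workaround could look roughly like the sketch below. This is not the gist's code. It assumes `pypdf` is installed for the splitting step, and the helper names `split_pdf_pages` and `partition_pages` are made up for illustration. The function that actually calls the API is passed in as `partition_one`, so the same loop should work with `partition_via_api` or with a LangChain loader.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def split_pdf_pages(pdf_path: str, out_dir: str) -> List[str]:
    """Write each page of pdf_path to its own single-page PDF and return the paths."""
    # Assumption: pypdf is available (pip install pypdf). It is imported here so
    # the rest of the module can be used without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    page_paths: List[str] = []
    for index, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        # Zero-padded page numbers keep the files in page order when sorted.
        page_path = out / f"{Path(pdf_path).stem}_page_{index:04d}.pdf"
        with open(page_path, "wb") as handle:
            writer.write(handle)
        page_paths.append(str(page_path))
    return page_paths


def partition_pages(
    page_paths: Iterable[str],
    partition_one: Callable[[str], list],
) -> list:
    """Send each page file to the API in turn and combine the results in order."""
    elements: list = []
    for path in page_paths:
        # One small request per page instead of one long request for the whole file.
        elements.extend(partition_one(path))
    return elements
```

With `unstructured` installed, `partition_one` could be something like `lambda p: partition_via_api(filename=p, api_key="YOUR_API_KEY")`, using the `filename` and `api_key` parameters that show up in the traceback above. For LangChain, `partition_one` would instead build a loader for the single-page file and return `loader.load()`.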

awalker4 commented 11 months ago

We just addressed some low-memory issues in the hosted API that could lead to 502s, and the page-splitting workaround in my previous comment should hopefully avoid any 504 timeouts. Going to close this, but please let us know if the issue persists.