Unstructured-IO / unstructured-python-client

A Python client for the Unstructured hosted API
MIT License
82 stars 16 forks

chore: Bump the default split page concurrency #122

awalker4 closed 4 months ago

awalker4 commented 4 months ago

Verified that this shows a speedup by doing a local pip install and running the following snippet before and after the change:

import time

from unstructured_client import UnstructuredClient
from unstructured_client.models import shared

# SERVER_URL and API_KEY are placeholders; set them for your deployment before running.
s = UnstructuredClient(
    server_url=SERVER_URL,
    api_key_auth=API_KEY,
)

filename = "../_sample_docs/layout-parser-paper.pdf"

with open(filename, "rb") as f:
    # Note that this currently only supports a single file
    files = shared.Files(
        content=f.read(),
        file_name=filename,
    )

req = shared.PartitionParameters(
    files=files,
    strategy="hi_res",
)

# Time a single partition request
start_time = time.time()
resp = s.general.partition(req)
end_time = time.time()
print(f"Elapsed time: {end_time - start_time} seconds")
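
A single timed request can be noisy, since network latency and server load vary between calls. As a rough sketch (not part of the PR or the client library), the before/after comparison could repeat the call and report the median. The helper names `time_call` and `summarize` are hypothetical, and the `time.sleep` workload is only a stand-in for `s.general.partition(req)`.

```python
import statistics
import time
from typing import Callable, List


def time_call(fn: Callable[[], object], repeats: int = 3) -> List[float]:
    """Run fn() `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> str:
    """Format the median and the fastest run for a before/after comparison."""
    return (
        f"median {statistics.median(durations):.2f}s, "
        f"min {min(durations):.2f}s over {len(durations)} runs"
    )


if __name__ == "__main__":
    # Stand-in workload (assumption); with the snippet above this would be
    # time_call(lambda: s.general.partition(req))
    print(summarize(time_call(lambda: time.sleep(0.01))))
```

Running this once on the old install and once on the new one gives two medians to compare, which is less sensitive to a single slow request than one elapsed-time reading.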

awalker4 commented 4 months ago

Ah, the tests should pass once this is merged. The integration tests pull the latest unstructured-api image and test requests against it, so we got blocked by the recent docx "None is not a valid mimetype" bug.