Unstructured-IO / unstructured-api

Apache License 2.0
fix/Fix python/js client sending unrecognized list params #412

Closed awalker4 closed 2 months ago

awalker4 commented 2 months ago

Changes

FormData fix

FastAPI expects a list param to arrive in form data as a repeated key with the bare param name. The Speakeasy clients use a different, more explicit format (the list param names carry square brackets) that the API does not parse. As a result, when client users send skip_infer_table_types or extract_image_block_types, the server just sets them to None.
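The mismatch can be illustrated without running a server. This is a minimal sketch, assuming the client format is a trailing `[]` on the key name; the `values_for` helper is hypothetical and only mimics a lookup by exact key name, not FastAPI's real parsing code.

```python
# Repeated bare keys: the shape a list param lookup by name will match.
fastapi_style = [
    ("extract_image_block_types", "Image"),
    ("extract_image_block_types", "Table"),
]

# Bracketed keys: the ASSUMED shape sent by the generated clients.
client_style = [
    ("extract_image_block_types[]", "Image"),
    ("extract_image_block_types[]", "Table"),
]


def values_for(fields, name):
    """Hypothetical helper: collect every value whose key equals `name`."""
    return [value for key, value in fields if key == name]


matched = values_for(fastapi_style, "extract_image_block_types")
missed = values_for(client_style, "extract_image_block_types")

print(matched)  # ['Image', 'Table']
print(missed)   # []
```

With the bracketed keys, nothing matches the param name, which is consistent with the server falling back to None.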

The fix is to transform the form data params before FastAPI parses them into a list type. I tried adding middleware for this, but it turns out Request._form isn't loaded at that point. Instead, we can monkeypatch the Request class so that it returns the transformed form data whenever it is asked for it.
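The monkeypatch approach can be sketched as follows. This is not the code from this PR: `FakeRequest` is a hypothetical stand-in for the framework's Request class, its `form()` method returns plain (key, value) pairs, and the trailing `[]` suffix is an assumption about the client format.

```python
import asyncio


class FakeRequest:
    """Hypothetical stand-in for the framework's Request class."""

    def __init__(self, fields):
        self._fields = fields

    async def form(self):
        # Return the parsed form as a list of (key, value) pairs.
        return list(self._fields)


def strip_list_suffix(key):
    """Drop a trailing '[]' so a bracketed key matches the bare param name."""
    return key[:-2] if key.endswith("[]") else key


# Keep a reference to the original method so the patch can delegate to it.
original_form = FakeRequest.form


async def patched_form(self):
    fields = await original_form(self)
    return [(strip_list_suffix(key), value) for key, value in fields]


# Monkeypatch: every caller of form() now sees normalized keys.
FakeRequest.form = patched_form

request = FakeRequest(
    [
        ("strategy", "fast"),
        ("skip_infer_table_types[]", "pdf"),
        ("skip_infer_table_types[]", "docx"),
    ]
)
normalized = asyncio.run(request.form())
print(normalized)
```

Patching the class rather than a single request object means the transformation applies no matter which code path asks for the form first.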

I updated an old parallel mode unit test to more generally assert that all params received in the endpoint make it down to partition. By adding square brackets to the list params, we can see the test pass once the fix is applied.
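The shape of such a test might look like the sketch below. It is an illustration only: `handle_request` is a hypothetical endpoint stand-in, not the real route, and the real test in the repo may be structured differently. The idea is to mock partition and assert that every submitted param, including the bracketed list params, reaches it.

```python
from unittest.mock import Mock

# Params that should be collected into lists (names taken from the PR text).
LIST_PARAMS = {"skip_infer_table_types", "extract_image_block_types"}


def handle_request(form_fields, partition):
    """Hypothetical endpoint: normalize keys, group values, call partition."""
    kwargs = {}
    for key, value in form_fields:
        # Assumed client format: a trailing '[]' marks a list param.
        name = key[:-2] if key.endswith("[]") else key
        if name in LIST_PARAMS:
            kwargs.setdefault(name, []).append(value)
        else:
            kwargs[name] = value
    return partition(**kwargs)


partition_mock = Mock(return_value=[])
handle_request(
    [
        ("strategy", "fast"),
        ("extract_image_block_types[]", "Image"),
        ("extract_image_block_types[]", "Table"),
    ],
    partition_mock,
)

# Raises AssertionError if any param was dropped on the way down.
partition_mock.assert_called_once_with(
    strategy="fast",
    extract_image_block_types=["Image", "Table"],
)
```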

Testing

Run the server on port 8000 with make run-web-app. Load a pyenv that has the latest Python client, and in IPython, run the following snippet. In the server log, verify that we see a warning about extract_image_block_types having an invalid value: foo.

from unstructured_client import UnstructuredClient
from unstructured_client.models import shared
from unstructured_client.models.errors import SDKError

s = UnstructuredClient(
    api_key_auth=None,
    server_url="http://localhost:8000",
)

filename = "/path/to/any/file"

with open(filename, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=filename,
    )

req = shared.PartitionParameters(
    files=files,
    # Other partition params
    strategy='fast',
    extract_image_block_types=["Foo", "Foo"],
)

try:
    resp = s.general.partition(req)
    print(resp.elements[0])
except SDKError as e:
    print(e)