Unstructured-IO / unstructured-api-tools

Apache License 2.0

Refinement around handling of mixed text file / non-text file requests #106

Open cragwolfe opened 1 year ago

cragwolfe commented 1 year ago

Currently, a FastAPI route generated by unstructured-api-tools may process both files and text_files inputs if the signature of the notebook's pipeline_api includes both file and text. E.g.:

def pipeline_api(text, file, file_content_type, filename):
    ...

If a client posts only N text_files or only N files to the API, there is no issue: the outputs are returned in the same order the files were posted.

However, if text_files and files are posted in the same request, the parts of the multipart response do not follow the posting order: the outputs of all processed text files come first, followed by the outputs of all processed non-text files. This isn't immediately obvious to the caller, so either this behaviour should be very well documented, or the generated route should return an error with a friendly message.
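
To make the two options concrete, here is a minimal sketch in plain Python. It is not the code that unstructured-api-tools generates. The helper names (current_output_order, mixed_request_error) and the representation of a request as (field, filename) pairs are assumptions made purely for illustration. The first function mimics the ordering described above, and the second shows what a friendly rejection of mixed requests might look like.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical model of a request: (form field name, filename) pairs,
# listed in the order the client posted them.
Posted = Sequence[Tuple[str, str]]


def current_output_order(posted: Posted) -> List[str]:
    """Mimic the ordering described in this issue: all text_files
    outputs first, then all files outputs."""
    text_outputs = [name for field, name in posted if field == "text_files"]
    file_outputs = [name for field, name in posted if field == "files"]
    return text_outputs + file_outputs


def mixed_request_error(posted: Posted) -> Optional[str]:
    """Return a friendly error message if both field types are present,
    otherwise None. A generated route could turn this into a 4xx response."""
    fields = {field for field, _ in posted}
    if {"text_files", "files"} <= fields:
        return (
            "Posting text_files and files in the same request is not "
            "supported; please send them in separate requests."
        )
    return None


posted = [("files", "a.pdf"), ("text_files", "b.txt"), ("files", "c.pdf")]
print(current_output_order(posted))  # ['b.txt', 'a.pdf', 'c.pdf']
print(mixed_request_error(posted))
```

In this example the client posted a.pdf first, yet the output for b.txt comes back first, which is the surprise described above. Whether to document this ordering or reject such requests outright is the open question of this issue.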