aws-samples / amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.
Apache License 2.0
389 stars 142 forks

Caller: allow early return when job incomplete #326

Open symroe opened 7 months ago

symroe commented 7 months ago

I'm using Textract in a web application.

I'm enqueuing jobs using extractor.start_document_analysis and storing the returned job ID in a database.

Later, I call extractor.get_result(job_id) to get the response for processing.

At the moment, get_result calls t_call.get_full_json, which starts a while loop that blocks until the job has finished one way or another (i.e. job_status != "IN_PROGRESS").
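A simplified sketch of the kind of blocking poll loop being described. This is an illustration, not textractor's actual `get_full_json` code: the `get_status` callable stands in for whatever status lookup the library performs, and `wait_for_job` and `poll_seconds` are hypothetical names.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[str], str], job_id: str, poll_seconds: float = 1.0) -> str:
    """Block until the job leaves IN_PROGRESS, then return its final status.

    Hypothetical illustration of a blocking poll loop; not textractor's code.
    """
    status = get_status(job_id)
    while status == "IN_PROGRESS":
        # This sleep-and-retry is what ties up the calling worker.
        time.sleep(poll_seconds)
        status = get_status(job_id)
    return status
```

The caller gets nothing back until the job reaches a terminal status, which is the behaviour the issue asks to make optional.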

I'd really like a blocking=False flag that simply returns the job status if it's IN_PROGRESS. This is because I'm already polling for completion in my application, and I don't want my polling workers blocked in this loop.

Of course, I can use the boto3 API to check the status directly before calling get_result, but it would be nice to have this built in.
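A minimal sketch of the boto3 workaround, assuming a boto3 Textract client: `get_document_analysis` returns a `JobStatus` field (IN_PROGRESS, SUCCEEDED, FAILED or PARTIAL_SUCCESS) in a single call, without waiting. The helper name `textract_job_status` is hypothetical, and jobs started with `start_document_text_detection` would need `get_document_text_detection` instead.

```python
from typing import Any


def textract_job_status(client: Any, job_id: str) -> str:
    """Return the JobStatus of an asynchronous Textract analysis job without waiting.

    `client` is assumed to be a boto3 Textract client (hypothetical helper).
    """
    # MaxResults=1 keeps the response small; only the JobStatus field is needed.
    response = client.get_document_analysis(JobId=job_id, MaxResults=1)
    return response["JobStatus"]
```

In a polling worker, one could call `textract_job_status(boto3.client("textract"), job_id)` and only call `extractor.get_result(job_id)` once the status is no longer IN_PROGRESS, so the loop inside `get_result` should not have to wait.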

I'm happy to work on a PR if this seems like a useful feature.

Belval commented 7 months ago

This is something we could accept a PR for.

I think it could be implemented as extractor.get_status(job_id) which returns a value from an enum defined in https://github.com/aws-samples/amazon-textract-textractor/blob/master/textractor/data/constants.py with IN_PROGRESS, SUCCEEDED, FAILED, PARTIAL_SUCCESS.
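A rough sketch of the proposed `get_status`, written as a free function over a boto3-style client so it stays self-contained. `TextractJobStatus` and this signature are assumptions for illustration; a real implementation would presumably be a method on the `Textractor` class, with the enum in `textractor/data/constants.py` under whatever name the maintainers choose.

```python
from enum import Enum
from typing import Any


class TextractJobStatus(Enum):
    """Hypothetical enum mirroring the Textract JobStatus values."""

    IN_PROGRESS = "IN_PROGRESS"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    PARTIAL_SUCCESS = "PARTIAL_SUCCESS"


def get_status(client: Any, job_id: str) -> TextractJobStatus:
    """Non-blocking status lookup: one API call, no polling loop (sketch)."""
    response = client.get_document_analysis(JobId=job_id, MaxResults=1)
    # Lookup by value; an unexpected status string raises ValueError.
    return TextractJobStatus(response["JobStatus"])
```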

My concern with blocking=False is that it changes the function return type to Union[Document, Enum] which seems a bit awkward from a handling perspective.
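To make the return-type concern concrete, the sketch below shows the narrowing every caller would need under a `blocking=False` design. `Document` here is a minimal placeholder rather than textractor's real class, and `JobStatus` and `handle_result` are hypothetical.

```python
from enum import Enum
from typing import Union


class JobStatus(Enum):
    """Hypothetical status enum."""

    IN_PROGRESS = "IN_PROGRESS"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    PARTIAL_SUCCESS = "PARTIAL_SUCCESS"


class Document:
    """Stand-in for textractor's Document; not the real class."""

    def __init__(self, text: str) -> None:
        self.text = text


def handle_result(result: Union[Document, JobStatus]) -> str:
    # With a Union return type, every caller must narrow it before use.
    if isinstance(result, JobStatus):
        return f"not ready: {result.value}"
    return f"processing {len(result.text)} characters"
```

With a separate `get_status`, `get_result` keeps a single `Document` return type, and the branching moves into an explicit status check that only polling callers need to write.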