While this sample was originally created for multi-page documents in PDF, other related use-cases (such as ID document or receipt extraction) may operate on single-page images/photographs/scans instead.
Today, some parts of the pipeline support images, but others assume PDF input. It would be great to round out support for images as source documents, particularly for the common JPEG and PNG formats, which have good native support in e.g. Amazon Textract, SageMaker Ground Truth, and web browsers.
1. [x] (Believe so, but need to double-check.) Core Textract state machine component supports OCRing image files
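One way for pipeline stages that currently assume PDF to handle images is to branch on the file's actual content rather than its extension. The sketch below is illustrative only: `SourceFormat`, `detect_source_format`, and `is_single_page_image` are hypothetical names, not part of this sample. It classifies a document from its leading "magic number" bytes, assuming the caller has already read the first few bytes of the object (for example, from S3).

```python
from enum import Enum


class SourceFormat(Enum):
    """Hypothetical set of source formats the pipeline might distinguish."""
    PDF = "pdf"
    JPEG = "jpeg"
    PNG = "png"
    UNKNOWN = "unknown"


def detect_source_format(header: bytes) -> SourceFormat:
    """Classify a document from its first few bytes (file signature)."""
    # PDF files begin with the ASCII marker "%PDF-".
    if header.startswith(b"%PDF-"):
        return SourceFormat.PDF
    # JPEG files begin with the SOI marker FF D8 followed by another FF.
    if header.startswith(b"\xff\xd8\xff"):
        return SourceFormat.JPEG
    # PNG files begin with the fixed 8-byte PNG signature.
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return SourceFormat.PNG
    return SourceFormat.UNKNOWN


def is_single_page_image(header: bytes) -> bool:
    """True for JPEG/PNG inputs, which always hold exactly one page."""
    return detect_source_format(header) in (SourceFormat.JPEG, SourceFormat.PNG)
```

Sniffing content instead of trusting extensions avoids misrouting files that were uploaded with a missing or wrong suffix, which is common for photographs and scans.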