aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding
MIT No Attribution

[Enhancement] Multi-page training annotation UI #27

Open athewsey opened 1 year ago

athewsey commented 1 year ago

Currently, the custom online human review UI can render detection bounding boxes over a full multi-page document at once, but the training data annotation UI is based on the SageMaker Ground Truth bounding box tool, which can only display one image (page) at a time.

This is not ideal when users want to make model-training annotations on an entire document at once, for example when context from other pages is important for labellers.

Ideally, we would re-use and extend components from the review UI to build a custom annotation UI that highlights entities in the same way, but processes the entire PDF at once rather than a single page image.