aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding
MIT No Attribution

[Enhancement] Refactor image splitting to a SM Processing Job #1

Closed athewsey closed 2 years ago

athewsey commented 2 years ago

The initial image cleaning/splitting step in notebook 1 takes a long time to complete, which makes it a good candidate for a SageMaker Processing Job that can scale out across more compute resources. This would be especially useful for users who want to process the full corpus (or large corpora of their own).
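As a rough illustration of what the refactor could look like, here is a minimal sketch of a script that a Processing Job might run on each instance. It is not the repository's actual code: `clean_and_split` is a hypothetical placeholder that only copies files, standing in for the notebook's real cleaning/splitting logic, and `process_folder` is likewise an assumed name. The sketch assumes the job mounts input data in one local folder and uploads whatever is written to another.

```python
import shutil
from pathlib import Path


def clean_and_split(src: Path, dest_dir: Path) -> list:
    """Hypothetical placeholder for the notebook's image cleaning/splitting.

    A real implementation would convert, rotate, or split multi-page
    documents into per-page images. This stand-in copies the file unchanged
    and returns the list of files it wrote.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    return [dest]


def process_folder(input_dir, output_dir) -> int:
    """Process every file under input_dir, mirroring its folder structure.

    Returns the number of output files written.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    n_written = 0
    # Sort for a deterministic processing order (useful when reading logs).
    for src in sorted(p for p in input_dir.rglob("*") if p.is_file()):
        # Keep each file's sub-folder so outputs map back to their inputs.
        rel_parent = src.parent.relative_to(input_dir)
        n_written += len(clean_and_split(src, output_dir / rel_parent))
    return n_written
```

In a job, this would be called with the container's local input and output folders (conventionally something like `/opt/ml/processing/input` and `/opt/ml/processing/output`, though the paths are set by the job configuration). Scaling out would then likely come from launching the job with several instances and sharding the S3 input between them, for example through the SageMaker Python SDK's `ProcessingInput` with `s3_data_distribution_type="ShardedByS3Key"`, so each instance only sees its own share of the documents.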