aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding
MIT No Attribution

[Enhancement] SageMaker async inference #8

Closed athewsey closed 2 years ago

athewsey commented 2 years ago

To host the model, this sample currently deploys a real-time SageMaker endpoint backed by a GPU instance - which may be fine for high-volume use cases, but is probably too resource-intensive for many others.

Since the end-to-end workflow here is asynchronous anyway (it may include a human review component), it's probably a good fit for the new SageMaker asynchronous inference feature, which supports scaling down to zero instances when demand is low.
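For reference, switching to async inference is mostly a change to the endpoint configuration plus an Application Auto Scaling target that allows `MinCapacity=0`. A minimal sketch of the two config payloads is below - the model/endpoint names and S3 path are hypothetical placeholders, and the dict shapes follow the `boto3` `create_endpoint_config` and `register_scalable_target` APIs (the actual client calls are shown in comments rather than executed):

```python
# Sketch of the config pieces a SageMaker async endpoint would need.
# All names (model, endpoint, bucket) below are hypothetical placeholders.


def build_async_endpoint_config(model_name: str, output_s3_uri: str) -> dict:
    """Kwargs for sagemaker_client.create_endpoint_config(**kwargs)."""
    return {
        "EndpointConfigName": f"{model_name}-async",
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": "ml.g4dn.xlarge",  # GPU-backed, as in the current setup
                "InitialInstanceCount": 1,
            }
        ],
        # This block is what makes the endpoint asynchronous: results are
        # written to S3 instead of returned in the HTTP response.
        "AsyncInferenceConfig": {
            "OutputConfig": {"S3OutputPath": output_s3_uri},
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
        },
    }


def build_scale_to_zero_target(endpoint_name: str, max_instances: int = 2) -> dict:
    """Kwargs for application_autoscaling_client.register_scalable_target(**kwargs).

    MinCapacity=0 is only supported for async endpoints, and is what enables
    scaling down to zero instances when there is no traffic.
    """
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/AllTraffic",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,
        "MaxCapacity": max_instances,
    }


# Hypothetical usage (not executed here):
#   sm = boto3.client("sagemaker")
#   sm.create_endpoint_config(**build_async_endpoint_config(
#       "textract-postproc", "s3://my-pipeline-bucket/async-out/"))
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**build_scale_to_zero_target("textract-postproc-async"))
```

A target-tracking scaling policy on the endpoint's `ApproximateBacklogSizePerInstance` CloudWatch metric would then bring the variant back up from zero when requests start queuing.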

TBD: Do we need to retain a real-time deployment option for anybody who wants to optimize for low latency? Seems unnecessary to me at the moment.