aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding
MIT No Attribution

Plain-text seq2seq models #26

Closed by athewsey 1 year ago

athewsey commented 1 year ago

Issue #, if available: N/A

Description of changes:

This PR introduces trainable sequence-to-sequence models for generative tasks like OCR error correction or field re-formatting (e.g. date normalization).
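To illustrate what a plain-text seq2seq task like date normalization looks like as training data, here is a minimal sketch that generates synthetic (source, target) text pairs. It is not taken from this PR: the function name `make_pairs`, the list of source formats, the ISO 8601 target format, and the `"Normalize date: "` task prefix are all assumptions made for the example.

```python
import random
from datetime import date, timedelta

# Hypothetical surface formats a document might use for the same date.
SOURCE_FORMATS = ["%d/%m/%Y", "%B %d, %Y", "%d %b %Y", "%m-%d-%y"]
# Assumed normalized output format (ISO 8601); the PR may use something else.
TARGET_FORMAT = "%Y-%m-%d"


def make_pairs(n, seed=0, prefix="Normalize date: "):
    """Return n (source_text, target_text) pairs for text-to-text training.

    The prefix is an assumed task prompt, in the style commonly used with
    text-to-text models; it is not a requirement of any particular library.
    """
    rng = random.Random(seed)  # seeded so the generated dataset is reproducible
    start = date(2000, 1, 1)
    pairs = []
    for _ in range(n):
        # Pick a random day within roughly 24 years of the start date.
        day = start + timedelta(days=rng.randrange(0, 9000))
        fmt = rng.choice(SOURCE_FORMATS)
        pairs.append((prefix + day.strftime(fmt), day.strftime(TARGET_FORMAT)))
    return pairs


if __name__ == "__main__":
    for source, target in make_pairs(3):
        print(f"{source!r} -> {target!r}")
```

Each pair can then be tokenized as input text and label text for any encoder-decoder model. OCR error correction fits the same shape, with noisy text as the source and the corrected text as the target.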

Some important caveats:

Testing done:

Re-verified the notebooks in an existing environment (did not rebuild the environment from scratch).


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.