aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding
MIT No Attribution

[Enhancement] Joint entity recognition and page/document classification #29

Open athewsey opened 1 year ago

athewsey commented 1 year ago

Today we demonstrate annotation and training for entity extraction only. For many use cases, document classification is also important, and it should be pretty straightforward to support this too.

Sequence classification is already supported by open-source model classes (e.g. LayoutLMv2ForSequenceClassification), but joint classification and entity recognition (cls+ner) with a single model might perform well, be more economical for users, and not require too much extra effort.
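To make the idea a bit more concrete, here is a rough sketch of how a joint objective could look: one shared encoder feeds a per-token (NER) head and a per-page/document (classification) head, and training minimizes a weighted sum of the two cross-entropy losses. This is only an illustration in plain Python, not code from this repository. The names `joint_loss` and `cls_weight` are hypothetical, and it assumes ignored tokens are labelled `-100`, as is common in Hugging Face token classification examples.

```python
import math
from typing import Sequence

# Assumption: padding/special tokens carry this label and are skipped.
IGNORE_INDEX = -100


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-likelihood of `label` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


def joint_loss(
    token_logits: Sequence[Sequence[float]],
    token_labels: Sequence[int],
    doc_logits: Sequence[float],
    doc_label: int,
    cls_weight: float = 1.0,  # hypothetical knob balancing the two tasks
) -> float:
    """Mean token-level (NER) loss plus weighted document-level (cls) loss."""
    token_losses = [
        cross_entropy(logits, label)
        for logits, label in zip(token_logits, token_labels)
        if label != IGNORE_INDEX
    ]
    ner_loss = sum(token_losses) / len(token_losses) if token_losses else 0.0
    cls_loss = cross_entropy(doc_logits, doc_label)
    return ner_loss + cls_weight * cls_loss


if __name__ == "__main__":
    # Three tokens (the last one ignored), two entity classes, four doc classes.
    loss = joint_loss(
        token_logits=[[0.0, 0.0], [0.0, 0.0], [5.0, 1.0]],
        token_labels=[0, 1, IGNORE_INDEX],
        doc_logits=[0.0, 0.0, 0.0, 0.0],
        doc_label=2,
    )
    print(f"joint loss: {loss:.4f}")
```

In a real model both sets of logits would come from the same encoder forward pass, which is where the cost saving over running two separate models would come from. How to weight the two terms is an open design choice.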