aws-samples / amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.
Apache License 2.0

Id in html output #386

Closed Belval closed 3 months ago

Belval commented 3 months ago

Issue #, if available: N/A

Description of changes: This change adds a toggle that lets users include the layout element IDs in the HTML output, so the output can be post-processed further as needed.

To limit the impact on LLM token counts, we also offer truncated UUIDs (8 characters).
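As a rough illustration of the idea only, not the code in this PR: the sketch below shows how an ID toggle and a truncation option might behave when rendering one layout element. The function `render_element` and the parameters `add_ids` and `short_ids` are hypothetical names, and keeping the first 8 characters of the UUID is an assumption, since the description only says the truncated form is 8 characters long.

```python
import uuid
from html import escape


def render_element(tag, text, element_id, add_ids=False, short_ids=False):
    """Render one layout element as HTML, optionally tagging it with its ID.

    All names here are hypothetical; this is not the textractor API.
    """
    attr = ""
    if add_ids:
        # Assumption: the truncated form keeps the first 8 characters.
        shown = element_id[:8] if short_ids else element_id
        attr = f' id="{escape(shown, quote=True)}"'
    return f"<{tag}{attr}>{escape(text)}</{tag}>"


element_id = str(uuid.uuid4())
print(render_element("p", "Invoice total", element_id))                  # no id attribute
print(render_element("p", "Invoice total", element_id, add_ids=True))    # full UUID
print(render_element("p", "Invoice total", element_id, add_ids=True,
                     short_ids=True))                                    # 8-character id
```

A full UUID is 36 characters per element, so on a document with many layout elements the truncated form noticeably reduces the text passed to an LLM, at the cost of a small chance of ID collisions.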

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.