aws-samples / amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.
Apache License 2.0

Set JPEG compression parameters #342

Closed: Belval closed this 3 months ago

Belval commented 3 months ago

Issue #, if available: #341

Description of changes: Set the JPEG compression quality to 95 and the chroma subsampling to 0 (4:4:4). The default settings apply too much compression, which degrades Textract performance compared to calling boto3 directly.
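For illustration, here is a minimal sketch of how these two settings could be applied, assuming the images are encoded with Pillow's `Image.save`. The names `JPEG_QUALITY`, `JPEG_SUBSAMPLING`, `jpeg_save_kwargs` and `encode_jpeg` are hypothetical and are not taken from the Textractor code base; only the `quality` and `subsampling` keyword arguments are Pillow's own.

```python
import io

# Values taken from the PR description; the names are hypothetical.
JPEG_QUALITY = 95
JPEG_SUBSAMPLING = 0  # Pillow's integer code for 4:4:4 (no chroma subsampling)


def jpeg_save_kwargs(quality=JPEG_QUALITY, subsampling=JPEG_SUBSAMPLING):
    """Build the keyword arguments to pass to Pillow's Image.save for JPEG."""
    if not 0 <= quality <= 100:
        raise ValueError("quality must be between 0 and 100")
    if subsampling not in (0, 1, 2):
        raise ValueError("subsampling must be 0 (4:4:4), 1 (4:2:2) or 2 (4:2:0)")
    return {"format": "JPEG", "quality": quality, "subsampling": subsampling}


def encode_jpeg(image, **overrides):
    """Encode a PIL.Image.Image to JPEG bytes using the settings above."""
    buffer = io.BytesIO()
    # JPEG has no alpha channel, so convert to RGB before saving.
    image.convert("RGB").save(buffer, **jpeg_save_kwargs(**overrides))
    return buffer.getvalue()
```

With Pillow installed, something like `encode_jpeg(Image.open("page.png"))` would return the bytes to send to Textract; the file name here is only a placeholder. Passing the settings explicitly avoids relying on the encoder's defaults, which is the behavior this change is meant to correct.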

Originally the idea was to make it lossless: https://github.com/aws-samples/amazon-textract-textractor/tree/use-png-in-file-manipulations. However, doing so adds a significant latency hit (~200ms), so it will be offered in a future update as an option instead of a widespread change.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.