awslabs / project-lakechain

Cloud-native, AI-powered, document processing pipelines on AWS.
https://awslabs.github.io/project-lakechain/
Apache License 2.0

Feature request: Support Textract as PDF parsing engine #41

Closed: HQarroum closed this issue 1 month ago

HQarroum commented 3 months ago

Use case

Use Amazon Textract as the text extraction engine within the PDF Text Processor.

Solution/User Experience

Allow users to select Textract as the engine used by the PDF Text Processor, by injecting the engine as a dependency.
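The sketch below illustrates the proposed dependency injection in TypeScript. It is not the Lakechain API: `PdfParsingEngine`, `DefaultPdfEngine`, `TextractEngine` and `PdfTextProcessor` are hypothetical names, and both engines are stubs that return placeholder strings instead of parsing PDFs or calling Amazon Textract. It only shows the shape of the idea: the processor depends on an engine interface and receives a concrete engine from the caller, falling back to a default when none is given.

```typescript
// Hypothetical sketch: none of these names are part of the Lakechain API.

/** Contract that every PDF parsing engine satisfies (hypothetical). */
interface PdfParsingEngine {
  readonly name: string;
  extractText(document: Uint8Array): Promise<string>;
}

/** Stand-in for the processor's existing engine (hypothetical stub). */
class DefaultPdfEngine implements PdfParsingEngine {
  readonly name = "default";

  async extractText(document: Uint8Array): Promise<string> {
    // A real engine would parse the PDF bytes locally here.
    return `default:${document.length} bytes`;
  }
}

/** Stand-in for an Amazon Textract-backed engine (hypothetical stub). */
class TextractEngine implements PdfParsingEngine {
  readonly name = "textract";

  async extractText(document: Uint8Array): Promise<string> {
    // A real engine would submit the document to Amazon Textract here.
    return `textract:${document.length} bytes`;
  }
}

/** The processor receives its engine instead of constructing one itself. */
class PdfTextProcessor {
  private readonly engine: PdfParsingEngine;

  constructor(engine: PdfParsingEngine = new DefaultPdfEngine()) {
    this.engine = engine;
  }

  get engineName(): string {
    return this.engine.name;
  }

  process(document: Uint8Array): Promise<string> {
    // Delegate extraction to whichever engine was injected.
    return this.engine.extractText(document);
  }
}
```

With this shape, `new PdfTextProcessor()` keeps the current behavior, while `new PdfTextProcessor(new TextractEngine())` opts into Textract without any change to the processor itself. Adding another engine later would only require a new class implementing the same interface.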

Alternative solutions

No response