deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Add support for AWS textract #4184

Open MarkDirksen opened 1 year ago

MarkDirksen commented 1 year ago

Hi,

I was wondering if there is any interest in adding support for AWS Textract for extracting text and tables? I noticed there is already an option for a similar offering from Azure (AzureConverter). I mainly use AWS, so it would be convenient to use Textract instead.
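As a rough illustration of what such a converter would need to do (a sketch only, not an existing Haystack API): Textract's `detect_document_text` call returns a list of `Blocks`, and the `LINE` blocks carry the recognized text. The function names below are hypothetical, and the boto3 call assumes AWS credentials are already configured.

```python
from typing import Any, Dict


def textract_response_to_text(response: Dict[str, Any]) -> str:
    """Join the text of all LINE blocks in a Textract response, in order."""
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return "\n".join(lines)


def extract_text_with_textract(file_bytes: bytes) -> str:
    """Hypothetical helper: send a document to Textract, return plain text.

    Needs boto3 and configured AWS credentials. The import is done lazily
    so the parsing function above works without either.
    """
    import boto3

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": file_bytes})
    return textract_response_to_text(response)
```

Table extraction would go through Textract's `analyze_document` call with `FeatureTypes=["TABLES"]` instead, which returns additional `TABLE` and `CELL` blocks that this sketch does not handle.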

If there is interest in doing so I might be able to spend some time on it in the near future. In that case I'll have some questions on the details.

Thanks!

masci commented 1 year ago

Hi @MarkDirksen, thanks for the feature request and the offer to help!

To make sure we're all on the same page before investing time into the actual implementation, we would ask you to file a design proposal detailing the changes you would like to see in Haystack. Once approved, you could implement the proposal yourself, wait for a contributor to pick it up, or ask the core developers to prioritise it.

arminnajafi commented 3 months ago

Hello,

This is Armin from AWS. I will take a stab at drafting a proposal and implementing this feature.

Thanks,